Uh-Koh! Apple-Samsung judge to oversee buggy Intel modem chip fight

Thursday 8th June 2017 22:44 GMT Blotto

And Virgin in uk?

Will we see similar action against Virgin in the UK?

1. Friday 9th June 2017 06:42 GMT Voland's right hand
  
  Re: And Virgin in uk?
  
  I am still holding off on upgrading until they stop pushing the Intel crap.
  
  While my modem is in bridge mode, AFAIK the dumb-ass firmware used in most Puma 6 modems still fills the flow tables, despite being asked to just move packets from one side to the other without trying to be intelligent.
  
  1. Friday 9th June 2017 08:49 GMT CheesyTheClown
    
    Re: And Virgin in uk?
    
    I'd say that the real issue here isn't whether Intel is good or bad. Remember that Intel generally makes pretty great stuff. It's a matter of how this bug is being handled.
    
    I haven't been able to find a great deal of information on this bug other than the artificial stuff like "my modem is experiencing jitter" or your comment about flow tables. I'm going to assume that, before using terms like flow tables, you've done some research on this and know which phase of the forwarding pipeline this is and whether population of the flow tables is actually the problem.
    
    I'm curious about the modem's forwarding process because, whether in bridged or routed mode, I'm assuming at this point that Intel has implemented a hardware-based packet processor that handles most packet buffering and forwarding in the hardware forwarding engine.
    
    Now, the act of forwarding a packet should be deterministic at all times. However, the decision-making process of how and where to forward the packet can introduce difficulties. This engine almost certainly performs header parsing in hardware. This should not be an issue either, since headers per service type should be consistent.
    
    The length of the packet seems to be what is choking the system. Length matters when packets are classified as runts or exceed the MTU. What seems to matter here is how the device handles what are likely padded frames. That is, there is apparently an issue when the hardware processes a frame that must be transmitted with additional padding so that it isn't classified as a runt (a frame shorter than 64 bytes including the 4-byte CRC, not counting the preamble used for clock recovery).
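
    A minimal Python sketch of what "padding a runt" means in practice: an Ethernet frame must be at least 64 bytes counting the 4-byte FCS, so anything shorter than 60 bytes of header plus payload gets zero-padded before the CRC-32 is appended. The function names are made up for illustration; this says nothing about how the Puma 6 itself does it.

```python
import struct
import zlib

# Minimum Ethernet frame is 64 bytes counting the 4-byte FCS (CRC-32),
# but not the preamble/SFD. So header + payload must be >= 60 bytes.
MIN_FRAME_NO_FCS = 60


def is_runt(frame_without_fcs: bytes) -> bool:
    """True if the frame would fall below the 64-byte minimum unpadded."""
    return len(frame_without_fcs) < MIN_FRAME_NO_FCS


def pad_and_add_fcs(frame_without_fcs: bytes) -> bytes:
    """Zero-pad a short frame to 60 bytes, then append the CRC-32 FCS."""
    padded = frame_without_fcs.ljust(MIN_FRAME_NO_FCS, b"\x00")
    # zlib.crc32 uses the same CRC-32 polynomial as the Ethernet FCS;
    # software frame builders commonly append it little-endian.
    fcs = struct.pack("<I", zlib.crc32(padded))
    return padded + fcs


# Hypothetical 14-byte header (zeroed MACs, EtherType IPv4) + 3-byte payload.
header = bytes(12) + b"\x08\x00"
short_frame = header + b"abc"
print(is_runt(short_frame))               # True
print(len(pad_and_add_fcs(short_frame)))  # 64
```

    Frames that already have 60 or more bytes pass through unchanged apart from the appended FCS, which is the case the comment suggests the hardware handles fine.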
    
    So, following the logic above, it seems to me that jitter and latency are introduced when padding is required while bridging from DOCSIS to 802.3. If DOCSIS transmits padding (not sure, I'm not that familiar with DOCSIS), then upon receiving the packet, the packet engine seems to strip the padding (which is healthy, I imagine) and check packet integrity (CRC or equivalent). Then, when re-encapsulating as Ethernet, a new frame is constructed. When the frame meets all the main requirements, the process is handled in hardware, as suggested by the fact that larger packets don't appear to cause issues.
    
    When a runt frame requiring padding is encountered, the modem presumably raises a fault (or interrupt) on the CPU, which is handled by the operating system. The operating system then signals the device driver for the hardware. The driver either copies the frame from the buffer into system memory or performs lots of I/O operations on the frame in place in the buffer. The entire frame is likely parsed by the CPU at this point, then placed back into the buffer to be forwarded, and the driver signals the hardware via MMIO to continue.
    
    Before any patches, the bug we saw was that packets were simply dropped. I don't know whether 100% of runt frames are dropped and this accounts for 6% of the data, or whether 6% of runt frames are dropped. That makes a big difference.
    
    So my theory is that the hardware logic is completely missing a runt frame handler and depends entirely on software to process runt frames. This sounds crazy, but Cisco... THE NETWORKING COMPANY has had a known (but quietly hidden) bug in their 6800IA hardware, unpatched for over 2 years, that drops runt frames when they retag VLANs, and they're hoping no one notices and complains.
    
    Given that the forwarding engine likely provided in the Puma 6 (I'm speculating) is designed to work like a normal forwarding engine, 99.99% of the forwarding work will be done in hardware. If an exception is encountered, meaning the forwarding table (I assume this is what you mean by flow table) is not populated, or there is a packet exception such as a runt frame requiring padding, the CPU will need to process the packet.
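
    The fast-path/slow-path split speculated about above can be modelled as a toy in Python. Everything here is hypothetical (the class name, the set-based table, the rule that runts always go to the CPU); it illustrates the theory, not Intel's actual forwarding engine.

```python
from dataclasses import dataclass, field

MIN_FRAME_NO_FCS = 60  # 64-byte minimum frame minus the 4-byte FCS


@dataclass
class ToyForwardingEngine:
    """Hypothetical model: hardware fast path, CPU slow path for exceptions."""
    flow_table: set = field(default_factory=set)
    punted_to_cpu: int = 0

    def forward(self, flow_key: tuple, frame_len: int) -> str:
        # Exception 1: no forwarding-table entry for this flow yet.
        # Exception 2 (the theory above): a runt that needs padding.
        if flow_key not in self.flow_table or frame_len < MIN_FRAME_NO_FCS:
            self.punted_to_cpu += 1
            # Software installs the flow so later full-size frames stay in
            # hardware; runts still get punted every time.
            self.flow_table.add(flow_key)
            return "cpu"
        return "hardware"


engine = ToyForwardingEngine()
flow = ("10.0.0.2", "192.0.2.1", 17, 50000, 3074)  # hypothetical UDP flow
print(engine.forward(flow, 1000))  # cpu (first packet of a new flow)
print(engine.forward(flow, 1000))  # hardware
print(engine.forward(flow, 40))    # cpu (runt needing padding)
```

    The point of the model is that a table miss costs one trip to the CPU per flow, while a runt costs one trip per packet, which is why small-packet traffic would hurt far more than new flows.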
    
    In established conversations, the flow table normally shouldn't need to be altered. Since you are in bridge mode, there are probably 2-3 known flows on the DOCSIS side (the routers at the CMTS) and at most 1 flow on the Ethernet side, which is your own router. In Layer 3 (routed) mode, however, a great deal more could be involved once NAT comes into play.
    
    That said, since the device can likely handle NAT, it probably has some amazing processing capabilities for handling exceptions with the NAT tables.
    
    But runt frame processing is not handled quickly, it seems. This could be because NAT doesn't actually require reading and parsing a full frame and then generating a whole new frame. Instead, it probably just requires a hardware-optimized mechanism to push a new NAT entry to the table, after which translation is handled in hardware.
    
    So, then come latency and jitter. If the packet has to be processed in software, and the software itself is not designed for packet processing (meaning plain old Linux, Wind River, whatever), there would be non-deterministic latency when processing these packets, as operating systems can take anywhere between 1ms and 150ms just to respond to an interrupt. This is not an issue for the occasional unknown flow. Chances are, the hardware uses an alternate buffer to forward with during this time. But if a lot of frames are queued for forwarding, the buffers could fill up and block the pipeline while the unknown frame is being processed... at which point 150ms can be deadly.
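
    A back-of-the-envelope sketch of why a long stall hurts once the buffers are busy. The arrival rate, stall time and buffer size below are illustrative assumptions, not measured Puma 6 figures, and the model pessimistically assumes nothing drains while the pipeline is blocked.

```python
def frames_dropped_during_stall(arrival_pps: float, stall_ms: float,
                                buffer_frames: int) -> int:
    """Frames that overflow the buffer while the slow path blocks forwarding.

    Hypothetical model: the pipeline is fully blocked for stall_ms, frames
    keep arriving at arrival_pps, and none are drained during the stall.
    """
    arriving = int(arrival_pps * stall_ms / 1000)
    return max(0, arriving - buffer_frames)


# Illustrative numbers only: 10,000 frames/s into a 512-frame buffer.
print(frames_dropped_during_stall(10_000, 150, 512))  # 988
print(frames_dropped_during_stall(10_000, 1, 512))    # 0
```

    With these made-up numbers, a 1ms interrupt response is absorbed completely, while a 150ms one overflows the buffer by nearly a thousand frames, which matches the comment's intuition that the tail of the interrupt latency is what matters.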
    
    The next thing that comes up is that, I believe, earlier articles on this topic blamed CPU speed throttling for the problem. This is common. Since the CPU in question spends most of its time asleep, as it only needs to handle management and exceptions, it can be REALLY slow most of the time. When a new exception comes in, it has to throttle the CPU up quickly. This adds more delay... maybe another 50ms... who knows, I can't find the programmer's guide for the chip.
    
    So, now we're seeing lots of delays.
    
    One option is for the ISP to simply block runt frames, which will kill any games using very small frames, and then beg the game guys to intentionally pad their packets. Of course, chat programs that transmit every character as it's typed will fail as well.
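
    For the "pad their packets" workaround, the arithmetic is simple: an untagged Ethernet frame needs at least 46 bytes of payload, and an IPv4 header (20 bytes, no options) plus a UDP header (8 bytes) is 28, so any UDP payload under 18 bytes forces the sender's NIC to pad. A hedged Python sketch, assuming untagged IPv4/UDP and a game protocol that tolerates trailing zero bytes (a big assumption); the function names are made up:

```python
ETH_MIN_PAYLOAD = 46  # 64-byte minimum frame - 14-byte header - 4-byte FCS
IPV4_HEADER_LEN = 20  # assumes no IPv4 options
UDP_HEADER_LEN = 8

# Smallest UDP payload that avoids Ethernet padding: 46 - 20 - 8 = 18.
MIN_UNPADDED_UDP_PAYLOAD = ETH_MIN_PAYLOAD - IPV4_HEADER_LEN - UDP_HEADER_LEN


def ethernet_padding_needed(udp_payload_len: int) -> int:
    """Padding bytes an untagged IPv4/UDP frame needs to reach 64 bytes."""
    ip_packet_len = IPV4_HEADER_LEN + UDP_HEADER_LEN + udp_payload_len
    return max(0, ETH_MIN_PAYLOAD - ip_packet_len)


def pad_udp_payload(payload: bytes) -> bytes:
    """App-level padding; only safe if the protocol ignores trailing zeros."""
    return payload.ljust(MIN_UNPADDED_UDP_PAYLOAD, b"\x00")


print(ethernet_padding_needed(1))                           # 17
print(ethernet_padding_needed(len(pad_udp_payload(b"w"))))  # 0
```

    Over IPv6 the header alone is 40 bytes, so a UDP datagram (48 bytes or more) never needs Ethernet padding in the first place.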
    
    Another option is to optimize the OS kernel for processing runt frames... if they can be processed at all. There's a chance the packet forwarding microcode doesn't have a proper mechanism for this and demands that each packet be handled independently. If that's the case, there may be no answer short of replacing the chip. Recoding the OS, for example writing a split-core kernel that lets one core run the management OS and the other run a packet processor, could improve performance and provide deterministic forwarding, but latency would still be high. At least it would be reliable.
    
    Finally, the real solution: recall the products. The issue with this is that fixing a device this cheap, with such a small margin, costs more than just making a new one. And if Intel has to sponsor a recall of every single device shipped with these chips, it could mean billions lost.
    
    So the best option may be to help the vendors make runt-frame-friendly devices, send a free replacement to any customer who complains, then pay whatever class action suit comes up for $5-$50 million and be done with it. It might even be cheaper just to pay the class action and make you buy your own replacement.
    
    I think Intel unfortunately is handling this the best they can. Bugs happen. And there has never been a cable modem chipset that didn't suffer one problem or another.
    
    I think you'll find it won't be long before your service provider is in a position to offer a new modem with a newer chip that doesn't have the problem. I'd imagine the delay now is quality control.
    
    1. Sunday 11th June 2017 15:46 GMT Voland's right hand
      
      Re: And Virgin in uk?
      
      I haven't been able to find a great deal of information on this bug other than the artificial stuff like "my modem is experiencing jitter"
      
      The time it took you to write your post would have been more than enough to find a test case: just start 1024+ flows (TCP connections) and it barfs.
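
      One way "start 1024+ flows and it barfs" could happen is a fixed-size flow table with no eviction: once it's full, every packet of every extra flow misses and has to be handled in software. The Python below is a hypothetical model (the 1024 capacity is taken from the figure above, the no-eviction policy is an assumption, and the names are made up), not a description of the real firmware.

```python
class BoundedFlowTable:
    """Hypothetical fixed-size hardware flow table with no eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: set[tuple] = set()

    def lookup_or_install(self, flow_key: tuple) -> bool:
        """Return True on a hardware hit; False means the CPU handles it."""
        if flow_key in self.entries:
            return True
        if len(self.entries) < self.capacity:
            self.entries.add(flow_key)  # installed after the first miss
        return False


def slow_path_share(num_flows: int, capacity: int, packets_per_flow: int) -> float:
    """Fraction of packets punted to the CPU, flows sending round-robin."""
    table = BoundedFlowTable(capacity)
    misses = total = 0
    for _ in range(packets_per_flow):
        for i in range(num_flows):
            key = ("192.0.2.10", "198.51.100.1", 6, 40000 + i, 443)  # TCP 5-tuple
            total += 1
            if not table.lookup_or_install(key):
                misses += 1
    return misses / total


print(f"{slow_path_share(1000, 1024, 10):.2f}")  # 0.10
print(f"{slow_path_share(2048, 1024, 10):.2f}")  # 0.55
```

      Below capacity, only each flow's first packet takes the slow path; just past it, more than half of all packets do, so the CPU load jumps sharply rather than growing gradually.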
      
      1. Friday 16th June 2017 06:22 GMT CheesyTheClown
        
        Re: And Virgin in uk?
        
        That assumes I had access to such hardware. You are also under the false impression that what you just posted would provide meaningful information. I can see the results of that test on some of the links I've encountered, and it simply didn't provide much information.
        
        Let's run with this though.
        
        First of all, I'd imagine that if Intel has not released a patch for this problem, it's because fixing it would require alterations to the ASIC.
        
        With some effort I could probably borrow a CMTS from a local cable company. I see that I can find a relatively old Cisco 7200-based CMTS for about $2000 on eBay, or maybe piece one together from a chassis and a line card for a few hundred dollars. The problem is that I wouldn't be able to get DOCSIS 3.0 support working, which may be required.
        
        I see that Puma 6 modems don't cost much either.
        
        So, let's assume I could build a test rig for about $1000 (which I really wouldn't spend unless I had a business case).
        
        I would need to figure out how to get root OS access to the Puma modem, which likely isn't difficult, though if the modem is running anything other than Linux, it may have just one of those stupid text-based management programs. In that case I would need to connect JTAG cables to run in-circuit-emulation-based debuggers. For Intel chips this isn't particularly difficult, as they are extremely well known and thoroughly documented.
        
        A much less expensive alternative is to get a boot image for the device and open it in something like IDA Pro with a decompiler plugin. This could take a lot of effort, since I'd have to guess my way through the file system and operating system code. And if the operating system image is compiled monolithically (instead of using kernel modules), which is common on embedded systems like this, I would have little or no hope of reverse engineering the relevant drivers.
        
        Even if I somehow managed to reverse engineer the drivers (not really that difficult from kernel modules), I would only have the control APIs to the ASIC; it wouldn't give me insight into the ASIC itself.
        
        As the problem does not appear to be fixable in software, even if I managed to reverse engineer the microcode pushed to the chip (disassemble to VHDL or similar), it would likely not cover the areas of the chip that are affected.
        
        Even if I had the VHDL code for the chip, it could be difficult to work with. Even without comments, it generally requires good engineering documentation with block diagrams of each core... but with that, I could more than likely diagnose the problem accurately and reach the same conclusion Intel most likely has: there's a hardware limitation somewhere that can't be fixed without replacement.
        
        So we're back to speculation... and more than likely meaningless speculation.
        
        So I stand by my comment that Intel can replace the modems which people complain about with newer models. And for everyone else, suggest that cable companies implement IPS filters to protect their users from attacks.
        
Thursday 8th June 2017 23:33 GMT Chris Stephens

I'm so happy it's this judge :) We have put a lot of effort into this over the last 8 months at DSLReports. I've worked hard for this. What's scary is the DoS (https://www.theregister.co.uk/2017/04/27/intel_puma6_chipset_trivial_to_dos/). It's completely unpatched, and the exploit code is so easy grandma could do it (https://github.com/nallar/Puma6Fail/releases). Two months after Intel Product Security acknowledged the issue, we still don't have a CVE, despite its agreed HIGH rating. No alerts from anyone. I've never seen a public 0-day that affects millions, with trivial public exploit code, go 2 months with no alerts. Also, the ISP/MSO hardware can't mitigate or block the exploit because of its streaming nature, so once an attack is started the modem stays offline until the IP is changed.

1. Friday 9th June 2017 01:32 GMT Mark Exclamation
  
  I hope you don't actually write any of the reports at DSLReports!
  
Friday 9th June 2017 01:12 GMT Anonymous Coward

Horrible sexist language!

Does she have a sister? We need more like her!

Okay, okay, a brother would do also... if he is as smart and honest as she is.

Friday 9th June 2017 08:03 GMT Mage

Intel?

Presumably Arris is unhappy with Intel. Given the number of issues with Intel products, they are looking tarnished.

Sunday 11th June 2017 23:57 GMT Ian Joyner

The Register is smoking crack

>>She famously told Apple's lawyers they were "smoking crack" when they tried to call 75 pages of witnesses in a 2012 battle with Samsung, for instance.<<

You must be smoking crack to make the link and use it in a headline that has nothing to do with Apple and Samsung at all. Just another dopey Register slight against Apple.

1. Tuesday 10th October 2017 22:28 GMT Anonymous Coward
  
  Re: The Register is smoking crack
  
  "You must be smoking crack to make the link and use it in a headline that has nothing to do with Apple and Samsung at all. "
  
  I think the point being made is that she doesn't take bs from anyone, and that we can expect to see the usual delaying and avoidance tactics dealt with rapidly and effectively...
  