Re: And Virgin in uk?
I'd say that the real issue here isn't whether Intel is good or bad. Remember that Intel generally makes pretty great stuff. It's a matter of how this bug is being handled.
I haven't been able to find a great deal of information on this bug other than the superficial stuff like "my modem is experiencing jitter" or your comment about flow tables. I'm going to assume that before using terms like flow tables, you've done some research on this and know which phase of the forwarding pipeline this is and whether population of the flow tables is the problem or not.
I'm curious about the modem's forwarding process. Whether it's in bridged or routed mode, I'm assuming at this point that Intel has implemented a hardware-based packet processor that handles most packet buffering and forwarding in a hardware forwarding engine.
Now, the act of forwarding a packet should be deterministic at all times. However the decision making process of how and where to forward the packet can introduce difficulties. This engine almost certainly performs header parsing in hardware. This should not be an issue either since headers per service type should be consistent.
The length of the packet seems to be what is choking the system. In networking, length matters when a frame is classified as a runt or exceeds the MTU. What seems to matter here is how the device handles what are likely padded frames. That means that when the hardware processes a frame that has to be transmitted with additional padding so it isn't classified as a runt (shorter than the 64-byte Ethernet minimum, which counts the header and CRC but not the preamble), there is apparently an issue.
So, by following the logic above, it seems to me that jitter and latency are introduced when padding is required while bridging from DOCSIS to 802.3. If DOCSIS transmits padding (not sure, I'm not that familiar with DOCSIS) then upon receiving the packet, the packet engine seems to strip the padding (which is healthy I imagine) and checks packet integrity (CRC or equivalent). Then when re-encapsulating as Ethernet, a new frame is constructed. When the frame meets all the main requirements, the process is handled in hardware, which fits the observation that larger packets don't appear to cause issues.
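For reference, here's a rough sketch of the padding rule I'm talking about. The 64-byte minimum is counted from the destination MAC through the 4-byte CRC, with the preamble excluded, so an untagged frame needs at least 46 bytes of payload. The function names are mine and purely illustrative; I have no idea how the PUMA 6 actually does this.

```python
# Standard Ethernet sizes for an untagged frame (preamble/SFD not counted).
ETH_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
ETH_FCS_LEN = 4       # frame check sequence (CRC-32)
ETH_MIN_FRAME = 64    # minimum frame length, header through FCS
ETH_MIN_PAYLOAD = ETH_MIN_FRAME - ETH_HEADER_LEN - ETH_FCS_LEN  # 46 bytes


def needs_padding(payload: bytes) -> bool:
    """True if sending this payload unpadded would produce a runt frame."""
    return len(payload) < ETH_MIN_PAYLOAD


def pad_payload(payload: bytes) -> bytes:
    """Zero-pad the payload up to the minimum; longer payloads pass through."""
    shortfall = ETH_MIN_PAYLOAD - len(payload)
    if shortfall > 0:
        return payload + b"\x00" * shortfall
    return payload
```

A tiny game or chat packet (say 10 bytes of payload) would hit the padding branch every single time, which is why this path matters so much if it's slow.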
When a runt frame requiring padding is encountered, my guess is that the hardware raises some kind of fault or exception to the CPU, which is handled by the operating system. The operating system then signals the device driver for the hardware. The driver then either copies the frame from the buffer into system cache or performs a series of I/O operations on the memory in place in the buffer. The entire frame is likely parsed by the CPU at this point, then placed back into the buffer to be forwarded again, and the driver signals the hardware via MMIO to continue.
The earlier bug we've seen before any patches has been that packets are simply dropped. I don't know if 100% of the runt frames are dropped and this accounts for 6% of the data or if 6% of the runt frames are dropped. This makes a big difference.
So my theory is that the hardware logic is completely missing a runt frame handler and is entirely dependent on software to process runt frames. This sounds crazy, but Cisco... THE NETWORKING COMPANY... has had a known (but quietly hidden) bug in their 6800IA hardware for over 2 years, unpatched, that drops runt frames when they retag VLANs, and they're hoping no one notices and complains.
Given that the forwarding engine likely provided in the PUMA 6 (speculating) is designed to work like a normal forwarding engine, 99.99% of the forwarding work will be done in hardware. If an exception is encountered, meaning that the forwarding table (I assume this is what you mean by flow table) is not populated or there is a packet exception such as a runt frame requiring padding, the CPU will need to process the packet.
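To make the fast path versus exception path idea concrete, here's a toy model. Everything in it is hypothetical (the `ForwardingEngine` class, the "flood" default port, the string return values); it only shows the shape of the logic I'm speculating about, not anything Intel actually ships.

```python
from dataclasses import dataclass, field

MIN_PAYLOAD = 46  # smallest untagged Ethernet payload that needs no padding


@dataclass
class ForwardingEngine:
    """Toy model: known flows go through 'hardware', exceptions go to the CPU."""
    flow_table: dict = field(default_factory=dict)  # destination -> egress port
    fast_path_hits: int = 0
    slow_path_hits: int = 0

    def forward(self, dst: str, payload: bytes) -> str:
        port = self.flow_table.get(dst)
        # Fast path: flow already known and no padding required.
        if port is not None and len(payload) >= MIN_PAYLOAD:
            self.fast_path_hits += 1
            return f"hw:{port}"
        # Anything else is an exception and gets punted to software.
        return self._punt_to_cpu(dst, payload)

    def _punt_to_cpu(self, dst: str, payload: bytes) -> str:
        self.slow_path_hits += 1
        # Software installs a flow entry if one is missing (assumed behavior).
        port = self.flow_table.setdefault(dst, "flood")
        return f"cpu:{port}"
```

Note the difference between the two exception types in this model: an unknown flow costs one trip to the CPU and is then cached in the table, but a runt frame on a known flow goes to the CPU every time.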
In established conversations, the flow table should rarely need to be altered. Since you are in bridge mode there are probably 2-3 known flows on the DOCSIS side (the routers at the CMTS) and probably at most 1 flow on the Ethernet side, which is your network router. In Layer 3 (routed) mode, however, there could be a great deal more involved once NAT comes into play.
That said, since the device likely can handle NAT, it probably has some amazing processing capabilities for handling exceptions with the NAT tables.
But runt frame processing is not handled quickly, it seems. This could be because NAT doesn't actually require reading and parsing a full frame and then generating a whole new frame to process. Instead, it probably just requires using a hardware-optimized mechanism to push a new NAT entry to the table, and then translation is handled in hardware.
So, then comes latency and jitter. If the packet has to be processed in software and the software itself is not designed for packet processing (meaning plain old Linux, Wind River, whatever) then there would be a non-deterministic latency when processing these packets, as operating systems can often take between 1ms and 150ms just to respond to an interrupt. This is not an issue for the occasional unknown flow. Chances are, the hardware is using an alternate buffer to forward with during this time. But if there are a lot of frames queued for forwarding, the buffers could be full and block the pipeline while the unknown frame is being processed... at which time 150ms can be deadly.
So, the next thing that comes up is that there were, I believe, earlier articles on this topic that blame CPU speed throttling for the problem. This is common. Since the CPU in question spends most of its time asleep, as it only needs to handle management and exceptions, it can be REALLY slow most of the time. When a new exception comes in, it will need to throttle the CPU up quickly. This adds more delays... maybe another 50ms... who knows, I can't find the programmer's guide for the chip.
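Here's a back-of-the-envelope simulation of that blocked-pipeline case. It assumes a single FIFO with no alternate buffer, a made-up 10 microsecond hardware forwarding time, and my 150ms worst case for a frame handled in software. The numbers are illustrative only, not measurements.

```python
def frame_latencies(frames, fast_us=10, slow_us=150_000):
    """Per-frame latency (microseconds) through one FIFO pipeline.

    frames: list of (arrival_us, is_exception) tuples in arrival order.
    Exception frames take slow_us to service; all others take fast_us.
    """
    clock = 0          # time at which the pipeline is next free
    latencies = []
    for arrival_us, is_exception in frames:
        start = max(clock, arrival_us)          # wait if the pipeline is busy
        service = slow_us if is_exception else fast_us
        clock = start + service
        latencies.append(clock - arrival_us)
    return latencies


# One frame per millisecond; only the second frame needs the CPU.
frames = [(0, False), (1000, True), (2000, False), (3000, False), (4000, False)]
print(frame_latencies(frames))
# [10, 150000, 149010, 148020, 147030]
```

Only one frame was an exception, but every frame queued behind it eats nearly the whole delay too. That's the jitter people would see if the hardware really has no way to forward around the stalled frame.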
So, now we're seeing lots of delays.
One option is that the ISP simply block runt frames which will kill any games using very small frames. Then beg the game guys to intentionally pad their packets. Of course, chat programs that transmit every character as they're typed will fail as well.
Another option is to optimize the OS kernel for processing runt frames... if they can be processed at all. There's a chance the packet forwarding microcode doesn't have a proper mechanism for this. It may demand that each packet is handled independently. If this is the case, then without replacing the chip, there may be no answer. Of course, recoding the OS and writing a split-core kernel, which would allow one core to run the management OS and the other core to run a packet processor, could improve performance and provide deterministic forwarding. It would still have high latency, but at least it would be reliable.
Finally, the real solution: recall the products. The issue with this is that fixing a device this cheap, with such a small margin, costs more than just making a new one. That said, if Intel has to sponsor a recall of every single device shipped with these chips, it could mean billions lost.
So the best option may be, help the vendors make runt frame friendly devices. Then if a customer complains, send them a replacement free of charge. Then pay whatever class action suit comes up for $5-$50 million and be done with it. It might even be cheaper just to pay the class action and make you buy your own replacement.
I think Intel unfortunately is handling this the best they can. Bugs happen. And there has never been a cable modem chipset that didn't suffer one problem or another.
I think you'll find that it shouldn't be long before your service provider is in a position to offer a new modem with a newer chip that doesn't have the problem. I'd imagine the delay now is quality control.