“Wish you were here” © …
iSCSI is an Application layer protocol and its job is quite simple: wrap SCSI data in an iSCSI PDU (you can read this as “append the SCSI payload to the end of a re-used iSCSI header and re-calculate the checksums”) and push the result down the network stack to TCP. It has no clue about what happens next. So everything you say about a SINGLE LINK is utter nonsense. Cutting user data into network frames and touching the actual fabric happens much lower, at the Network (packet) and Data Link layers (NDIS level, if somebody is interested in Windows terminology), and is completely beyond the scope of the iSCSI protocol. It’s up to the Data Link layer how to transmit the payload it got from the layers above. The dumb implementation you describe here can indeed send it over a single Tx / Rx wire pair. A wise implementation can use NIC teaming, splitting for example an 8 KB iSCSI PDU into two 4 KB Jumbo Ethernet frames and firing them at the same time. Look, one “virtual” TCP connection (…and there can be many of them inside a single iSCSI session, did ya know this?) can be spread over many “physical” copper wires or optic fibers. It is the system administrator who controls this, not the iSCSI protocol itself. With AoE things are different… Because AoE operates at the Data Link layer itself, there’s nobody except AoE to do the dirty work of packet management. The AoE initiator chunks the user buffer into Ethernet frames and fires them over as many NICs in parallel as it wants, that’s right. But basically iSCSI and AoE behave in the same way. Except iSCSI follows the OSI layered design, doing sequential data transformation in every used or bypassed layer, while AoE is entirely monolithic, keeping birds and flies in the same basket and doing all the work in the same place. As a result AoE breaks the OSI layered model completely. CORAID presents their flaw-by-design as a feature. And I just hate people doing this.
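To make the “8 KB into two 4 KB frames over two NICs” point concrete, here is a minimal Python sketch. It is only an illustration: `chunk_buffer`, `spread_over_links` and the link names `nic0` / `nic1` are hypothetical, no real frames are built or sent, and headers are ignored entirely. It just shows that whoever does the chunking (the teaming driver under iSCSI, or the AoE initiator itself) can hand the pieces to several links.

```python
def chunk_buffer(buf, payload_size):
    """Cut a user buffer into payload-sized pieces (the last one may be shorter)."""
    return [buf[i:i + payload_size] for i in range(0, len(buf), payload_size)]


def spread_over_links(chunks, links):
    """Assign chunks to links in round-robin order; returns {link: [chunks]}."""
    plan = {link: [] for link in links}
    for index, chunk in enumerate(chunks):
        plan[links[index % len(links)]].append(chunk)
    return plan


if __name__ == "__main__":
    pdu = bytes(8192)                   # stand-in for an 8 KB payload
    chunks = chunk_buffer(pdu, 4096)    # assumed 4 KB of payload per jumbo frame
    plan = spread_over_links(chunks, ["nic0", "nic1"])  # hypothetical link names
    for link, parts in plan.items():
        print(link, [len(p) for p in parts])
```

With two links each one carries a single 4 KB piece, so both can be on the wire at the same time. Which layer runs this logic is exactly the difference between iSCSI and AoE discussed above.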
Your understanding of the way MPIO works is sketchy at best. Round-Robin for iSCSI works in a very different way. Every next request is issued WITHOUT waiting for the previous one to complete, and of course over a different physical route. Unless the application really needs serialization (it does an atomic write to a transaction database bitmap, or whatever reason you could imagine), and that happens quite rarely. What you describe is closer to the Active-Standby MPIO policy. I don’t mind if VMware works exactly the way you tell (sorry, I’m too lazy to check it myself, and wasted engineers’ time costs money, so let’s assume you tell the truth), but in any case it’s entirely a VMware issue and has nothing to do with iSCSI design! You can verify what I say pretty easily: connect with the MS iSCSI initiator to a target supporting multiple sessions. Configure MPIO in RR using many routes. Enable “Multiple Sessions” inside the Advanced Options tab. Play with the session value, changing it from the default 1 up to the maximum 32. And run Iometer or your favorite benchmark tool to see what follows. If your assertion about “fire-and-wait” instead of “fire-and-forget” were TRUE, we would see ZERO difference. But we’ll notice the performance counters going up until the whole thing hits either wire speed, or target device bus speed, or target device medium speed. In any case performance would INCREASE. And you tell us it should not. Shame on you!
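The “fire-and-wait” versus “fire-and-forget” difference can be shown with a toy model in Python. This is a sketch under loud assumptions, not a benchmark and not how any real initiator is coded: every path serves one request at a time, every request takes the same `service_time`, and the function names are made up for illustration.

```python
from itertools import cycle


def fire_and_wait(num_requests, num_paths, service_time):
    """Each request waits for the previous one, so extra paths never overlap."""
    clock = 0.0
    paths = cycle(range(num_paths))
    for _ in range(num_requests):
        next(paths)             # the route still rotates...
        clock += service_time   # ...but only one request is ever in flight
    return clock


def fire_and_forget(num_requests, num_paths, service_time):
    """Requests are queued round-robin without waiting; paths drain in parallel."""
    busy_until = [0.0] * num_paths
    paths = cycle(range(num_paths))
    for _ in range(num_requests):
        busy_until[next(paths)] += service_time
    return max(busy_until)      # finished when the busiest path is done


if __name__ == "__main__":
    for paths in (1, 2, 4):
        print(paths, fire_and_wait(32, paths, 1.0), fire_and_forget(32, paths, 1.0))
```

In this model the fire-and-wait total stays at 32.0 no matter how many paths exist, while fire-and-forget drops to 16.0 with two paths and 8.0 with four. That is the ZERO difference versus INCREASE argument in numbers. A real setup stops scaling once the wire, the target bus or the medium saturates.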
iSCSI usually uses modified TCP settings on both Linux and Windows (and, I guess, on any other OS, including the Plan 9 you are supposed to love). For example the Nagle algorithm is disabled (that is, TCP_NODELAY is enabled), the maximum connections limit is altered, delayed ACK is turned OFF, and so on and on and on. AoE retransmission values are hardcoded, while the TCP retransmission time is calculated but can be overridden with anything you want just fine. So for both AoE and iSCSI we end up with the thing called “kernel timer resolution” as the limit for retransmission delay.
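For anybody who has never touched these knobs, here is a minimal Python sketch of the one setting named above: enabling TCP_NODELAY on a socket, which is the same thing as disabling the Nagle algorithm. The helper name `make_low_latency_socket` is hypothetical, and real initiators do this inside their own drivers rather than in Python. The other settings mentioned are OS-specific and are not shown.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with the Nagle algorithm disabled (TCP_NODELAY on)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY tells the stack to send small segments immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    # Read the option back; a non-zero value means Nagle is off.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The socket is never connected here, so the snippet only shows that the option is a per-connection switch the application or driver is free to flip.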
TCP stack overhead. How lovely! Are you serious? Run some flood-generating network application, launch Task Manager, enable the “kernel time” report and see how much time your OS spends at a raised interrupt level (Task Manager will show it in red on the chart). My Quad Core i7 @ 2.66 GHz spends around 5%. If you have a weaker CPU on both sides of your storage cluster, I’m sorry. My nearly 2 years old desktop won this dogfight. See, I would not switch gas stations only because one of them has gas 0.01 cent per gallon cheaper. I don’t drive a Boeing 747 and I don’t fly to the Moon in my car, so I don’t care about that miserable number. The same goes for the TCP stack and CPU time. If we lived in the XX century and it were 50% more CPU time and a $1,000 difference for a CPU swap, I would consider AoE as an option. “We’re not in Kansas anymore!” © … It’s the XXI century and I don’t care about the 0.001% of CPU time that iSCSI uses for TCP/IP and AoE is saving me. Oh, thank you VERY much! © … It’s still NOTHING! A virtual benefit CORAID is using to BS their customers. Also, have you ever heard about TCP offload engines and at least partial iSCSI accelerators? Google a bit, if you’re not banned by Google yet, to confirm one simple fact: even desktop-positioned NIC silicon now has the listed stuff as a standard feature. As I’ve already said, it is the XXI century now, and what was funny 20 years ago sounds sick today.
CORAID DOES NOT HAVE ANY HBAS. Please call things by their real names. If you put an “HBA” sticker on an ordinary 1 GbE or 10 GbE NIC, it’s still an ordinary NIC. Like if I manage to put on my high school sweater with VIRGIN written on the chest, I’ll have an extremely hard time telling people where I got my daughter from. An HBA assumes you have not only a NIC PHY but also a processor running firmware to offload the host machine CPU, and dedicated memory to cache DMA transactions moving data through the PCIe bus. That is a totally different design compared to a NIC combined with a software initiator driver, where everything runs on the host CPU. Bother yourself to download the CORAID AoE drivers for Linux. “It’s kosher. As Christmas!” © …
You would not be surprised to see what? Where did you spend your last 10 years? Honestly? I guess in a county jail with no Internet connection and very limited communication abilities. Playing nude chess with the warden on the weekends and talking about IT. StarWind was the first company to make an AoE driver for Windows. Long before the open source community dirt-crafted the buggy WinAoE and forked the even buggier WinVBlock from it. We’ve been playing with AoE for quite some time, and it was actually the lack of official Windows support that stopped us from putting this beast into production a couple of years ago. We only needed raw capacity (A LOT OF CAPACITY), so improper Hyper-V support was not an issue.
It’s a very questionable habit to make any kind of assumptions about what could or could not happen with anybody outside your own company. But it’s entirely your HO, so I’m not arguing here.