Pedantic - AoE is ATA over Ethernet.
Coraid is that niche ATA over Ethernet (AoE) vendor that's always about to break out of its reservation but never has. That could be about to change if things we're hearing from resellers are true. We have heard about a European financial institution that compared a Coraid storage array against a Fibre Channel SAN and found …
AoE is ATA over Ethernet, not Application over Ethernet.
Would be a lot less confusing, thanks.
The VDI problem is usually not network speed as such, nor even IO speed on the host server; it is latency on _LARGE_ frames in the client. If you are going to invest in VDI you may as well invest in fixing this rather than throwing crazy money at AoE or FC storage. In fact any storage investment will be pretty much wasted if you do not fix your client infra first.
In the client, if you generate an IRQ per frame or even per few frames (bog standard IRQ mitigation) you still get the chance of a task switch in the middle, and the overall latency for a "draw this" operation on the client increases by as much as an order of magnitude.
If you offload the reassembly in the IP stack to the card and there is one IRQ per whole TCP segment, the picture changes drastically. Even a lame duck client based on a 1GHz Pentium3 from around 2002 can fly and deliver performance on par with sitting at a physical desktop once you put the proper Ethernet into it: one that shows GSO, GRO and TSO in ethtool and can actually do them.
That is the first thing to do, long before any Coraids taking off...
Taking to task the vendors that continue to ship thin clients with Ethernet that lacks this is also a jolly good idea.
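As a rough way to check the offloads mentioned above, the text printed by `ethtool -k <iface>` can be parsed for the three features. This is only a sketch: it assumes the usual `feature-name: on|off` line format, the sample text is made up rather than captured from a real client, and the helper name `offload_status` is hypothetical. It reports what the driver advertises, not whether the card "can actually do them".

```python
# Long feature names as printed by `ethtool -k`, mapped to the short names used above.
WANTED = {
    "tcp-segmentation-offload": "TSO",
    "generic-segmentation-offload": "GSO",
    "generic-receive-offload": "GRO",
}

def offload_status(ethtool_output):
    """Return {short_name: bool} for TSO/GSO/GRO from `ethtool -k` text."""
    status = {short: False for short in WANTED.values()}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name in WANTED:
            # Values look like "on", "off" or "off [fixed]".
            status[WANTED[name]] = value.split()[:1] == ["on"]
    return status

# Illustrative sample only, not output from a real machine.
SAMPLE = """\
Features for eth0:
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: off [fixed]
"""

print(offload_status(SAMPLE))
```

On a live Linux client the same function could be fed the captured output of the real command instead of `SAMPLE`.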
Ivan - while I accept your point on clients you miss the point in that storage is a massive part of whether a VDI project works or fails - and they all too often fail in a big way. Just having a better performing connection for thin client access is pointless if your 1000 desktop VDI project is either:
nailed on performance because the required storage IOPS were underestimated, or
considered a poor ROI because to get that performance you have to spend a million quid on storage!
The whole point is that you don't throw crazy money at AoE, you get a transmission protocol that can handle the traffic of a massive VDI platform at equal or better performance than Fibre Channel at as little as 20% of the cost. Coraid can provide the mass storage on this platform and to some extent the performance for some solutions with SSDs, and for real far out performance WhipTail can consolidate to such an extent that you save even more money per seat. FOUR racks to 2U with the same IOPS, what part of that do you not find amazing?
However until AoE becomes accepted across multiple vendors a full and affordable end to end Enterprise VDI project is still just on the edge of reality - and if nobody talks about AoE then it will never become one.
Sent to me:-
Developing a storage company into a profitable and dependable one
is a very long haul. Personally, I continue to have my doubts about
FCoE as a long term protocol for any purpose other than tunneling
"real" FC networks. The reason I have these doubts is that Data Center
Bridging will bring all the benefits that it has for FCoE to iSCSI when
it is fully deployed, and generally iSCSI is simpler to work with than
the alternatives. If that's all true of a widely-backed standard, then
it's doubly true for a niche solution like AoE.
When CORAID says that AoE is 2-4 times faster than iSCSI, are they
telling me that it goes 2-4 Gbps on a 1Gbps network or 20-40 Gbps on a
10GE network? I ask because there are plenty of iSCSI systems that go
line rate today. Now I realize that this is a loaded question. CORAID
is almost certainly referring to latency. However, corporate
applications that care about sub-millisecond latencies are
increasingly rare. The ones that do care that much aren't going to be
doing AoE anytime soon, however.
My prediction for CORAID: they'll be around for a long while as an
economy player, for buyers who either have no budget, or don't know
they ought to have a budget, for more complete solutions.
As an aside, storage buyers are conservative for a reason.
I have pretty much the same question. We have a StarWind-based iSCSI SAN connected with 40 Gbps Mellanox network infrastructure to our non-linear video editing production system. StarWind does wire speed. If tomorrow we go crazy and replace it with a CORAID box, should we expect 80 or 160 Gbps over the same copper wire? Probably not. So... WHAT. DOES. 2-4 TIMES. FASTER. EXACTLY. MEAN. And I have another question sitting here at the back of my head. How do people expect me to buy from them if they don't have Windows drivers? So instead of getting the software from CORAID directly or from the OS vendor itself (as happens with iSCSI and the MS iSCSI initiator, or FC and any FC card brand) I have to go and get a driver from... StarWind Software! Who have an AoE initiator for free (no idea why they are doing this for CORAID but whatever...). Or go and compile an open source driver with no chance of installing it on a 64-bit machine, as it's not signed and so is rejected by Vista and Windows 7, with ZERO support even in theory. OK, I have an option to buy an AoE HBA. Great! But it's only a rebranded and re-packaged ancient Intel 1 GbE NIC! I feel like they are cheating me... And it's not 2003 any more. We need 10 GbE and 40 GbE, and 100 GbE is not very far off either. So customer service and respect stop immediately after the front door has opened. Well, nobody was fired for buying an IBM, but I know at least one guy who was fired for considering using a CORAID box instead of an FC or iSCSI appliance (no names here as it really does not matter).
Good luck Mr. Gorsky! (c) ... I mean CORAID.
OK, let's take your issues:
iSCSI is a connection oriented protocol that requires a session to be established across a SINGLE LINK and the data is then transmitted across it. iSCSI can employ other links in a round robin fashion but if you take a look at the connections in VMware as an example you will see on a 4 port iSCSI system 1 is active and 3 are in standby. The others are only used if a session is broken and then the next available one is used. Some other implementations will perhaps constantly cycle across each link but ONLY ONE IS USED AT A TIME - that is iSCSI (and to some extent FC).
AoE splits the data stream and sends it across all available ports in parallel. It is a connectionless protocol that does not require the packets to arrive in order, and if a packet doesn't arrive then it uses a datagram retransmit (microseconds) as opposed to a TCP retransmit (up to 200ms).
Therefore comparing a 4 port iSCSI to a 4 port AoE with large amounts of random traffic, as you would see in a virtualisation environment, the AoE throughput is as much as 2 or 4 times greater than iSCSI.
True, if you only had 1 AoE and 1 iSCSI port then there wouldn't be a great difference, but the lack of a TCP/IP stack in AoE would still make the connection more efficient, with less need to offload the data transfer process with extra on-NIC CPU power.
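The "2 or 4 times" figure in this comment follows from simple arithmetic if you accept its premise that one side uses a single active path while the other spreads traffic over every port. A minimal sketch of that idealised ceiling, with a hypothetical function name and policy labels; it ignores protocol overhead, disk limits and retransmits, and it models the commenter's claim rather than measured behaviour:

```python
def aggregate_gbps(link_gbps, ports, policy):
    """Idealised throughput ceiling for a multi-port storage link."""
    if policy == "active_standby":
        # One path carries traffic; the others only take over on failure.
        return link_gbps
    if policy == "all_paths":
        # Traffic is spread across every available port in parallel.
        return link_gbps * ports
    raise ValueError("unknown policy: " + policy)

for ports in (2, 4):
    single = aggregate_gbps(1.0, ports, "active_standby")
    spread = aggregate_gbps(1.0, ports, "all_paths")
    print(ports, "ports:", spread / single, "x")
```

With one port on each side the ratio is 1, which matches the concession that a single AoE port and a single iSCSI port would not differ much on raw bandwidth.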
Coraid supports 10Gbit, using HBAs on VMware, Windows and OpenSolaris, and native NICs on Linux and XenServer. 40Gbit is on its way (it has been around a while in the form of 4x 10Gbit ports running in parallel), as will be 100Gbit in time; you don't build an Ethernet SAN and then not support the development of Ethernet!
AoE drivers will be getting a lot more commonplace, and I wouldn't be surprised to see one from StarWind soon. We know of somebody who took a couple of weeks to get one to work on a CCTV camera, so the camera talks AoE direct to the storage with no configuration required (as there are no IPs). Let me tell you, in their tests it wiped out iSCSI as a transport medium, with HD feeds needing all the throughput they can get these days.
Nobody ever got fired for considering anything, botching up an install and losing the business money yes, but just thinking about it - that is just silly.
It always amazes me what vitriol new ideas inspire amongst the die hard users of an existing system, and this usually gets to its worst just before that new idea really takes off - so keep it coming!
iSCSI is an Application layer protocol and its job is quite simple: wrap an iSCSI PDU around SCSI data (you can read this as “append the SCSI payload to the end of a re-used iSCSI header and re-calculate the checksums”) and push the resulting thing down the network stack to TCP. It has no clue about what happens next. So everything you say about a SINGLE LINK is utter nonsense. Cutting user data into network frames and touching the actual fabric happens much lower, at the Packet and Data Link layers (NDIS level if somebody is interested in Windows terminology), and is completely beyond the scope of the iSCSI protocol's view. It’s up to the Data Link layer how to transmit the payload it got from the layers above. The dumb implementation you describe here can indeed send it using a single Tx / Rx wire pair. A wise implementation can use NIC teaming, splitting for example an 8 KB iSCSI PDU into two 4 KB Jumbo Ethernet frames and firing them at the same time. Look, one “virtual” TCP connection (…and there can be many of them inside a single iSCSI session, did ya know this?) can span many “physical” copper wires or optic fibers. It is the system administrator who controls this, not the iSCSI protocol itself. With AoE things are different… Because AoE operates at the Data Link layer itself, there’s nobody else who can do the dirty work of packet management for AoE. The AoE initiator chunks the user buffer into Ethernet frames and fires them over as many NICs in parallel as it wants, that’s right. But basically iSCSI and AoE behave in the same way. Except iSCSI follows the OSI network layer design, doing sequential data transformation in every used or bypassed layer, and AoE is entirely monolithic, keeping birds and flies in the same basket and doing all the work in the same place. As a result AoE breaks the OSI network layer model completely. CORAID presents its flaw-by-design as a feature. And I just hate people doing this.
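The splitting described in this comment, whether done by NIC teaming underneath iSCSI or by the AoE initiator itself, comes down to cutting a buffer into frame-sized chunks and handing them to links in turn. A minimal sketch under that assumption; the function name `stripe` is hypothetical and headers, checksums and reassembly order are left out:

```python
def stripe(payload, frame_payload, nics):
    """Cut payload into frame-sized chunks and assign them round-robin to NICs."""
    queues = [[] for _ in range(nics)]
    offsets = range(0, len(payload), frame_payload)
    for index, offset in enumerate(offsets):
        queues[index % nics].append(payload[offset:offset + frame_payload])
    return queues

# The example from the comment: an 8 KB PDU, 4 KB jumbo frames, two links.
pdu = bytes(8192)
queues = stripe(pdu, 4096, 2)
print([[len(chunk) for chunk in queue] for queue in queues])
```

The only real difference the thread argues about is which layer runs this loop: the data link layer below TCP, or the storage protocol itself.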
Your understanding of the way MPIO works is sparse at best. Round-Robin for iSCSI works in a very different way. Every next request is executed WITHOUT waiting for the previous one to complete, and of course using a different physical route. Unless the application really needs serialization (an atomic write to a transaction database bitmap, or whatever reason you could imagine), and that happens quite rarely. What you describe is closer to the Active-Stand-By MPIO policy. I don’t mind VMware working in exactly the way you tell (sorry, I’m too lazy to check it myself, and wasted engineers' time costs money, so let’s assume you tell the truth) but in any case it’s entirely a VMware issue and has nothing to do with iSCSI design! You can verify what I say pretty easily: connect with the MS iSCSI initiator to a target supporting multiple sessions. Configure MPIO in RR using many routes. Enable “Multiple Sessions” inside the Advanced Options tab. Play with the session value, changing it from the default 1 to the maximum 32. And run I/O Meter or your favorite benchmark tool to see what follows. If your assertion about “fire-and-wait” instead of “fire-and-forget” were TRUE we would see ZERO difference. But we’ll notice the performance counters going up until we see the whole thing hitting either wire speed or target device bus speed or target device medium speed. In any case it would INCREASE. And you tell us it should not. Shame on you!
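The difference between “fire-and-wait” and “fire-and-forget” can be put in numbers with a toy model. This is a sketch under loud assumptions: every request takes the same fixed latency, paths never contend, and the function name and figures are invented for illustration rather than taken from any benchmark:

```python
import math

def total_time_ms(requests, latency_ms, paths, wait_for_completion):
    """Time to finish a batch of equal-latency requests over several paths."""
    if wait_for_completion:
        # Fire-and-wait: only one request is ever outstanding.
        return requests * latency_ms
    # Fire-and-forget: up to `paths` requests are in flight at once.
    return math.ceil(requests / paths) * latency_ms

serial = total_time_ms(32, 1.0, 4, wait_for_completion=True)
overlapped = total_time_ms(32, 1.0, 4, wait_for_completion=False)
print(serial, overlapped)
```

In this model the serial case does not improve when paths are added, while the overlapped case scales until something else (wire, bus or medium) becomes the limit, which is the behaviour the suggested benchmark is meant to reveal.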
iSCSI usually uses modified TCP settings on both Linux and Windows (and I guess on any other OS, including the Plan9 you are supposed to love). For example the Nagle algorithm is disabled (TCP_NODELAY set), the maximum connections limit altered, delayed ACK turned OFF and so on and on and on. AoE retransmission values are hardcoded, while the TCP retransmission time is calculated but can be overridden with anything you want just fine. So for both AoE and iSCSI we end up with the thing called “kernel timer resolution” as the limit on retransmission delay.
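For reference, disabling the Nagle algorithm and setting TCP_NODELAY are the same operation on a socket. A minimal sketch on a plain, unconnected TCP socket; it only illustrates the standard socket option and is not how any particular iSCSI initiator configures its connections:

```python
import socket

# A plain TCP socket; no connection is made, so this runs without a network.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Setting TCP_NODELAY turns the Nagle algorithm off for this socket,
# so small writes are sent immediately instead of being coalesced.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("Nagle disabled:", bool(nodelay))
sock.close()
```

Retransmission timers and delayed ACK behaviour are tuned elsewhere (typically through OS-level settings) and are not shown here.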
TCP stack overhead. How lovely! Are you serious? Run some flood-generating network application, launch Task Manager, enable the “kernel time” report and see how much time your OS spends at a raised interrupt level (Task Manager will report it in red on the chart). My Quad Core i7 @2.66GHz spends around 5%. If you have a weaker CPU on both sides of your storage cluster, I’m sorry. My nearly 2 year old desktop won this dogfight. See, I would not switch gas stations only because one of them has gas 0.01 cent per gallon cheaper. I don’t drive a Boeing 747 and I don’t fly to the Moon in my car, so I don’t care about that miserable number. The same goes for the TCP stack and CPU time. If we lived in the XX century and it were 50% more CPU time and a $1,000 difference for a CPU swap, I would consider AoE as an option. “We’re not in Kansas anymore!” © … It’s the XXI century and I don’t care about the 0,001% CPU time that iSCSI uses for TCP/IP and AoE is saving me. Oh, thank you VERY much! © … It’s still NOTHING! A virtual benefit CORAID is using to BS their customers. Also, have you ever heard about TCP offload engines and at least partial iSCSI accelerators? Google a bit, if you’re not banned by Google yet, to confirm one simple fact: even desktop-class NIC silicon now has the listed stuff as a standard feature. As I’ve already said, it is the XXI century now and what was funny 20 years ago sounds sick today.
CORAID DOES NOT HAVE ANY HBAS. Please call things by their real names. If you put an “HBA” sticker on an ordinary 1 GbE or 10 GbE NIC it’s still an ordinary NIC. Just as, if I manage to put on my high school sweater with VIRGIN written on the chest, I’ll have an extremely hard time telling people where I got my daughter from. An HBA assumes you have not only a NIC PHY but also a processor running firmware to offload the host machine CPU, and dedicated memory to cache some DMA transactions moving data thru the PCIe bus. A totally different design compared to a NIC combined with any software initiator driver, where everything runs on the host CPU. Bother yourself to download the CORAID AoE drivers for Linux. “It’s kosher. As Christmas!” © …
You would not be surprised to see what? Where did you spend your last 10 years? Honestly? I guess in county jail with no Internet connection and very limited communication abilities. Playing nude chess with the warden on the weekends and talking about IT? StarWind was the first company to make an AoE driver for Windows. Long before the open source community dirt-crafted the buggy WinAoE and forked the even buggier WinVBlock away from it. We’ve been playing with AoE for quite some time, and it was actually the lack of official Windows support that stopped us from putting this beast into production a couple of years ago. We only needed raw capacity (A LOT OF CAPACITY) so improper Hyper-V support was not an issue.
It’s a very questionable habit to make any kind of assumptions about what could or could not happen to anybody outside your own company. But it’s entirely your HO and I’m not arguing here.
So this broken OSI model type protocol therefore can't possibly work in the real world then obviously? The 1300+ customers have just been hoodwinked into thinking their storage works phenomenally well...
So beating Isilon in its own niche space of media delivery was a fluke? A combination of Coraid and Nexenta hammering NetApp Filers is just a figment of the testers' imagination?
Measured throughputs of over 1800 Mbytes/sec and 4500 IOPS from a single shelf of disks couldn't have happened and Coraid bunged ESG Labs to come up with those test results?
How VMware does MPIO is actually pretty relevant considering it is the biggest virtualisation platform. When I build an iSCSI solution alongside an AoE one, not only is AoE way quicker to configure, but throughput tests show it handles random virtualisation workloads much more efficiently, and on the performance graphs I see more throughput.
Obviously I must have been dropping too many magic mushrooms lately and I'm seeing things.
So on that basis why don't you take issue with EMC, NetApp, Isilon and VMware and see how much traction that gets you? Coraid makes an easier target but it doesn't make it wrong.
I'll admit not knowing StarWind had an AoE driver as I don't know the product inside out, although I respect it for the reviews it has had - looks a great product.
Coraid's biggest sin was growing organically in the Linux space, because it has lost a lot of years when it could have been providing the things you found missing when you tried out AoE. That delay will hurt the progression of AoE in the face of iSCSI evangelists. My future doesn't rely on the success or failure of AoE so I'm happy to sit it out and see what develops.
Gosh! 1300 customers! That's a number... Know what? There are probably 100,000 sporadic IET users poking around. Does it mean IET is a great product? Probably not. It only means these poor little souls don't have the money to buy themselves the real deal. Virtually the same could be said about CORAID. They used to sell their stuff dirt cheap per terabyte. So people were buying it. Why not? But again, does it mean AoE is superior to FC or iSCSI? Absolutely not!
Look... I don't buy Isilon or VMware. So I don't really care who won the performance beauty contest, CORAID or Isilon, or who lost when and why. Back to VMware... If they did not manage to have proper MPIO support it still means only one obvious thing - they don't have proper MPIO. There are others who have done it the right way. Competition. Free market. Ever heard about these two tricky little things?
CORAID's biggest sin... It's totally different from what you say. They read too many "Differentiate or die" books. And followed "Failure is the mother of success" (c) Mao Tse-tung too strictly.
Sent to me:-
Your comments are right on. People want the scale, performance and availability of a SAN but they don’t want the overhead of managing it.
As we deliver this to the market we have discovered the "SAN friction" issue is immense and by relieving customers of it we are enabling SANs to be deployed more broadly.
Thanks for the thoughtful article.
Bob Fernander, CEO, Pivot3 Inc.