Data centre I/O needs simplifying, everyone is agreed. There are reckoned to be two candidates; datacentre Ethernet (DCE) which Cisco and Brocade support, and InfiniBand, pushed by Voltaire and Mellanox. Both DCE and InfiniBand virtualize I/O by having other network protocols run inside their fat pipes. But there is a third fat …
Costs & performance data?
Interesting article Chris. I've got my money on one of these formats becoming more pervasive into the datacentre for large scaling non x86 systems. Wanna wager? :)
The PCIe concept is intriguing, but I can't see it working at a director level for storage. It's hard enough now to look at throughput on huge servers running multiple partitions and work out how hard the interconnecting buses are running; there's literally no tooling for bus utilisation on some platforms. I mean, a switch has an ASIC for the traffic, and the HBA is obviously an ASIC on a board to pull/push the traffic that hard to/from its bus. Doesn't this mean that the switching kit inside a switch would have to have a lot more throughput?
Extend that out to storage and SANs and I'm wondering what tooling they could come up with, though I imagine it'll lose some latency because you'd be negating the need for an HBA or a NIC. Maybe this would work for small hosts running a single HBA or two, but we have some LPARs running 20+ HBAs, with a processor footprint probably having 60+ HBAs across 10 high-speed buses to the processing unit.
Hmmm I need more coffee. My head hurts.
I forgot about CNAs
Is there much in the idea of CNAs? I can't remember if Brocade or Cisco have them working yet. They'd take away the need for a separate NIC and HBA... a reduction of fifty percent. Admittedly it still makes you dependent on FCoE.
you mean...(prior art)
a start-up is redefining a mainframe... or minicomputer... just with external servers and blade racks?!?!?!
I'm pretty damn sure that Big Blue's AS/400 can very happily run multiple instances of virtualised servers (Linux-based too...) across one set of internal I/O hardware...
So what exactly are they bringing new to the environment?
It's no wonder they have no product... they can't afford the patents or cross-licensing needed for a "mainframe", at least not without potential customers saying "then why don't we just buy an IBM"!
Why PCIe? If we go that route we might as well look at hypertransport (or its Intel analogue). Makes much more sense. In that case the rack of servers is not just using virtualised IO. It can use both virtualised IO and memory and be an arbitrarily big NUMA system.
Granted, OS designers will scream for a while, but overall, NUMA scheduling, memory allocation, etc have answers now and most OS-es can support it properly. In fact multicore+multisocket or multicore+multithread is a form of NUMA already.
Hasn't IBM already done this
with the POWER5/6 frames and Virtualisation?
We've got over 40 p595s alone with 64 CPUs/1TB RAM each and 4 * D20 I/O drawers with 24 PCI slots each... all shared virtually where possible, or dedicated where extreme throughput is needed. These host LPARs with virtual processors and allocated memory... job done!
Very, very interesting.
If you think about this along with all the other consolidation that's going on, it's almost redefining computers. You've got proposals knocking about these days for shared power supplies in datacenters, shared storage already exists, and virtualization (shared processors) is massive. This however is a whole new level, it's shared i/o, and potentially could apply to pretty much everything else.
If this takes off, each server can potentially be reduced to just processor, memory and motherboard. Absolutely everything else can be centrally provided and shared as needed. Each rack or server room becomes its own little mainframe / blade system.
It's also interesting when you think how well this could work with virtualization. The concept of mapping PCI-e to virtual machines has already been raised. Personally I think PCI-e mapping could be an easy way to provide high performance graphics to virtual machines. If shared PCI-e arrives and shared graphics cards (such as Nvidia's range) take off, there are potentially even higher gains to be had.
So if you think about it, we now have efficiency gains either in place or being talked about for:
- Shared CPU+ram (virtualisation)
- Shared storage
- Shared power supplies
- Shared networking
- Shared graphics
Combine all of those, and you can fine-tune every aspect of your company's IT assets, precisely matching your hardware to your requirements. And you can do this while increasing performance, since everything is connected in your server room using a local high-speed interconnect.
@Justin & Nathan
I think you two are both missing the point. We also run big iSeries POWER5/POWER6 boxes. The whole idea of PCIe direct to the switch is that it would bypass your requirement for an IOP or an IOA.
Unfortunately this doesn't solve the issue that IBM have no tooling (or very little) to help you with HSL performance, i.e. bus speeds to the CEC.
And yes, everything eventually does come back to the bigger platforms that consolidate well: mainframe, iSeries etc. It's (sometimes) cheaper per CPU/GB/etc to run a big footprint with lots of LPARs than to run multiple small cheap footprints and then try to cram more LPARs onto them.
Still have to connect to PCIe switch though
Sure you can rip out the HBA/NIC, but there is not an external connector for PCIe for them to plug a cheap cable into each motherboard. You will still need a PCIe card (a network interface card, if you will) that plugs into the motherboard and provides a port for the shared resource to connect all the motherboards (servers) together.
I don't see this as either an energy saver or a big cost saver (especially since these proprietary external PCIe NICs will be more costly than a generic HBA).
However, the article should have focused on the real win here: 8000 - 16000MB/s speeds, which beat the pants off even Gig Ethernet by a fair stretch. The I/O backbone for this rack of servers will be fantastic. This would especially help for a bunch of servers that need frequent or large communications with each other.
I do wonder how the PCIe bus will scale though when shared amongst so many resources.
I know most MBs offer more than one PCIe 8x or 16x slot, but when more than one card is used, many of these motherboards actually have to run the slots at slower speeds due to limitations.
What they are bringing new is the same shared architecture as a mainframe, but made up of lower-cost generic parts. No need to shell out $$ to IBM for parts and service, just have your local tech slap in a replacement MB with newer, faster CPU and RAM. Replacing proprietary parts with generic market parts is not new either, but it has been successful when done in the past (this is how Juniper jumped into Cisco's hot tub and crashed the party).
@ Rob Dobs
"""Sure you can rip out the HBA/NIC but there is not a External connector for PCIe for them to plug a cheap cable into each motherboard"""
Actually the PCIe spec does include both external plugs and cables. You wouldn't need anything like a network card, just a board with wires straight through from the PCIe slot to the rear bracket. It may need a bit of filtering gear, but nothing at all like a high speed processor or anything.
I've actually seen some PCIe 4x cables (that's 10Gbit, I believe) and they're not all that bulky. I'm not sure what the max length is in the spec, though; I hope it'd be able to reach from top to bottom of a 42U.
The real nice part is that loads of hardware already has, and will have for a while, PCIe buses, which are already designed to operate over some distance. Contrast that with HyperTransport, which is limited to AMD, and not really feasible over any distance longer than a couple of inches (maybe 1.5 linguinis).
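As a rough sanity check on the 4x figure, here is a minimal Python sketch of the lane arithmetic. It assumes first-generation PCIe signalling (2.5 GT/s per lane with 8b/10b encoding), which the comment does not actually specify; the function name and table are illustrative only. On those assumptions a 4x link is 10 Gbit/s raw but about 8 Gbit/s usable in each direction.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency.
# Assumption: only PCIe 1.x and 2.0 are modelled, both using 8b/10b encoding.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
}


def pcie_bandwidth_gbit(generation, lanes):
    """Return (raw, usable) bandwidth in Gbit/s, per direction."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    raw = rate * lanes
    return raw, raw * efficiency


if __name__ == "__main__":
    for gen, lanes in [(1, 4), (2, 16)]:
        raw, usable = pcie_bandwidth_gbit(gen, lanes)
        # Divide by 8 to convert Gbit/s to GB/s, then scale to MB/s.
        print(f"PCIe gen {gen} x{lanes}: {raw:.0f} Gbit/s raw, "
              f"{usable:.0f} Gbit/s usable ({usable / 8 * 1000:.0f} MB/s) per direction")
```

On the same assumptions a second-generation 16x link works out to roughly 8000 MB/s each way, which lines up with the low end of the 8000 - 16000MB/s range mentioned earlier in the thread.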
I don't get it. Servers usually have a NIC and HBA each because they tend to max them out. If you start sharing all your servers with just a couple of each, then performance is going to drop drastically, surely? Even if you attach the storage directly to the fabric, you have a lot more speed, and you are going to hit the NIC limitation even faster.
Many servers don't max out their NIC or HBA, so pooling them would be advantageous.
VMware running on blades with a backplane can help, but PCIe is a more open standard than most blade-server backplanes.
Dual NICs/adapters are often used for redundancy. By pooling the NICs/HBAs you could run N+1 rather than N*2 adapters and still have a NIC/HBA per server and a spare available (virtualised so any host can access it). For disk access, multiple PCIe buses would be much cheaper to put into a server than multiple InfiniBand controllers, and much more efficient than using NFS/UDP/IP.
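To put numbers on the N+1 versus N*2 point, a minimal Python sketch follows. It assumes one adapter type per server and a single shared spare for the whole pool; the function name and the 16-server rack are made up for illustration.

```python
def adapters_needed(servers, scheme):
    """Count adapters for a rack under a given redundancy scheme.

    "dual"   : every server carries its own redundant pair (N*2).
    "pooled" : one adapter per server plus one shared spare (N+1).
    """
    if scheme == "dual":
        return 2 * servers
    if scheme == "pooled":
        return servers + 1
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    servers = 16  # hypothetical rack size
    dual = adapters_needed(servers, "dual")
    pooled = adapters_needed(servers, "pooled")
    print(f"{servers} servers: {dual} adapters dual-homed, {pooled} pooled, "
          f"{dual - pooled} fewer")
```

The saving grows with the number of servers, but a single spare only covers one failure at a time, so a real pool might want more than one.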
It seems like a good idea to me, though the thought of some techie pulling the wrong PCIe cable fills me with more fear than having them pull the wrong network cable!
What would make a REAL difference ...
try something more dramatic ... http://www.abidanet.com/
Various PCIe solutions are under development
There are a number of options being developed for extending PCIe outside of the server and switching it, using low-cost 'extender' cards with both copper and lightweight, flexible optical interconnect options. Most of these solutions provide a substantial reduction in cost and power over populating NICs and HBAs directly in the servers, while providing the latency and bandwidth benefits of staying in the PCIe domain.
Traditionally I/O has been way underutilized. Server virtualization is turning that around and leading to increasing I/O needs (more or higher bandwidth NICs and HBAs). But perhaps more important is that I/O is often very bursty, and varies with shifting loads and applications. This means that some I/O can be swamped while other I/O is horribly underutilized. The best way to overcome this is to virtualize I/O, not just across VMs in a single host, but across many hosts, and dynamically assign, monitor, manage and reassign that I/O as required. Using inexpensive, high bandwidth, low latency PCIe connections in every server as the 'in-rack' fabric, interfacing to whatever data center network you prefer (including CEE/DCE and FCoE) is becoming a viable alternative.
it's not just i/o simplification it's i/o management also
This problem has been solved very well and we have gone beyond the i/o consolidation part.
The next step is I/O management, and the key to that is having an easy method to deploy and manage I/O.
Readers can get more details at www.xsigo.com