Every good idea in networking eventually seems to be borged into the Ethernet protocol. Even so, there's still a place in the market for its main rival in the data center, InfiniBand, which has consistently offered more bandwidth, lower latency, and often lower power consumption and cost-per-port than Ethernet. But can …
1 gbit ether seems quite good with Hadoop
This article seems more like a press release from the infiniband marketing organisation than anything else.
1. Hadoop and MapReduce don't need 10Gbit/s to the servers; 1 gigabit is fine for all but the backbone.
2. Facebook have just been describing how they use BitTorrent to push out OS images, as it scales well on gigabit.
3. 10Gbit/s has a worse power budget and is trickier to keep alive, though optical cabling can save on electricity.
When you consider that all the big datacentres appear to be going for JBOD storage on the servers with gigabit interconnect, it's hard to say the trend towards SAN-mounted storage and InfiniBand is unstoppable.
Giga Transfers per Second
..use InfiniBand for peripherals..
IBM do this in their Power6 systems, and have done for years. The adapters still referred to as RIO-2 (Remote I/O) adapters on these systems (AIX, i, and Linux), used to attach disk and adapter expansion drawers, are the same adapters used for InfiniBand, which is also in use where I work. The two adapters (IB and RIO) for p6 520s have the same FRU and part number, and can be swapped over and work just fine.
They are normally attached to the GX+ bus (the internal high-speed bus used for processor memory and peripherals), eliminating the PCIe bottleneck between the server and the I/O drawer, although the adapters in the drawer are normally a PCI variant.
Well IMHO there are some potential pitfalls on POWER6 if you want to plug an InfiniBand adapter into the GX+ slots and use it for network traffic. On a machine like the 570, in each system unit the first GX+ bus also connects to the P5IOC2 chip that drives the IVE and the internal PCI slots. The second slot, on the other hand, should be able to run flat out.
In the new POWER7 boxes like the 770/780, each GX++ slot shares nothing with the (now two) P5IOC2 chips, so each slot should be able to run flat out with a 12x DDR adapter.
Generally the IO of the 770/780 have had a good upgrade IMHO.
Not that the 570 was bad in any way.
It's 30 years since the launch of Ethernet, and we're eagerly watching its competition with InfiniBand.
It is now exactly 30 years since the fanfare launch of Ethernet at the 1980 National Computer Conference in Chicago; there, dangling above exhibits for all to see, a prominent yellow coaxial cable crossed the hall to interconnect Ethernet devices for the first time in public.
Ethernet, an early-'70s invention of Bob Metcalfe et al. at Xerox PARC, has been a remarkably successful technology which so far has been resilient to all challengers--FDDI, Token Ring, etc.--principally because its Carrier Sense Multiple Access/Collision Detection (CSMA/CD) works so well. Moreover, it scales well as its speed increases--or it has up to 10Gb/s.
That said, and although I've been an Ethernet devotee for years, it's always seemed to me that the overhead and latency of CSMA/CD would ultimately be its Achilles' heel, so it'll be interesting to see whether this resilience continues at extreme switching speeds, where it effectively competes with InfiniBand and other newer technologies.
CSMA/CD is so last week
It's not in use on full-duplex links, and while theoretically still possible, there isn't a single commercial gigabit Ethernet adapter in existence that'll run half-duplex. Beyond that? Don't be silly.
To me Ethernet always had the big advantage of being far simpler than its main competitors, but that is its greatest weakness as well. An MTU of 1500 was a cost-reducing measure, and while a reasonable tradeoff then, it's much less so now--and we're having plenty of trouble shaking it off. Also: QoS. Instead of offering support (like 100VG did), Ethernet requires the layers above it to reinvent it (badly).
While without cheap networking the world would look... different, we are being hampered by the limitations of our previous tradeoffs. For a thought experiment, suppose we had 10-gigabit ARCNET now. Discuss.
In an Infiniband discussion you guys confuse current Ethernet!
There is no CSMA/CD in any point-to-point full-duplex or switched Ethernet, nor has there been for over 5 years--it has been gone from most deployments for a long time, and was last seen with hubs.
Jumbo frames--part of the 1GbE spec, which is also ages old--have been approximately 8000 bytes for years now, but again many shops do not implement them.
The big difference with InfiniBand is that you can work within the seven-layer ISO model and switch between the Ethernet and InfiniBand electrical layers quite easily at a technical level; it is cost (largely infrastructure investment) that works against that.
There is also a large body of people who confuse Ethernet (the lower layers) with the TCP/IP-related higher layers, which are quite portable.
MTU of 1500? That's so last century....
(Actually I think you mean frame size, not MTU, but that's a bit hair-splitting)
Most Gigabit implementations use Jumbo Frames these days, mostly at 9000 bytes, but there's no reason at all why they shouldn't be bigger.
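To put rough numbers on the jumbo-frame point: a back-of-envelope sketch of how much wire bandwidth ends up as payload at standard versus jumbo MTU. The overhead figures are the usual Ethernet framing, preamble/inter-frame gap, and IPv4/TCP header sizes; this is an illustration, not a benchmark.

```python
# Per-frame wire efficiency at a given MTU (back-of-envelope).
# Overheads assumed: 18 bytes Ethernet header + FCS, 20 bytes
# preamble + inter-frame gap on the wire, 20 + 20 bytes IPv4 + TCP
# headers carried inside the MTU.

def wire_efficiency(mtu):
    payload = mtu - 20 - 20      # strip IPv4 and TCP headers from the MTU
    on_wire = mtu + 18 + 20      # add Ethernet framing + preamble/IFG
    return payload / on_wire

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {wire_efficiency(mtu):.1%} of wire bandwidth is payload")
```

At MTU 1500 roughly 95% of the wire is payload; at 9000 it rises to about 99%, which is the main argument for jumbo frames (plus fewer frames per second for the host to process).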
@NoSh*tSherlock! No, I'm not talking about newer point-to-point full duplex etc.
No, I'm not talking about newer point-to-point full-duplex links etc. For starters, cable (time-domain) timing limits would run out using CSMA/CD. What you are essentially talking about is not what I'd call Ethernet (as its fundamental workings are different, and thus the definition should also be different).
Again, having not seen the 100Gb/s spec I'm blind and making a lot of assumptions. You're right, it doesn't make sense to me to use CSMA/CD in anything that would compete on a similar level with InfiniBand.
This is a nomenclature issue and it's misleading. Perhaps El Reg should not have made the following comment (not at least without clarification):
"Now the race between InfiniBand and Ethernet begins anew. As El Reg previously reported, the IEEE has just ratified the 802.3ba 40Gb/sec and 100Gb/sec Ethernet standards, and network equipment vendors are already monkeying around with non-standard 100Gb/sec devices."
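The "cable timing limits" point above can be sketched numerically: under half-duplex CSMA/CD, the smallest frame must still be on the wire when a collision signal returns from the far end, so the collision domain shrinks as the bit rate climbs. The propagation speed below is an assumed ~2e8 m/s for copper, and the 512-bit slot is the classic 10/100 Mb/s minimum frame (gigabit half-duplex had to extend the slot to 4096 bits to stay usable at all).

```python
# Maximum CSMA/CD collision-domain diameter: the minimum frame's
# transmission time must cover a round trip, i.e.
#   min_frame_bits / bitrate >= 2 * diameter / propagation_speed.

PROP = 2.0e8  # m/s, assumed signal propagation speed in copper

def max_diameter(bitrate_bps, min_frame_bits=512):
    tx_time = min_frame_bits / bitrate_bps     # time the smallest frame occupies the wire
    return tx_time * PROP / 2                  # one-way distance a round trip allows

for rate in (10e6, 100e6, 1e9, 10e9):
    print(f"{rate / 1e6:>6.0f} Mb/s: max collision domain ~ {max_diameter(rate):,.1f} m")
```

At 10 Mb/s the 512-bit slot allows kilometres of cable; at 10 Gb/s it would allow only a few metres, which is why half-duplex CSMA/CD was abandoned rather than scaled.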
Ethernet scales from desktops to high-end servers; I don't see InfiniBand taking that real estate any time soon. Unless, of course, the InfiniBand alliance were willing to take a hit for a few years to promote the technology (like what Sony was willing to do with Blu-ray).
The question is, will we see optical networking in the lower end of the market?
some datacentre limits
There are a couple of limits with Ethernet in the datacentre (which doesn't mean I'm not a fan of it):
1. Latency is still pretty bad; you need to design your code expecting data to take a while to get to the far end and back again.
2. The network ops team wants to manage the Ethernet, specify the switches, etc. But in a big single-app cluster running, say, Hadoop, that Ethernet is part of the infrastructure of the cluster. The switches do make a difference, and if somehow they don't work as expected, your cluster performs below expectations--or worse, routes stuff around the world.
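The latency point can be made concrete with a toy model: for a chatty workload, total time for N sequential small requests is N times (round trip plus payload transfer), so round-trip latency dominates long before bandwidth does. The RTT and bandwidth figures here are illustrative assumptions, not measurements of any particular gear.

```python
# Toy model: total time for n sequential small request/response pairs.
# Each request pays one round trip plus the payload's serialisation time.

def total_time(n_requests, rtt_s, payload_bytes, bandwidth_bps):
    per_req = rtt_s + (payload_bytes * 8) / bandwidth_bps
    return n_requests * per_req

n, payload = 100_000, 1_000
eth = total_time(n, 100e-6, payload, 1e9)   # assumed 100 us RTT, 1 GbE
ib  = total_time(n, 2e-6,   payload, 32e9)  # assumed 2 us RTT, QDR IB 4x

print(f"GigE:       {eth:.2f} s")
print(f"InfiniBand: {ib:.2f} s")
```

With these assumed numbers the gigabit run spends almost all its time waiting on round trips, which is exactly why you "design your code expecting data to take a while"--either batch requests or overlap them rather than issuing them one at a time.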
...to 4000 MTU on ethernet? Wasn't that called "big packets"? Only tinkerers can get it done, or... am I mixing things up? Memory fails. Miserably.
Anyway, up and away. The more bandwidth for the same or lower cost, the merrier.
Yes, it does look like a colored thick-carton-glossy-paper brochure ad.
Re: 4000 MTU
It's already here. It may just involve enabling a setting in the network adapter properties. You can bump the packet size still further; 8K and 16K are not uncommon. However, IP imposes an upper limit of 64K.
IB to "outpace" eth... ? Umm, sure... in backends, perhaps, nowhere else.
Seriously, what do you mean by "outpacing" here? If I'm right it's about theoretical maximum speed, nothing else--but how is that stopping Ethernet's "unstoppable force"?
Now almost every storage and server box comes with a 10GbE option, and I give it only a year or two before we see 10GbE integrated onto every server board, further compounding its lead in the server room.
IB has advantages, of course, but they're few and far between; maximum speed isn't really a problem nowadays, and Ethernet adapters are still cheaper...
..that there is no reference in this article to Mellanox making public some of its InfiniBand technology in the hope that it could be included in the next Ethernet standard, considering El Reg wrote about that subject less than a year ago.
Single System Image
"When you consider that all the big datacentres appear to be going for JBOD storage on the servers with gigabit interconnect, it's hard to say the trend towards SAN-mounted storage and InfiniBand is unstoppable."
Agreed. But those are not computing clusters. When I first heard about InfiniBand years ago, it was used by SGI and Cray when they were going through the pain of finding it was no longer economically feasible (due to tightening electrical tolerances, among other reasons) to build huge single-system-image computers--that is, some huge box that could take 4000 processors and give them all direct access to one huge pool of memory. For weather modelling, hydrological modelling, and other such things, they can "thread" the job, but each thread wants access to the entire data set and the threads frequently communicate with each other, so a conventional computing cluster is fairly useless. Enter InfiniBand: it was (and is) fast enough that they could build what appears to be a single system out of physically separate units connected on an InfiniBand network--not QUITE as fast as being directly connected to memory, but close enough. In reality I think this is still largely what it is used for.
depends on your app
I agree, there are some apps that don't have good locality. But at our last local Hadoop workshop (http://wiki.apache.org/hadoop/BristolHadoopWorkshopSpring2010), the physicists expressed concern that their IBM GPFS filestore running on InfiniBand was only 50 TB of highly available storage, which Hadoop could match with 25 six-HDD servers, assuming a block replication factor of 3, giving you good redundancy and lots of places to run the code--and if you added a few more servers, you could go way above GPFS's scale.
The price for that consistent-performance filesystem is pretty steep, and it comes with limited (or very expensive--it's the same thing) capacity.
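The arithmetic behind the 25-server figure is just raw HDFS capacity divided by the replication factor; the 1 TB drive size below is an assumption chosen to match the 50 TB figure above, not something stated in the comment.

```python
# Usable HDFS capacity = raw disk capacity / block replication factor.
# Disk size is an assumed 1 TB per drive to match the 50 TB example.

def hdfs_usable_tb(servers, disks_per_server, disk_tb, replication=3):
    raw = servers * disks_per_server * disk_tb
    return raw / replication

print(hdfs_usable_tb(25, 6, 1.0))  # 25 servers x 6 drives x 1 TB / 3 = 50.0 TB
```

The same 150 drives spread across 25 cheap boxes also give the scheduler 25 places to run map tasks near the data, which is the locality argument in the comment above.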