a very tiny one.
(smiley face = a ring with 2 cores & a LLC).
During the coming-out party for Intel's Sandy Bridge microarchitecture at Chipzilla's developer shindig in San Francisco this week, two magic words were repeatedly invoked in tech session after tech session: "modular" and "scalable". Key to those Holy Grails of architectural flexibility is the architecture's ring interconnect …
Intel is singing off the network/telecoms music sheet. Entertaining...
As the old saying goes, imitation is the sincerest form of flattery. First singing off the Via notesheet with simple and cheap CPUs and on-CPU encryption (unfortunately added to the _wrong_ chip line), now this.
Funny how, once I get to the comments to add my observation that this sounds like token ring, everybody else has come here thinking exactly the same.
What does this tell us about network topology in use today?
That said, it does sound like they don't use tokens and use gaps instead. I'm also not sure I'd use a train analogy when the M25 sounds more akin to what they're doing. Still, nice to see die shrinking now shrinking 10-year-old datacenter racks into one chip *smirk*.
As a token ring veteran from the 1970s, it was the first thing that came to my mind. But does each core have a separate ring adaptor, or does it handle the ring directly?
Just how far ahead of its time was the transputer?
How long before some wag replaces it with an implementation of a contention-based setup and we end up with "CPU Ethernet"?
Mine's the one with a network collision in the pocket.
> a very tiny one.
Yes, but with >1000 wires :)
Before your program crashes, you see the ring!
Which is a good thing. Token Ring was excellent at maximising the utilisation of bandwidth. I regularly put together lab tests with real-world data that showed 4Mbps Token Ring outperforming 10Mbps Ethernet. The customers never believed it, of course, preferring the "evidence" of the headline Mbps figure. Ho hum.
ISTM that this setup allows essentially unlimited cores per CPU with no scalability limits, which should make for some stunningly cool chips in the future.
Same results for me there. TR had two massive advantages over Ethernet. Firstly the TR NICs got waaay more throughput than the Ethernet ones. Secondly, without the collision/retry overhead you could actually get 4Mbps out of it. It was easy to prove, four machines running early versions of DOOM in deathmatch mode* and chaingunning the f*ck out of each other would collapse Ethernet damn near instantly, 4meg TR would handle this quite happily.
16Mbps TR flattened everything else around at the time.
If I ever needed to convince anyone that TR was the superior option I'd just open the nearest wiring closet and invite the sceptic to disconnect the cable of their choice. That one always used to wind up any Ethernet fanbois within earshot.
*It's a network testing tool. Honest.
Depends on the definition in play...
And promptly obsolete once proper switches came out. Token ring switching turned out to be much more difficult and expensive than Ethernet switching, which was the real reason it failed.
Definitely. In the beginning quite a bit of "100Mbps" was really closer to 20-40Mbps in reality and prone to collisions and bad drivers, but it was faster than 10 so many didn't care too much. Same with early and cheap low-end gigabit this generation. IBM didn't let anyone get away with those shenanigans, but Ethernet prices fell every year far more than token ring.
So does anyone remember the Transputer and its Cambridge Fast Ring architecture? Plus ça change...
(Halo just because it's a ring)
I think the transputer was more akin to what the Cell processor is.
It was way ahead of its time and was solving a problem which in the end didn't appear. Early 32-bit processors weren't delivering the speed gains that chip designers had hoped for. But eventually the speeds picked up.
We've a long history of doing great things in this country, they either flop or get sold to the Americans. It's pretty amazing the ARM processor is such a success, although Intel are trying to kill it with their low power x86 chips.
Token ring allows one party on the ring at a time - equivalent to the whole train in the analogy used in the article. This model is only asking if a truck/carriage is full or empty.
Whilst the article doesn't make it clear, I would also expect that the traffic is pulled off the ring at the destination rather than back off at the source (once it has been right round the ring) like token ring. Otherwise it would never get the scalability.
What I would like to know is if the ring is one-way or bi-directional.
Rik (the author) has just confirmed to me that the ring IS bi-directional. This will help with the scalability since maximum latency between any two nodes will now be proportional to (number of nodes) / 2. Assuming that each node (core or GFX) has a unique ID, then a simple algorithm could be used to determine shortest path.
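For anyone who wants to see the "simple algorithm" spelled out, here is a minimal sketch. It assumes the stops are numbered 0 to n-1 in clockwise order and that one hop costs the same in either direction; the function names are made up for illustration and nothing here is claimed to be how Intel actually routes traffic.

```python
def shortest_ring_path(src, dst, n):
    """Return (direction, hops) for the shorter way round an n-stop ring.

    Assumption: stops carry IDs 0..n-1 in clockwise order.
    """
    if n <= 0:
        raise ValueError("ring needs at least one stop")
    clockwise = (dst - src) % n          # hops going clockwise
    counterclockwise = (src - dst) % n   # hops going the other way
    if clockwise <= counterclockwise:    # ties go clockwise
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


def worst_case_hops(n):
    """Largest shortest-path distance between any two stops on the ring."""
    return max(shortest_ring_path(0, dst, n)[1] for dst in range(n))
```

With 8 stops, `worst_case_hops(8)` comes out at 4, which is the (number of nodes) / 2 figure; for an odd count it rounds down.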
Yes, not Token Ring, but more like packet switching, as used in the Cambridge Ring and cell phones.
I agree with Vyzar on the token ring thing: it is not one. Token ring also did global arbitration (only one station could hold the token), not local arbitration.
Anyway, what strikes me when reading is that in many-core parts the latency to other parts increases, since the bus shifts once every clock but the number of stations from source to destination grows with the number of parts.
So a two core without graphics for me please :-)
Surely this will also affect the memory latency too on a core-by-core basis. If I have understood correctly, it is a unidirectional ring. So take a 2-core simplified example:
Assuming a clockwise direction here, latency to core 2 will always be 1 clock more than to core 1. This difference will increase by number of 'stops' on the ring, so you would end up needing to design programs to use a particular core if they were memory latency constrained.
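To put numbers on that, a rough sketch under the same assumptions as the comment: a one-way clockwise ring, one hop per clock, and a hypothetical layout with the memory stop at position 0 followed by core 1 and core 2. The layout and the function name are illustrative only, not taken from the article.

```python
def one_way_hops(src, dst, n):
    """Hops from src to dst on an n-stop ring that only moves clockwise."""
    return (dst - src) % n


# Hypothetical 3-stop ring: position 0 = memory, 1 = core 1, 2 = core 2.
STOPS = ["memory", "core 1", "core 2"]
for position, name in enumerate(STOPS[1:], start=1):
    hops = one_way_hops(0, position, len(STOPS))
    print(f"memory -> {name}: {hops} hop(s)")
```

On this layout the trip from memory takes one hop to core 1 and two to core 2, and the gap grows by one hop for every extra stop in between.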
The eight Cell cores, I/O bridge and control core all share a bi-directional circular bus that shunts stuff around between them. Maybe Intel have been taking a peek..
I did a Masters dissertation on ring-based multiprocessor architectures 18 years ago. Nice to see they've caught on at last :-)
RMI (now part of NetLogic) had rings in the same logical location in their multicore XLR CPUs in 2005. The memory interconnect ring joined up to 8 CPU cores, 8 LLC banks, two DDR bus controllers, and a bridge to the peripherals.
Does that graphic suggest that a laptop would have a "DP Port"? Because I'm reasonably sure that DP means 'Display Port,' and that laptops probably don't need a Display Port Port. Or do they?
*=PIN number syndrome
... that wasn't really a thought I had while reading the article. Chip design sounds a)very complicated b)fascinating
"one ring to rule them all" I really couldn't resist
In that case.
If I remember correctly, ATI did something similar in their cards before they merged with / got bought by AMD. It might be different, but here is a link to an old article I found. http://www.anandtech.com/show/1785
I think it turned out to cost way too much in die space and provided much more bandwidth than the cards could use.