a very tiny one.
(smiley face = a ring with 2 cores & an LLC).
During the coming-out party for Intel's Sandy Bridge microarchitecture at Chipzilla's developer shindig in San Francisco this week, two magic words were repeatedly invoked in tech session after tech session: "modular" and "scalable". Key to those Holy Grails of architectural flexibility is the architecture's ring interconnect …
Intel is singing off the network/telecoms music sheet. Entertaining...
As the old saying goes, imitation is the sincerest form of flattery. First singing off the Via notesheet with simple, cheap CPUs and on-CPU encryption (unfortunately added to the _wrong_ chip line), and now this.
Funny how, by the time I got to the comments to add my observation that this sounds like token ring, everybody else had arrived thinking exactly the same thing.
What does this tell us about the network topology in use today?
That said, it does sound like they don't use tokens and use gaps instead - also not sure I'd use a train analogy when the M25 sounds more akin to what they're doing. Still, nice to see die shrinking now shrinking 10-year-old datacenter racks into one chip *smirk*.
Which is a good thing. Token Ring was excellent at maximising the utilisation of bandwidth; I regularly put together lab tests with real-world data that showed 4Mbps Token Ring outperforming 10Mbps Ethernet. The customers never believed it, of course, preferring the "evidence" of the headline Mbps figure. Ho hum.
ISTM that this setup allows essentially unlimited cores per CPU, which should make for some stunningly cool chips in the future.
Same results for me there. TR had two massive advantages over Ethernet. Firstly the TR NICs got waaay more throughput than the Ethernet ones. Secondly, without the collision/retry overhead you could actually get 4Mbps out of it. It was easy to prove, four machines running early versions of DOOM in deathmatch mode* and chaingunning the f*ck out of each other would collapse Ethernet damn near instantly, 4meg TR would handle this quite happily.
16Mbps TR flattened everything else around at the time.
If I ever needed to convince anyone that TR was the superior option I'd just open the nearest wiring closet and invite the sceptic to disconnect the cable of their choice. That one always used to wind up any Ethernet fanbois within earshot.
*It's a network testing tool. Honest.
And promptly obsolete once proper switches came out. Token ring switching turned out to be much more difficult and expensive than Ethernet switching, which was the real reason it failed.
Definitely. In the beginning quite a bit of "100Mbps" Ethernet was really closer to 20-40Mbps in practice, prone to collisions and bad drivers, but it was faster than 10Mbps so many didn't care too much; same with early, cheap low-end gigabit this generation. IBM didn't let anyone get away with those shenanigans, but Ethernet prices fell far more every year than Token Ring's did.
I think the transputer was more akin to what the Cell processor is.
It was way ahead of its time and was solving a problem which in the end didn't appear. Early 32-bit processors weren't delivering the speed gains that chip designers had hoped for. But eventually the speeds picked up.
We've a long history of doing great things in this country; they either flop or get sold to the Americans. It's pretty amazing that the ARM processor is such a success, although Intel are trying to kill it with their low-power x86 chips.
Token ring allows one party on the ring at a time - equivalent to the whole train in the analogy used in the article. This model only asks whether a truck/carriage is full or empty.
Whilst the article doesn't make it clear, I would also expect that traffic is pulled off the ring at the destination, rather than taken back off at the source once it has been right round the ring, as in token ring. Otherwise it would never get the scalability.
What I would like to know is if the ring is one-way or bi-directional.
Rik (the author) has just confirmed to me that the ring IS bi-directional. This will help with scalability, since the maximum latency between any two nodes will now be proportional to (number of nodes) / 2. Assuming that each node (core or GFX) has a unique ID, a simple algorithm could be used to determine the shortest path.
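That shortest-path choice can be sketched in a few lines. This is purely illustrative - the node IDs, direction encoding, and function name are my own assumptions, not anything Intel has published:

```python
def ring_direction(src, dst, n_stops):
    """Pick the shorter way round a bidirectional ring.

    Nodes are assumed to carry unique IDs 0..n_stops-1. Returns
    (+1 for clockwise, -1 for anticlockwise, hop count). Worst-case
    hop count is n_stops // 2, matching the (number of nodes)/2 bound.
    """
    cw = (dst - src) % n_stops    # hops going clockwise
    ccw = (src - dst) % n_stops   # hops going anticlockwise
    return (+1, cw) if cw <= ccw else (-1, ccw)
```

For an 8-stop ring, stop 0 reaches stop 3 in 3 clockwise hops but reaches stop 6 faster by going 2 hops anticlockwise - so no pair of nodes is ever more than 4 hops apart.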
I agree with Vyzar on the token ring thing: it is not. Token ring also did global arbitration (only one node could hold the token), not local arbitration.
Anyway, what strikes me reading this is that on many-core parts the latency to other parts increases: the ring shifts once every clock, but the number of stations between source and destination grows with the number of parts.
So a two core without graphics for me please :-)
Surely this will also affect memory latency on a core-by-core basis. If I have understood correctly, it is a unidirectional ring, so take a simplified 2-core example:
Assuming a clockwise direction, latency to core 2 will always be one clock more than to core 1. That difference grows with the number of 'stops' on the ring, so you would end up needing to design programs to use a particular core if they were memory-latency constrained.
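Assuming one ring stop per agent and one shift per clock (my reading of the article, not a published figure), the commenter's point can be sketched as:

```python
def hops_unidirectional(src, dst, n_stops):
    """Hop count from src to dst on a one-way (clockwise) ring.

    With one shift per clock, latency in clocks equals the hop count,
    so from stop 0 a core at stop 2 is always one clock further away
    than a core at stop 1 - and the gap widens as stops are added.
    """
    return (dst - src) % n_stops
```

On a hypothetical 4-stop ring, stop 0 reaches stop 1 in 1 clock and stop 2 in 2 clocks; note the return trip from stop 2 back to stop 0 costs 2 more, since traffic can only keep going clockwise.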
If I remember correctly, ATI did something similar in their cards before they merged with / were bought by AMD. It might be different, but here is a link to an old article I found: http://www.anandtech.com/show/1785
I think it turned out to cost way too much in die space and provided much more bandwidth than the cards could use.