When most of us arrived home with our newly purchased PS3, we couldn't wait to start annihilating aliens in Resistance: Fall of Man or kicking butt kung fu-style in Virtua Fighter 5. Not astrophysicist Gaurav Khanna - he used his to build a supercomputer. Khanna now owns a total of 16 PS3 consoles, all linked together to provide …
xbox 360's XNA kit
Technically, couldn't you write your simulation code (presumably with the cluster/sync code) in XNA? Oh wait, they don't allow network code in XNA. MS screwed the pooch again.
Ignoring all the "good idea/bad idea" flameboi comments, I think this approach is just greedy. 16 PS3s? Please can I have one and the boffins can cope with a 1/16 reduction in performance?
@ the usual suspects
Oh look, the usual suspects showed up to turn this into a console war thread. Guys, the 'war' is over. All three consoles are here and have good games. Anyone peddling the myth that the PS3 doesn't have good games is living in a world of their own making.
@Sarah, Cell BE is very fast on single precision and is still faster than any desktop chip for Double precision. IIRC each Cell at about 3.2GHz hits in the region of 250GFLOPS single precision and 25GFLOPS double precision. Yes, the SPEs are optimized for SP work. However, apart from the most recent quad core (and multiple execution units per core) x86 processors I'm not aware of any existing x86 that comes close to Cell floating point performance. There is a revision of the Cell that further optimizes floating point math vastly improving the performance in double precision work.
@the AC who continues the blather about the PS3 suffering because Cell wasn't designed for gaming. You sir are a fool. Cell BE was designed primarily for the game console market. It was a custom design job by Sony/Toshiba/IBM as a consortium, specifically for the PS3. It so happens that the high performance architecture they created is also suited to HPC. The RSX is not the GeForce 7800, it's a derivation from that architecture. There is additional hardware onboard as well as design revisions specifically targeted at the needs of a game console architecture.
You're utterly wrong to state that something with two or four generic cores would be better suited to gaming. It would be easier on day 1 to use, but it would most certainly not be better suited.
Sony specifically wanted to segregate the memory of Cell and the memory of the RSX. Cell can write through RSX to the GPU memory far faster than the 16MB/s read performance. There aren't many occasions when you are going to want the CPU to read directly from the GPU's memory. The only time this becomes an issue is when you try to make the PS3 become something it's not. How many times have we ever heard of PC applications where the CPU is reading the GPU's memory directly?
"Typical desktop Linux is a pain in the ass with 256 megs of RAM."
Probably not "typical desktop Linux" at all, eh? I would suggest the boffin in question did NOT just download the latest Ubuntu copy and just use that as is with eye candy and all that. You know, I use a Beowulf Linux cluster here at the lab, and all that has is console access... And consequently does not need to run OpenOffice or Firefox either.
Too bad my type of scientific app would not do with so little RAM (genome assembly and molecular DB searches can be quite hungry in that area). Otherwise, it would be interesting to justify a bunch of PS3s in a project's budget...
The PS4 will have 30 times the supercomputer performance of the PS3
I attended a public talk by a Playstation 3 (PS3) games developer in which he said that future IBM Cell processors would contain hundreds of Synergistic Processing Elements (SPEs). He did this in order to emphasise the importance of using as many SPEs as possible when writing very high performance games.
It is a reasonable assumption that the PS4 will contain hundreds of SPEs. In PS3 Linux only 6 SPEs are available to the user. If I assume that the PS4's Cell Processor will have 200 SPEs then this would imply that the PS4 will perform scientific calculations around 30 times faster than the PS3.
Note that I am assuming that the PS4's Cell processor will be clocked at the same frequency as that in the PS3 to ensure that PS3 games will run on the PS4.
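The "around 30 times" figure can be checked with a quick back-of-the-envelope sketch. It assumes, as the comment does, an unchanged clock and performance that scales linearly with the number of SPEs; the 200-SPE count is the commenter's guess, not a known specification, and the function name is made up for illustration.

```python
# Figures taken from the comment above; ASSUMED_PS4_SPES is a guess, not a spec.
PS3_LINUX_SPES = 6
ASSUMED_PS4_SPES = 200


def spe_scaling(new_spes: int, old_spes: int) -> float:
    """Ideal speed-up if throughput scales linearly with SPE count at equal clock."""
    return new_spes / old_spes


speedup = spe_scaling(ASSUMED_PS4_SPES, PS3_LINUX_SPES)
print(f"Ideal speed-up: {speedup:.1f}x")
```

Real code rarely scales perfectly linearly, so this is an upper bound under those assumptions.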
Talk to IBM about their Cell based blades....they *don't* come with a 256MB memory limit.
Absolutely agree with your comment about typical desktop Linux, but I also have to point out to everyone that Linux was once touted as having an incredibly light memory footprint, what has happened there then? Since when did Linux need more than 256MB to manage a simple desktop for users? Ah, you'll be worried about the frame buffer size? 2MP at 4 bytes per pixel is still only 8MB. Even triple buffered that's 24MB. It's not like you're going to want to run texture intensive games under Linux on your PS3. Honestly, people seem to forget that 2GB per desktop is an obscene amount of RAM for a simple desktop. Only in Microsoft's world does it require 1GB to run the OS and a second GB to ensure you have sufficient RAM to run your standard office applications. Windows plus Office used to run very well on systems with 4MB of system RAM and maybe 1MB of video memory. Have we lost the plot that much that we can't run a word processor in less than a quarter gig of RAM?
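The frame buffer arithmetic above is easy to reproduce. This is a minimal sketch that assumes a 1920x1080 display as the "2MP" case (the comment does not name a resolution) and 4 bytes per pixel; the helper name is hypothetical.

```python
MIB = 1024 * 1024


def framebuffer_bytes(width: int, height: int,
                      bytes_per_pixel: int = 4, buffers: int = 1) -> int:
    """Memory needed for `buffers` full-screen frame buffers."""
    return width * height * bytes_per_pixel * buffers


# Assumed resolution: 1920x1080 is roughly 2 megapixels.
single = framebuffer_bytes(1920, 1080)
triple = framebuffer_bytes(1920, 1080, buffers=3)

print(f"single: {single / MIB:.1f} MiB, triple: {triple / MIB:.1f} MiB")
```

Either way the total is a small fraction of 256MB, which is the point being made.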
@ Steve [RE: Cell Chip]
"If cell chips are so good for certain types of calculation, why can't we get them on a riser card and stick it in a PCI-E slot."
..or they could have just used one of these -
(but it was probably cheaper to use the PS3's!)
The key is NOT the processor
I think linking 16 PS3s is a neat trick in this application (maybe we can stick some PS3s in our next budget ;-) ), but several commentators are missing an important point.
The key is the fact that his code runs EXTREMELY well in parallel, with little communication overhead; otherwise the Gigabit ethernet would completely kill the performance. I would be very interested in the speed-up he achieves with respect to running on a single PS3. I work on various parallel computing problems, and would LOVE to see 16 Cell processors configured in shared memory formation (with a crossbar switch or some other fast interconnect); then my code might really run neatly. As it is I will stick with multiple dual core Opterons (Barcelona, where are you?), or the Nehalem type Xeons.
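To illustrate why communication overhead matters so much on a cluster like this, here is a rough Amdahl-style sketch. The linear per-node communication penalty and all the numbers are illustrative assumptions, not measurements from the actual PS3 cluster.

```python
def speedup(nodes: int, parallel_fraction: float,
            comm_cost_per_node: float = 0.0) -> float:
    """Amdahl-style speed-up with a simple linear communication penalty.

    Times are normalised so that a single-node run takes 1.0.
    comm_cost_per_node is a made-up model parameter: extra time added
    for every additional node that has to exchange data.
    """
    serial = 1.0 - parallel_fraction
    total = serial + parallel_fraction / nodes + comm_cost_per_node * (nodes - 1)
    return 1.0 / total


# Hypothetical workload: 99% parallel, with and without communication cost.
for cost in (0.0, 0.001, 0.01):
    print(f"comm cost {cost}: {speedup(16, 0.99, cost):.1f}x on 16 nodes")
```

In this toy model even a small per-node cost eats most of the benefit of 16 nodes, which is why a slow interconnect only works for code that rarely needs to communicate.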
*IF* there is a PS4, and *if* it's based on a new and improved Cell then I would think that the version used will have a 'classic Cell' mode that it can switch to, and the system will downclock if needed.
If there is a PS4 and backwards compatibility is still a feature on the table, then look for Sony to have substantially upgraded the Cell to allow the box to do real time ray tracing, and the GPU will become a secondary issue. Heck, they might even get away with 4 slightly uprated Cells working together with an RSX for handling the screen. I know that they are talking about a 10 year timescale, but the investment needed to go to yet another custom architecture is astronomical, and there are some serious payoffs for developing the Cell further over the coming 5 years before decisions have to be made.
optimising for PowerPC/Altivec/Cell
Forgiving the x86-centric readers commenting so far, it's clear that they don't understand the PPC/Altivec or the other vector units on the Cell.
It's true that PPC/Altivec/Cell could stand some optimisation when running PPC Linux, as the old Mac PPC coders obviously didn't want to undermine the Altivec optimisations inside OS X.
It strikes me, reading the linked cluster page http://gravity.phy.umassd.edu/ps3.html, that he does in fact appear to have used a generic PPC Linux and didn't bother to look at optimisations outside the compiler.
He apparently didn't even consider the PPC coders' choice, 'PPC Gentoo', where all your current PPC Altivec multimedia-optimised code is first produced by the likes of lu_zero and the Altivec guys at Power dev.
Take a look at the old-school PPC guys' code thread here http://www.powerdeveloper.org/forums/viewtopic.php?t=1426&postdays=0&postorder=asc&start=0
for a practical, tried and tested code base.
It's clear that PPC Linux DOES NOT currently use any Altivec vector optimisations (as x86 Linux does with its limited MMX vector unit).
If you read some of the Altivec threads found at powerdeveloper, you will find some answers and numbers. The lads are always looking for feedback on the likes of the freevec optimised codebase http://bbrv.blogspot.com/2008/02/freevec-updated.html, and they help new users and PPC programmers better understand the PPC/Cell and their vector optimisations, which you might find informative and practical.
Look at this chart in the thread above for an indication of a generic memcopy/network speedup, for instance.
I have looked at the Linux Kernel code a bit.
It's not difficult to improve the performance on PPC.
The Linux kernel has a copy function which is used to copy between kernel and user space.
As this function copies a lot of data, its performance has a direct influence on network and filesystem performance.
Improving its speed was actually easy, as you can see:
Especially on the Cell, improving this function
does result in noticeable performance improvements of the total Linux system.
I'll do some more testing and then publish the patch soon.
Posted: Wed Nov 14, 2007 4:43 am Post subject: Possible benefits - optimization for PowerPC
Almost all Linux applications are developed in C or C++. People often believe that C compilers are good enough to guarantee good performance. This is unfortunately not the case; especially on PowerPC, manual optimization can make a huge difference.
Here is an example of a memcpy on PowerPC...
a) Normal C routine working on Byte
b) Normal C routine working on Long (32bit)
c) Normal C routine working on quad (64bit)
** This is the best performance that you can achieve by algorithm design, using the C language **
d) Normal C routine working on quad (64bit) + with two ASM Cache-instruction added.
e) ASM routine better optimized for this PPC architecture
From 150 MB/sec to 2750 MB/sec is quite a difference.
As you can see by using optimized code you can achieve 20 times better performance!
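For anyone wanting to produce comparable MB/sec figures on their own hardware, here is a minimal sketch of a copy-throughput measurement. It is written in Python rather than the C and ASM of the quoted post, so it only shows the measurement method, not the PowerPC optimisations themselves; the function name and buffer size are arbitrary choices.

```python
import time


def copy_throughput_mb_per_s(size_bytes: int = 16 * 1024 * 1024,
                             repeats: int = 5) -> float:
    """Time a bulk memory copy and return the best observed rate in MB/sec."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one bulk copy of the whole buffer
        elapsed = time.perf_counter() - start
        assert len(dst) == size_bytes
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return size_bytes / best / 1e6


# Ratio of the two figures quoted in the post above.
quoted_speedup = 2750 / 150

print(f"bulk copy on this machine: {copy_throughput_mb_per_s():.0f} MB/sec")
print(f"quoted speed-up: {quoted_speedup:.1f}x")
```

Taking the best of several runs reduces noise from caches warming up and other processes. The quoted figures work out at a little over 18x, which the post rounds to 20.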
"Probably not "typical desktop Linux" at all, eh?"
Of course not!!
But several people here brought up performance under various circumstances including desktop. I wanted to explain that it's not designed for such a workload, but its fantastically fast floating point abilities make it fantastically well suited to other workloads.
The comments that it's not suited for games are just dumb. The PS3 is easily the most powerful of the game consoles, even if its power hasn't been fully realized.
Note that this is different from saying "The PS3 is the winner". In the last two generations of game consoles, the top spec system sold the worst and the lower specced PSX and PS2 kicked ass on the market.
The PS3 is reminding me of a lot of systems, none of them winners on the market.
The Sega Saturn, for example, also had fantastic processing power in a bizarre architecture that would require special investment to take advantage of.
The Apple Pippin too. It was more PC like, and had the same sort of ill-conceived banana controllers that often go with really expensive systems that do things nobody really cares about. (Sony did dump those banana controllers though.)
If developers are going to try to target multiple platforms, that will only make it less likely that a bizarre architecture will be utilized effectively.
You can't really brush this off by saying graphics aren't important because if graphics aren't important, then the Playstation 3 has no competitive advantage against the cheaper to buy, cheaper to develop for Nintendo Wii or X-Box 360.
Yes, I should probably have mentioned that even the official Sony docs suggest that you avoid using that painfully small bandwidth from the CPU to the RSX's memory, and get the RSX to write the data to main memory where the CPU can read it at the normal (very, very quick) speed.
Although, this is discussing the specific case of trying to use RSX memory when the CPU memory pool is already full - at which point there isn't much space to do such a thing. A small 'paging' area would work, somewhat like the old memory pages on the 128k Spectrum, if you're elderly enough to remember working on that.
I don't know the guy's application code, but it might well be that 256MB is enough anyway. There are plenty of clever things you can do in that space, without needing any more.
"If developers are going to try to target multiple platforms, that will only make it less likely that a bizarre architecture will be utilized effectively."
LucasArts just announced they'll develop games for PS3 first and then downport them to X360, didn't you hear?
Which is just another sign that the X360 is going down the drain.. Especially now that the HD-wars are over and BD has won, and that RROD-disaster is *still* going on.. Oh, btw: PS3 has just passed the X360 worldwide minus America -> www.vgchartz.com
Wii passed them long ago, and PS3-Market share keeps growing slowly but steadily! ;-)
Also, there are finally some really good games already out or coming really soon: Burnout Paradise, MotorStorm, Uncharted, Resistance, UT3, CoD4 (already here), LittleBigPlanet, Killzone 2, GT5 (coming soon)
Okay, Lair and Heavenly Sword weren't as good as expected, and Assassin's Creed was merely good, but well, it won't kill the platform! ;-)
And while it may be fun, the Wii isn't included in the regular crossplatform development plans anyway (just not possible, it would "drag down" the graphical quality for the other systems too much!), due to a way different control scheme, target group and hardware that's too weak. Mind you, it's not too weak for fun games, and I'm not saying there won't be great games for it, they'll just be different games than the regular x-platform titles. If that's an advantage or disadvantage remains to be seen, but here this question is irrelevant! ;-)
"You can't really brush this off by saying graphics aren't important because if graphics aren't important, then the Playstation 3 has no competitive advantage against the cheaper to buy, cheaper to develop for Nintendo Wii or X-Box 360."
Gee, I keep hearing this about the cheaper-to-develop-for Wii (not X360; when they say this, game studios mean primarily content generation, and that is *exactly* as work-intensive as for PC or PS3!). But can anyone finally please explain to me why Wii-games cost EXACTLY the same as PS3- and X360-Games then? What happened to all that "easier to develop, so games will be cheaper"? Is Nintendo raking in all the extra money?
It's a bit like this whole "HD-DVD is way cheaper to produce because we can use DVD-manufacturing lines" BS - Where exactly were HDDVDs cheaper than BDs?
Quote: "But can anyone finally please explain to me why Wii-games cost EXACTLY the same as PS3- and X360-Games then?"
I dunno where you buy your games, but Wii games tend to be 10-20 EUR cheaper than PS3 or Xbox games.
Oh, if I were you I'd get a new keyboard. Yours is dropping smilies and exclamation marks all over the place. It makes you look like a bit of a d!ck...
Highlander believes that if the Cell processor is upgraded in the PS4 then the upgrade will be to clock the Cell processor at a higher frequency and not to increase the number of SPEs. I think that this is unlikely. This is because I understand that doubling the clock frequency of a processor increases the power consumed by a factor of four.
A far better use of that extra power would be to quadruple the number of transistors in the Cell processor chip. This is the approach that Intel has taken with Core Duo processors, where extra performance is gained by increasing the number of processors on the chip and not by increasing the clock frequency.
I estimate that quadrupling the number of transistors in the Cell processor chip would increase the number of SPEs from 7/8 to around 40.
According to Moore's law the number of transistors on a chip doubles every 18 months. It is therefore perfectly possible to imagine a Cell processor with 40 SPEs being available 3 years after the manufacture of the original Cell processor.
IBM Cell processor documentation does not specify the number of SPEs which indicates that future Cell processors will most probably include more SPEs.
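The Moore's law estimate above can be sketched in a few lines. This assumes the 18-month doubling period quoted in the comment and, as a simplification of my own, that the SPE count scales in direct proportion to the transistor budget; the function name is hypothetical.

```python
def transistor_growth(years: float, doubling_period_years: float = 1.5) -> float:
    """Growth factor in transistor count after `years`, per the quoted doubling rule."""
    return 2 ** (years / doubling_period_years)


ORIGINAL_SPES = 8  # the original Cell has 8 SPEs (7 enabled on the PS3)

growth = transistor_growth(3)               # two doublings in three years
proportional_spes = ORIGINAL_SPES * growth  # if SPEs scale with transistors

print(f"growth after 3 years: {growth:.0f}x")
print(f"proportional SPE count: {proportional_spes:.0f}")
```

Straight proportional scaling gives 32 SPEs; reaching "around 40" would additionally require that the non-SPE parts of the chip stay the same size, which is an assumption rather than something the comment states.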