Yowser
12Gb on a graphics card.
Can someone convert this thing's likely real-world performance to units I can understand, like approximate number of i7 CPUs for general computation or something?
Nvidia has released its most powerful – and, in our memory at least, its most expensive – GeForce graphics card ever, the GTX Titan Z, which you can have for a hefty $2,999.
[Image: Nvidia GeForce GTX Titan Z – five thousand, seven hundred and sixty CUDA cores of 'cool and quiet' GPU power]
That's not such a high price, said Nvidia …
IIRC a Core i7 reaches around a hundred GFLOPS or so. So we are talking about a factor of roughly 80.
Imagine the power supply and cooling needed to run 80 Core i7 at top speed!
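The "factor of roughly 80" can be sketched with some back-of-envelope arithmetic (both figures are approximations: Nvidia quotes ~8 TFLOPS single precision for the Titan Z, and ~100 GFLOPS is a ballpark for a Core i7):

```python
# Back-of-the-envelope ratio: Titan Z vs a single Core i7.
# Both numbers are rough assumptions, not measurements.
titan_z_sp_gflops = 8_000   # ~8 TFLOPS single precision (vendor figure)
core_i7_gflops = 100        # ballpark peak for a quad-core i7

ratio = titan_z_sp_gflops / core_i7_gflops
print(f"Roughly {ratio:.0f} Core i7s' worth of single-precision throughput")
```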
That said, not all tasks are suitable for GPU processing. There is also a good reason why they only talk about single-precision performance: most GPUs sold on graphics cards have significantly lower performance for double-precision calculations. One of the reasons AMD cards were more popular for GPU computing tasks is that they were less restricted regarding double-precision calculation. I'm not sure if this is still true, however.
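To illustrate how much the SP/DP gap matters, here's a minimal sketch. The ratios below are illustrative assumptions, not measured figures: compute-oriented parts often run double precision at 1/2 or 1/3 of the single-precision rate, while consumer gaming cards may be limited to 1/24 or worse:

```python
# Sketch of how SP vs DP throughput diverges on GPUs.
# All ratios here are illustrative assumptions.
def dp_gflops(sp_gflops, dp_to_sp_ratio):
    """Estimate double-precision throughput from a single-precision peak."""
    return sp_gflops * dp_to_sp_ratio

sp_peak = 8_000  # ~8 TFLOPS SP, vendor figure for the Titan Z
print(dp_gflops(sp_peak, 1 / 3))   # compute-oriented ratio: ~2667 GFLOPS
print(dp_gflops(sp_peak, 1 / 24))  # driver-limited consumer ratio: ~333 GFLOPS
```

The same SP peak yields wildly different DP numbers depending on how the part is segmented, which is why the marketing sticks to single precision.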
I think the difference is that a lot of the datasets that researchers work with are structured in an inherently parallel fashion — the expensive stuff is purely functional individual results from each of millions or billions of data points. It's the classic CPU versus GPU thing: CPUs are good when you want to do any of a very large number of variable things to a small number of objects, GPUs are good when you want to do a few very exact things to a large number of objects.
So OpenCL or CUDA just naturally get radically better performance than a traditional CPU for a certain segment of users, and those users are the niche at which this card is targeted.
> IIRC a Core i7 reaches around a hundred GFLOPS or so
I've long lost track of CPU architectures (they're 'fast enough' for me to now not care) so I'm struggling to see how the above is possible. At 3GHz that's about 30 flops/cycle. How?
I suppose if you mean per processor rather than per core, and you have say 6 cores then that's ~5 flop/cycle. If that was a multiply-accumulate (fmac) heavily pipelined over 2 vectors of effectively infinite length then that's 2 cycles/flop, say you have 2 such units that's 4...
I'm struggling to see how even a peak of 100 gflops can be reached, never mind in practice - anyone shed any light? TIA
> At 3Ghz that's about 30 flop/cycle. How? If that was a multiply-accumulate (fmac) heavily pipelined over 2 vectors of effectively infinite length then that's 2 cycles/flop, say you have 2 such units that's 4...
Dual issue vec4 SSE = 8 per core per clock, add in a couple of non-SSE fp32 instructions = 10 per core per clock. 4 cores = 40 per clock.
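Reconstructing that arithmetic (all numbers are assumptions for a rough SSE-era quad-core; real chips vary by generation and clock):

```python
# Peak FLOPS = cores x clock x flops-per-cycle-per-core.
cores = 4
clock_ghz = 3.0
sse_flops_per_cycle = 8   # dual-issue 4-wide SSE: one mul + one add per cycle

peak_gflops = cores * clock_ghz * sse_flops_per_cycle
print(peak_gflops)  # 96.0 -- so ~100 GFLOPS peak is plausible on paper
```

That's a theoretical peak, of course; sustained throughput on real code is usually well below it.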
quote: "(Assuming you mean really 12Gib, 12 Gibibits, as RAM is never measured in Gigabits or Gigabytes, even though that's used in slang terms)."
The *ibi prefixes as a differentiator have only recently been adopted though, and back in the 386 / 486 era all storage (RAM or non-volatile) used SI prefixes (Kilo-, Mega-) for size. I don't think I actually encountered a deliberate use of the *ibi prefixes until this century, and probably less than 10 years ago to boot. I have never seen (or purchased) SIMMs advertised in KibiBytes or MebiBytes, only KiloBytes or MegaBytes.
I remember buying a matched pair of 16MB SIMMs for a previous gaming rig, for the princely sum of £300. That'll be showing my age there, I think... ^^;
quote: "I'll see your 32mb SIMMs and raise you a 256kb DIP RAM chip for a Trident video card"
I got into PCs pretty late, although I did buy a 3rd party 512kB RAM upgrade "card" (some soldering required) for my original Atari ST, that was so well designed it caused a case bulge when mounted as per the instructions ^^;
"12Gb on a graphics card."
If I understand this correctly, this is 2 Titans in SLI on a single card, therefore 6GB memory per graphics processor (same as a normal Titan.) Memory can't be shared between processors unless Nvidia have done something very funky, so it's effectively a 6GB card as far as frame buffer etc. goes.
12GB sure does sound impressive though!
See https://en.bitcoin.it/wiki/Mining_hardware_comparison
It's a bit out of date now, but anyway, nobody's really mining btc with graphics cards any more. ASIC hardware is several orders of magnitude faster per $ spent. So yes, the more economically viable approach would be to use the hardware to brute force someone's password and steal their btc :)
BTC mining on anything other than ASICs is dead as of last year. Scrypt currency mining on GPUs is just about viable but the leccy consumption will eat a lot of your coins. I mined about £100 of various things (when converted to BTC and then GBP) in 6 weeks on a pair of GTX670s before giving up.
Scrypt ASICs are going to appear this year so GPU mining will be dead before Pascal hits the shelves.
It is amazing what can be done with "cheap" HPC systems. I do note that this kind of architecture is not that good at the type of compute load I tend to work on: heavily data-driven processing order. GPUs still prefer SIMD-like problems of the "lots of the same" type seen in many physics problems. Alternatively, I need to rethink my algorithms, but at the moment the fastest kit we use for our problems is a 2U rack server with 64 AMD cores and 512 GB RAM.
As you might guess from the above, 12GB is also too small for most of my data sets, and we still haven't quite worked out how to do our work on distributed-memory machines. It would be great if we could find a way to harness these beasts for our kind of work, however.
From my point of view it makes the Intel MIC look expensive, so perhaps the pricing of these adapters (AMD, NVIDIA, Intel) will start to converge on a sane model.
I must say, however, that the CUDA infrastructure gives a "lightbulb" moment when working out how many molecules we can get into this card....
P.
If I wanted a supercomputer, I'd be more concerned about its 64-bit floating-point performance. Nvidia does make special parts aimed at the supercomputer market instead of gamers wanting high performance video, but this doesn't sound like one of them. The Tesla K40 is roughly comparable to the original Titan, but it costs $5,300, not $999.
"Titan Z doubles the Titan Black's 2,880-core GK110 to a total of 5,760 CUDA cores, doubles the memory to 12GB, and doubles – as you might imagine – the memory bus width to two 384-bit channels."
...and then you can use it to play games that are still designed for computers with 512MB shared between the CPU *and* GPU and less CPU stomp than your phone. Hell, even the PS4 – which is presumably going to de facto limit the state of the art in gaming for another ten freaking years – only has 8GB of *total* memory. Think about it: ten years from now, games are going to be gimped to a spec that was around 1/16th the capacity of PCs when it was *new*. Is it too late to start drone strikes against console manufacturers?