43 posts • joined Saturday 7th June 2008 12:39 GMT
Re: "Yeah, Intel. What are you going to do, bleed on me??"
GPGPU and many-core are in very much the same regime; Intel gives you 456 1.1GHz double-precision FP units in 300W, and at a similar price nVidia gives you 832 706MHz double-precision FP units in 225 watts.
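A rough way to put numbers on that comparison, using only the figures quoted above. The one-operation-per-unit-per-clock model is my simplifying assumption, not a vendor figure, so treat "unit-GHz" as a proxy for throughput rather than a FLOP rating.

```python
# Rough throughput-per-watt comparison from the figures in the comment.
# Assumption: every FP unit does one operation per clock cycle, so
# units * GHz is a proxy for peak rate, not a measured FLOP figure.

def unit_ghz_per_watt(units, clock_ghz, watts):
    """Return (aggregate unit-GHz, unit-GHz per watt)."""
    total = units * clock_ghz
    return total, total / watts

intel_total, intel_per_watt = unit_ghz_per_watt(456, 1.1, 300)
nvidia_total, nvidia_per_watt = unit_ghz_per_watt(832, 0.706, 225)

print(f"Intel:  {intel_total:.1f} unit-GHz, {intel_per_watt:.2f} per watt")
print(f"nVidia: {nvidia_total:.1f} unit-GHz, {nvidia_per_watt:.2f} per watt")
```

On this crude measure the two parts sit within a factor of two of each other, which is what "very much the same regime" amounts to.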
Are you sure about the '34 millimetres by 28 millimetres in size' part? That would make the chips five times the size of Haswell, and larger than your average DSLR sensor.
Is 9.6GB/s a typo?
You mention '40ns, 9.6GB/s' for the core-to-memory-controller link, and then say that a system with eight controller chips can have 230GB/s CPU-to-memory bandwidth; which of those figures is right?
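The mismatch is easy to check. This sketch assumes the aggregate is simply the number of controller chips times the per-link rate, with one link per controller; the article may intend some other topology.

```python
# Sanity check of the two bandwidth figures quoted from the article.
# Assumption: aggregate bandwidth = controllers * per-link rate,
# with one 9.6GB/s link per controller chip.

link_gb_per_s = 9.6
controllers = 8
claimed_total_gb_per_s = 230

implied_total = link_gb_per_s * controllers
implied_per_link = claimed_total_gb_per_s / controllers

print(f"8 links at 9.6GB/s give {implied_total:.1f}GB/s")
print(f"230GB/s over 8 links needs {implied_per_link:.2f}GB/s each")
```

Under that assumption the two figures differ by about a factor of three, so at least one of them cannot be as stated.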
Juno has a remarkably pathetic camera (2-megapixel, 7.4-micron pixels, 11mm focal length, 58-degree FOV - better than most smartphone back-cameras but not by much); it's specced for 15-kilometre resolution at the 4300km closest approach of Juno to Jupiter, and has not the slightest chance of getting near enough to Europa to achieve six-metre resolution.
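To see how far off six metres is, scale the quoted spec linearly with distance. This is a sketch under the assumption that resolution is simply proportional to range for a fixed camera; it ignores motion blur and everything else that makes real imaging worse.

```python
# Scale the quoted camera spec linearly with distance.
# Both input figures come from the comment; the linear model is an
# idealisation (fixed optics, no smear, no atmosphere).

spec_resolution_m = 15_000   # 15 km resolution ...
spec_distance_km = 4_300     # ... at 4300 km range
wanted_resolution_m = 6      # the six-metre figure quoted for Europa

needed_distance_km = spec_distance_km * wanted_resolution_m / spec_resolution_m
print(f"Six-metre resolution needs a range of about {needed_distance_km:.2f} km")
```

That works out to under two kilometres from the surface, which is why the figure cannot refer to this camera.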
The Galileo probe is currently a thin spread of titanium vapour lightly sprinkled over the hundred-kilometre cloud deck of Jupiter; is the article very old, or should it be referring to ESA's JUICE mission?
Re: Spoon holder...
At least burning PLA has a pleasant milky stench.
The problem with PLA for household goods is that it melts at about 50C; if you stick a 3D-printed thing in the dishwasher it comes out a bit Dali, and if you try to 3D-print coasters you find you have made expensive and attractive stick-on bottoms for your coffee cups.
As far as I can see from the paper, it's offering a technique for getting 3N capacity out of five capacity-N discs, with protection against one disc failure in conjunction with two badly-located unreadable sectors.
(whereas RAID6 gives you protection against two whole-disc failures, but if you lose one disc and have unreadable sectors in the same place on two of the others then you've lost that sector)
It seems to involve fifteen reads and five writes per sector write, because it works by looking at groups of sectors on each disc, whilst RAID6 requires three reads (the sector you're overwriting and the two parity sectors) and three writes, so there's a lot more bandwidth used.
Basically this is a paper which has discovered a pretty mathematical pattern, with a dubious justification that it might be relevant for data recovery. It doesn't make sense in a world in which discs tend to fail mechanically rather than to develop individual bad sectors.
"many clusters where latency or cost is more important than bandwidth are still being built with Gigabit Ethernet switches"
Gigabit Ethernet latency is *dreadful*, 180us or more for a ping between two boxes attached to the same switch! You use a gigabit interconnect only when latency is immaterial and bandwidth not terribly important; thankfully a lot of interesting jobs have that property.
Unfortunately the slower grades of InfiniBand, which were still cheaper and lower-power than 10GBASE-T when they started to be phased out, are no longer readily available new.
Computational number theory lab
I have a 48-core 64GB Opteron 6168 machine, an old Core2Quad Q6600 as NFS server, and a Sunfire 4150 dual-quad-Xeon, all installed in my gigabit-ethernet-connected (thanks to a Very Large Drill) outbuilding. I use them to factorise large numbers; by Easter 2^929-1 will have fallen to my ponderous linear algebra machinery.
Re: $50K for the 100M digit prime
Each test at that size takes about ten days on a $1500 computer which uses about $300 of electricity a year; so in a five-year lifetime it does about 200 tests and costs $3000. The chance of success is about one in a million per test, so: yes, it would cost a lot more than $50,000.
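The arithmetic, spelled out with the figures from the comment (the variable names are mine). Taking a 365-day year gives slightly fewer than the round 200 tests, which only makes the expected cost higher.

```python
# Expected cost of finding one prime, using the figures in the comment.
computer_cost = 1500          # dollars, one machine
electricity_per_year = 300    # dollars
lifetime_years = 5
days_per_test = 10
success_chance = 1e-6         # about one in a million per test

lifetime_cost = computer_cost + electricity_per_year * lifetime_years
tests = lifetime_years * 365 / days_per_test
cost_per_test = lifetime_cost / tests
expected_cost_per_prime = cost_per_test / success_chance

print(f"Lifetime cost: ${lifetime_cost}, tests run: {tests:.1f}")
print(f"Cost per test: ${cost_per_test:.2f}")
print(f"Expected cost per prime found: ${expected_cost_per_prime:,.0f}")
```

Either way the expected outlay is in the tens of millions of dollars, several hundred times the prize.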
Re: Note the time for the GPU vs the 32 core server.
Not quite; the calculation is basically 58 million *consecutive* 3407872-element double-precision complex FFTs. The FFTs can be split among the cores of a GPU or of a multi-core CPU-based system, but it's not embarrassingly parallel in the normal sense of requiring lots of independent small calculations.
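A toy version shows where the sequential dependence comes from. I am assuming the calculation is the Lucas-Lehmer test for a Mersenne number, which the comment does not name; Python's built-in big-integer multiply stands in here for the FFT-based squaring a real implementation would use.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p.

    Each pass through the loop squares the previous value, so iteration
    k cannot start until iteration k-1 has finished; only the squaring
    itself (one big FFT multiply in practice) can be parallelised.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

for p in (3, 5, 7, 11, 13):
    print(p, lucas_lehmer(p))
```

For an exponent near 58 million the loop runs about 58 million times, one giant squaring per pass, which matches the shape of the job described above.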
Re: Note the time for the GPU vs the 32 core server.
The 32-core server was running a completely different implementation of the large FFT needed to do the arithmetic on such huge numbers, which is not particularly aggressively tuned (in particular, it doesn't use AVX instructions), which is why it was rather slower than a six-core Sandy Bridge using AVX; the idea was to do the calculation using two completely different software implementations and check both got the same answer.
Getting Fourier transforms to run well on a GPU is not at all straightforward, but since doing it allows you to sell thousands of GPUs to people like Shell and Exxon because the work of converting seismic reflection data to 3D images is made of Fourier transforms, nVidia has done it.
Re: Manufacturers should go 5.25" like the ancient Quantum Bigfoot
Current discs offer 100MB/second read rates, so you're saying that if you constantly read over a disc you'll get an unrecoverable error every other day.
This doesn't seem consonant with something like http://www.numberworld.org/misc_runs/pi-10t/details.html ; yes, this lost a lot of time to disc failures, but in an environment where it was running flat-out to 24 separate spindles without redundancy it lost one disc about every four spindle-years.
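For scale, here is what the two figures in this comment imply, using only the numbers above and reading "every other day" as one error per two days of continuous reading.

```python
# What "an unrecoverable error every other day" implies at 100MB/s,
# and how often a 24-spindle array loses a disc at the observed rate.

read_rate_bytes_per_s = 100e6
seconds_between_errors = 2 * 24 * 3600          # "every other day"
bits_between_errors = read_rate_bytes_per_s * 8 * seconds_between_errors

spindles = 24
spindle_years_per_failure = 4
days_between_disc_failures = spindle_years_per_failure * 365 / spindles

print(f"Implied rate: one error per {bits_between_errors:.2e} bits read")
print(f"Observed: one disc lost every {days_between_disc_failures:.0f} days across the array")
```

So the claimed rate would mean dozens of read errors across the array for every disc it actually lost.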
Re: Not much chance of that
I don't anticipate Apple releasing a TV until they can release a 3840x2160 Retina TV (that is, until Sharp has managed to scale up by a factor 1000 the production rate of the panels they're launching in February 2013).
Being the only people offering convenient one-click access to quad-HD content - yes, this will require a fast Internet connection, a fair amount of in-TV storage, and special negotiation with content providers; the first is ubiquitous, the second straightforward, and the third the kind of thing that Apple is quite good at and in a unique position for - would seem the kind of unique selling point that Apple would like to have.
I would pay $0.99 per half-hour for the BBC Wildlife Film Unit doing what it does best in quad-HD.
Note that the 39.6% rate only applies to dividends on shares owned by highest-rate taxpayers (ie people earning more than $388350); whilst that clearly includes Larry, it may well not include a lot of people who have a thousand Oracle shares in their retirement funds.
For cluster nodes I'm not quite sure why you wouldn't use InfiniBand; colfaxdirect.com will sell you an 8-port QDR switch for $2000 and adapter cards for $600, and it's four times the speed of 10Gb Ethernet.
(they used to do SDR cards, which are the equivalent of 10Gb, for $125 with a $750 switch, but those have mysteriously disappeared)
Re: Why are they upset?
The Guardian article on this suggested that they were either paid piece-work or penalised for rejected items, both of which are of course good ways to get the workers and the QC team into an adversarial relationship. Didn't we go through most of this with British Leyland in the seventies?
Actually a class of one
The Soyuz down-mass is only 100kg; the Dragon test flight brought back 660kg, and Dragon is specified for three tonnes. The Space Shuttle down-mass was something like twenty tonnes and it would routinely return with about five tonnes of stuff packed inside a four-tonne MPLM.
... also more efficient than our competitor's model with integrated toaster
You always ought to give your competitor the benefit of the doubt in this sort of comparison, otherwise people will question your honesty - having said that the machines are usually compute-limited, why are they comparing to an essentially-HPC setup full of dual-socket eight-core Xeons rather than low-power Ivy Bridge E3-1200 systems?
Re: Very Intel
Being complacent about Intel's competence has never worked out well.
Yes, it still has an x87 - it's an x87 borrowed from the Pentium-90, pipelined but not particularly superscalar. If you want to do arithmetic you use the VPU; if you have some little piece of setup code that desperately needs 80-bit floating point for thirty million cycles then you can run it slowly on the x87 side and the VPU will be briefly power-gated.
This is a function-field-sieve discrete logarithm over GF(3^582); it's asymptotically equivalent difficulty to special number field sieve factorisations, which casual groups have managed to do for 1061-bit (320 digit) numbers. As the Fujitsu paper http://www.nict.go.jp/en/press/2012/06/PDF-att/20120618en.pdf points out, it involved a fair amount of implementation work but no more computing than finding a single DES key.
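The sizes involved can be checked with Python's arbitrary-precision integers. This only confirms the bit and digit counts; it says nothing about the asymptotic-difficulty comparison, which I am taking from the comment as given.

```python
# Size of the field GF(3^582) in bits, and the decimal length of
# 1061-bit numbers, using exact integer arithmetic.

field_order_bits = (3**582).bit_length()

smallest_1061_bit = 2**1060
largest_1061_bit = 2**1061 - 1
digits_low = len(str(smallest_1061_bit))
digits_high = len(str(largest_1061_bit))

print(f"3^582 is a {field_order_bits}-bit number")
print(f"1061-bit numbers have between {digits_low} and {digits_high} decimal digits")
```

So the field elements are a little over 900 bits long, somewhat smaller than the 1061-bit numbers that have been factored.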
You've got cores and core-groups confused at the start; you write
'The Fermi GPU had 512 cores, with 64KB of L1 cache per core and a 768KB L2 cache shared across a group of 32 cores known as a streaming multiprocessor, or SM'
where in fact there is a single 768KB L2 cache shared between all 512 cores, and 64KB L1-like memory shared across each SM.
'The Fermi GPU has sixteen streaming multiprocessors, each comprising 32 cores and 64KB of fast memory, and a 768KB L2 cache shared by the sixteen SMs' would be a more correct way to put it.
Intel thoroughly missing the point here
These are not credible competitors to four-socket Opteron boxes, because they're so enormously more expensive; even if you regard a Sandy Bridge hyperthread as equivalent to an Opteron core, $1611 for eight 2.2GHz SB cores versus $639 for sixteen 2.2GHz cores is a big premium.
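Putting the premium into numbers, on the generous basis described above of counting each Sandy Bridge hyperthread as a full Opteron core. Two hyperthreads per core is the standard Hyper-Threading arrangement; the prices are the ones quoted.

```python
# Price per thread of execution, counting a Sandy Bridge hyperthread
# as equal to an Opteron core (the generous comparison in the comment).

xeon_price, xeon_cores, threads_per_core = 1611, 8, 2
opteron_price, opteron_cores = 639, 16

xeon_per_thread = xeon_price / (xeon_cores * threads_per_core)
opteron_per_core = opteron_price / opteron_cores
premium = xeon_per_thread / opteron_per_core

print(f"Xeon:    ${xeon_per_thread:.2f} per hyperthread")
print(f"Opteron: ${opteron_per_core:.2f} per core")
print(f"Premium: {premium:.2f}x")
```

Even on that basis the Intel part costs about two and a half times as much per thread.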
The four-socket Opteron boxes are great for HPC-like jobs; my 4x6168 machine delivers 360GFLOP peak for about 700 watts and I've not had to fiddle around with Infiniband cards and switches to connect smaller boxes together.
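The peak figure can be roughly reconstructed as cores times clock times operations per cycle. The 1.9GHz clock, twelve cores per socket and four double-precision operations per core per cycle are my assumptions about the Opteron 6168, not figures from the comment.

```python
# Rough reconstruction of the quoted peak figure.
# Assumptions (not from the comment): 12 cores per socket, 1.9GHz clock,
# 4 double-precision floating-point operations per core per cycle.

sockets = 4
cores_per_socket = 12
clock_ghz = 1.9
flops_per_cycle = 4
watts = 700                      # from the comment

peak_gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
gflops_per_watt = peak_gflops / watts

print(f"Peak: {peak_gflops:.1f} GFLOP/s, {gflops_per_watt:.2f} GFLOP/s per watt")
```

Under those assumptions the result lands within a couple of percent of the 360GFLOP quoted.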
Last para is rather unclear.
Blue Gene/Q is the energy-efficient one, Hector is the XE6 (enormous pile of Bulldozers).
Blue Gene/Q is in no sense a distributed computing project; it's a collection of cabinets (probably four cabinets) each containing an enormous pile of custom IBM chips each containing 16 PowerPC cores.
Roughly what they've done
This is a weak-lensing survey. The idea is that the shapes of galaxies seen from Earth are changed by gravitational lensing from mass concentrations that the light has passed through on the way; so you produce an enormous sample of galaxies which you're reasonably confident are at about the same, large distance (by looking at their colours in several infra-red bands: 'photometric Z' is the term, Z being the symbol for red-shift), and the map plots roughly the extent to which the galaxy images in each patch of space are elongated.
The galaxies are small and the variations in their shapes are comparable to all sorts of other systematic effects caused by (for example) the presence of the atmosphere, so there are several statistical steps in there, which is why the maps look so blobby; the confirmation is at least in part that the brightest blobs turn out actually to contain foreground galaxy clusters, though a bright blob without a galaxy cluster would be a much more exciting result.
I suspect it's more hope for Jupiter
They haven't got radial velocity data to get the masses, and the paper is in Nature and not open-access, but the more technical summaries suggest that the deep-fried planets are the iron cores of former gas giants.
Lanthanide halides don't behave like UF6
The Kroll process sounds reasonable at first sight, but most of the lanthanides really only go to oxidation state +3 (the ones that don't you can separate out much more easily by taking advantage of that), so I don't see how you get enough fluorines around them to get nice volatile compounds like the transition metal highest-valency fluorides. The lanthanide trifluorides seem to be nice solid things, melting around 1500C (without much range in the melting points, so the distillation's not going to be completely straightforward) and used in glasses for infra-red lenses and optic fibres.
The bromides are a bit more volatile, and probably the iodides even better, but I'd be very worried about thermal decomposition.
You could go to organolanthanides or borohydrides, but coating the reactor with LaB6 seems like the beginning of quite an expensive day, and I'd be impressed if Gd(tBu)3 could be distilled without decomposing.
Bright lights are wonderful
I have 85W-power-usage 450W-incandescent-equivalent light bulbs in my bedroom and my dining room; they're fantastic, and meant I didn't find last winter depressingly gloomy. Obviously I don't have them on 24/7, but they're a great improvement over the 25W-power-usage 150W-incandescent-equivalent bulbs I had previously.
Grades of plastic?
Apple cables are noticeably thinner and bendier than many other manufacturers'; this lets the cables lie more tidily on the desk, and makes them easier to thread through small spaces, but I imagine it also lets a given force bend them in tighter radii than other manufacturers' cables, so allowing more damage to the innards. So I wouldn't be terribly surprised by this story; it's a side-effect of designing for prettiness.
Image stabilisation FTW
Alex said "I have a 300mm (450mm equivalent) zoom on my APS-C SLR camera and have a job getting steady shots with that. You have to have a tripod."
Thankfully, Nikon are now quite good at vibration-reduction - it's the really big advance in lens technology in the last decade, you no longer need to hold the camera still. One of the pictures in the article is taken hand-held at 624mm equivalent and looks pretty sharp; even the pocket Canon camera I have can take sharp macro shots of coins in poorly-lit museums with a quarter-second exposure.
Languages designed after the invention of threading thread better. WWHT?
One of Google's big observations is that Java programming style tends to be very threaded, so idiomatically-written Java programs do tend to spread nicely over multiple cores once you have the multiple cores available. Writing explicit threading in C++ is three kinds of pain, so people are generally not willing to do that, but it's much less uncomfortable in the managed languages which have threads as a primitive.
I have a lot of software that isn't terribly communicative nor terribly memory-intensive, and the advent of quad-cores is wonderful for this; I can run a dozen copies of the software while having only three loud fan-heaters on my flimsy desk. If two six-cores cost less than three four-cores, one motherboard and a case, I'll buy them; if not, not.
what's a factor 1000 between friends?
for 'gigaflop' read 'megaflop' throughout
HPC implies jobs too big for virtualisation to help
If your job is small enough that it can share a computer, you might as well run it on the computer on your desk; once your unit of allocation is the computer (or if you're using Blue Gene at Julich, the rack with 1024 CPUs in it), there seems little point in virtualisation.
Clusters, particularly small clusters which were bought for One Big Job and then sit in the corner unloved, tend to be quite idle, but the right response to that is at the low level shutdown -hp and wakeonlan, and at the high level politics such that your individual research groups find it better to contribute funding to the university, corporate or national Bloody Big Cluster and then submit their jobs to that.
40Gflops about as exciting as lukewarm marmalade
Whilst 'a single Sparc64-VII chip can deliver 40 gigaflops of number-crunching power with all four cores running at 2.5 GHz', so can a single Phenom 9850 of the kind that Dabs will sell you for less than a hundred and fifty of your Earth pounds.
You can get Google to show you only stuff you can see by selecting 'usage rights: free to use or share' from the Advanced Search menu - this makes searches in the field of chemistry much more useful, since Google has indexed a lot of very expensive journals which serve only to tantalise.
c+p fundamentally incompatible with the iphone interface
Cut-and-paste is more than just a software issue - it would complicate the interface hugely. You suddenly have a concept of a selected region, and need a selection mechanism, and I can't think of a sensible selection mechanism using the touch-screen; given the absence of c+p, I assume that neither could Steve Jobs and his merry men.
I've only found the lack of c+p irritating in combination with the trivial problem that touching a phone number only lets you call it, rather than texting it or adding it to a contacts list; that one I would expect to see fixed in a patch release.
GPS and AGPS
The iPhone 3G has both GPS and AGPS - you get a 'target marker' which starts off about the size of a city district and then zooms down to centre on the house you're in; the original one had only triangulation from cellphones.
A small problem is that AGPS requires an Internet connection, presumably to check against Apple's database of cell tower positions, and there's no fall-back to the satellite-finding process that something like a Garmin GPS does; so, if you've moved a fair distance since you last pressed the GPS button and don't have connectivity you don't get a position fix at all. Discovered this trying to watch myself moving along a train line in the depths of Sweden ...
What's wrong with sleaze and guile?
Umm, if it's making up a different private key each time then the fifteen million years of compute work are required once per infected machine, which is clearly absurd; and if not, then all Kaspersky has to do is pay the ransom once then disassemble the code they receive and publish the private key that it contains.
Breaking a 1024-bit RSA key is not impractical because you need fifteen million years of sieving, it's impractical because there's a stage at the end of the operation which requires a computer with some tens of terabytes of uniformly-accessible memory.