At Intel's developer shindig last week, chippery engineers spent a goodly amount of time conducting tech sessions that detailed the company's upcoming 4th-generation Core microprocessor architecture, code-named "Haswell." We thought that you, inordinately intelligent and tech-savvy Reg reader, might enjoy a deep dive into their …
I guess there's an anorak for everything.
Re: frogsaustralia.net.au? really?
It's Australia. That site probably counts as an H&S one, along with tinydeadlyspiders.net.au and dropbears.net.au.
re. Skin Tones
Just wondering, what happens if you have a picture that includes pale-skinned people AND dark-skinned people (like in a production with talking heads, or bed-action bodies, etc.)? How does it know what a 'skin tone' really is?
Re: re. Skin Tones
Dunno about skin-flicks, but for talking heads you can just run facial recognition software to identify where the people are and then run the smoothing algorithm on each face independently using that face's own skin tone.
Intel is certainly trying to do a really big sales job for Haswell. It makes me wonder what they are hiding.
Software decently written?
"The key here is that most software is actually fairly decently written"
He must be joking...
Pentium 4 was deemed as having very poor performance because to take advantage of it, software needed to be "fairly decently written", and compiled with a decent compiler. The problem is that to date there is only one compiler worth a damn for x86 - Intel's own (ICC).
I did some performance testing a while back:
Clock-for-clock, with crap compilers (GCC, PGCC) Pentium 4 is about 40% slower than Pentium 3. But with ICC, Pentium 4's performance actually goes up by 20%, clock-for-clock, compared to a Pentium 3.
It's not just down to software being decently written (which it isn't a lot of the time) - it's also down to the compiler doing a decent job (which most don't). On one hand, one could argue something along the lines of: "Pentium 4 didn't suck - you were merely too stupid to use it properly." Unfortunately, this is way, way beyond the average consumer to either understand or do anything about and it is the consumer's perception that decides whether a product is going to be a success or failure.
That may or may not be a fair test for Linux, but I imagine an Intel spokesman talking about "most software" would have a load of Windows or Mac apps in mind, and the range of available compilers is rather different.
Re: Software decently written?
"Most software" surely means how stuff is written by app developers, rather than which compiler they use. I figured he meant that apps that use busy-waits, polling, and the like won't see much benefit from the new sleep state because they won't often sleep. Whereas apps that do things "properly" (waiting on locks, using push, etc.) tend to be inactive while they are waiting for other activities to complete: these apps allow their respective cores to go into a low-power state more often.
Whether it's true that most software is well-written by this criterion I don't know, but presumably Intel do, because they'll have measured it. This is the sort of thing that apps written for mobile devices emphasise. Part of the motivation for the WinRT API was to promote a more asynchronous style of code than Win32.
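For anyone who wants to see the difference between the two styles, here is a minimal sketch in Python. It is an illustration only, not anything Intel measured: the function names (polling_wait, blocking_wait, signal_later, run_demo) and the 50 ms delay are made up for the example. The polling version wakes up repeatedly to check a flag, while the blocking version parks the thread until it is signalled, which is the behaviour that would let a core stay idle.

```python
import threading
import time

def polling_wait(event, interval=0.001):
    """Check the flag in a loop; each iteration is one wakeup for the core."""
    wakeups = 0
    while not event.is_set():
        wakeups += 1
        time.sleep(interval)  # a true busy-wait would not even sleep here
    return wakeups

def blocking_wait(event, timeout=5.0):
    """Park the thread until the event is signalled (or the timeout expires)."""
    return event.wait(timeout)

def signal_later(event, delay):
    """Hypothetical stand-in for I/O or another thread finishing its work."""
    timer = threading.Timer(delay, event.set)
    timer.start()
    return timer

def run_demo(delay=0.05):
    # Round 1: poll until the event is set, counting wakeups.
    polled = threading.Event()
    timer = signal_later(polled, delay)
    wakeups = polling_wait(polled)
    timer.join()

    # Round 2: block once and let the signal wake us.
    blocked = threading.Event()
    timer = signal_later(blocked, delay)
    signalled = blocking_wait(blocked)
    timer.join()
    return wakeups, signalled

polling_wakeups, was_signalled = run_demo()
print("polling wakeups:", polling_wakeups, "| blocking wait signalled:", was_signalled)
```

The exact wakeup count depends on the machine and the scheduler, so treat it as indicative rather than as a benchmark.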
"We thought that you, inordinately intelligent and tech-savvy Reg reader, might enjoy a deep dive into their handiwork."
Have you read all the comments on the register? :P
Holy wall of text, batman!
Lots of details, very nice. The usual flashy slides, containing far too much detail per slide, also as usual. The things will be lots of wonderful, intel[tm] brand for double the goodness, sure. The prices can be expected to be set like they always are, just so, no news expected there. So. I really have but one question, and it's not answered: Socket? Will we have to swap out system boards too, again, or what?
Short answer, yes. Long(ish) answer, see text.
"Haswell will use a new LGA1150 socket instead of the LGA1155 socket that Sandy Bridge uses and Ivy Bridge will also use when it tips up next year. This means that to upgrade to Haswell, you will need a new motherboard featuring the LGA1150 socket."
Though it has to be said that the main advantages at the outset appear to be in mobile devices (which I presume we are unlikely to start spannering ourselves :P) and by the time (2014?) those chips are launched for desktops and the like it will probably be time for an upgrade anyway. :)
So skintones are subtly moved to paler and smoother... is there perchance a hidden message there?
Re: that hidden message
Darker-skinned folks' faces are OK to start with.
Wait ... what?
"Intel CEO Paul Otellini has called "the third pillar of computing," security – the other two pillars being energy efficiency and internet connectivity."
This twat is a marketard, not an engineer ... he wouldn't know the difference between ones & zeros if he got 'em under his carefully manicured fingernails. He's part of the (current) reason Intel is heading for the bit-bucket.
By way of reference, the real three pillars of computing are memory, IO, and CPU ...
Re: Wait ... what?
If Intel are heading for the bit bucket, when they get there they'll find that AMD arrived long before they did. Which begs the question as to who you think will be the new Chipzilla in the future?
AVX2 on integers
Perhaps this is a play to host the next generation of buggy apps that lose market trading firms trillions of dollars in a single market session?
Re: AVX2 on integers
The AVX2 (long long) integer operations will be in there for cryptographic processing.
The really interesting bit is the transactional memory TSX extensions (IBM’s is already well along the curve). TSX should be a big kicker for TP and HPC, but writing software to take advantage of it is going to take a big paradigm shift away from garbage collection to Bedouin memory management.
Re: AVX2 on integers
What is "Bedouin memory management"? A google search turned up nothing that seemed relevant.
Re: AVX2 on integers
Yup, TSX is IMO the most interesting thing about Haswell. There is some in-depth analysis in "Analysis of Haswell’s Transactional Memory", worth a read. Until now, only very few and very expensive processors have supported transactional memory in hardware (it can be done in software, but from a performance PoV it rarely pays off). Meanwhile, compiler support for transactional memory is mostly in place in gcc versions 4.7+, and work is under way to put language support for TM in the next version of C++ (some time before 2020, hopefully).
Basically this means opening a way towards a third type of data synchronization, next to explicit mutexes and atomics. Of course TM comes with its own traps and issues (e.g. IO, exceptions) and it is not a "free lunch", but as long as the "average programmer" is able to try and learn it, things are going to start to look interesting. A new generation of CPU in his desktop computer, or in a server in a nearby plant, is definitely going to help here.
Re: AVX2 on integers
Bedouin is much more succinct than Parallel Generational-Copying Garbage Collection with a Block-Structured Heap.
Where Java-oriented Garbage Collection takes a Ghetto approach of squashing objects into low memory to minimize the working set, Haskell (specifically, but also other functional languages) takes a "bedouin" approach to cleaning up immutable objects: the process memory footprint is larger, but the working set remains small.
Haskell doesn’t get much choice about how it uses memory, but the kicker is that you have lower contention and can take advantage of Transactional Memory, which the Java Ghetto approach cannot.
By the time TSX is mainstream we’ll be using 1Tb DIMMs and won’t need to pack unrelated objects into the same memory pages: Bedouin Memory Management.
Floating point standards?
"This single-rounding capability improves mathematical precision."
Aren't floating point implementations subject to algorithmic standards? If different hardware versions produce slightly different results at the same precision, then the end results of an iterative calculation can be significantly different. It was a problem in the 1950s and 60s for some types of scientific work moving between different platforms.
Re: Floating point standards?
If your algorithm depends *that* sensitively on rounding errors then you need a new algorithm. (Or more likely, you need to stop altogether because you are trying to claim a level of precision that simply isn't present in your input data.) Of course, that also means that Intel shouldn't be claiming that single-rounding is important.
Re: Floating point standards?
Most compilers offer CPU specific floating point optimization flags that offer a choice between full IEEE FP compliance (for accuracy) or slightly faster performance by using some short cuts that can induce precision instability. If you optimize purely for speed, it is not at all unusual for your floating point calculations to yield slightly different results on different classes of CPUs (e.g. x86 vs. PowerPC).
I guess you don't care if your results are correct
This is absolutely untrue. Multi-precision floating point algorithms in particular rely on precise adherence to the IEEE spec, and they are the only algorithms that can quickly produce correct results for e.g. sign of a tetrahedron's volume. Computational geometry fails badly when errors of this sort occur. A slight rounding error here can lead to a topological inconsistency there, and suddenly your entire data structure is messed up. You seem to be under the misapprehension that the experts working in these fields solving these problems are idiots, and should just write simple & obvious code that breaks under certain inputs, then blame the inputs! Let's just hope it doesn't crash a plane or cause a fatal dosage to be introduced, eh?
However OP seems to be under the impression that the CPU can take an FADD and FMUL and internally fuse them to an FMA. Unless FMA has a round-twice mode, it can't, because it's wrong. You need to code the FMA explicitly (which is sadly impossible in most languages without using intrinsics, inline assembler, or library calls) or permit the compiler to fuse it (which is a recipe for disaster).
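To make the "it's wrong" part concrete, here is a small Python sketch of why silently fusing a multiply and an add changes the answer. Big caveat: fma_emulated is a made-up helper that imitates a single-rounded FMA by doing the arithmetic exactly with rationals and rounding once at the end. It is not the hardware instruction, and the input values were picked by hand to expose the difference.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, then ONE rounding to double.

    Illustrative stand-in only; it ignores infinities and NaNs.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Hand-picked inputs: the exact product is 1 - 2**-60, which is closer to 1.0
# than to any other double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

separate = a * b + c            # two roundings: a*b rounds to 1.0, then 1.0 - 1.0
fused = fma_emulated(a, b, c)   # one rounding: the tiny term survives

print("separate MUL then ADD:", separate)
print("single-rounded FMA   :", fused)
```

The two results differ, so code that was validated against the round-twice answer can change behaviour if something fuses the operations behind its back.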
The best thing about FMA is this:
If you do MUL A,B -> C on an FPU, the result is rounded, so in general it is not the exact product.
But if you do MUL A,B -> C then FMA A,B,-C -> D, then D is exactly the rounding error of the multiply, so the pair C+D (kept as an unevaluated sum) contains the *exact* result.
This is the basis for some rather nifty "multiple-precision floating point" tricks which allow effectively arbitrary precision (within the exponent range) for floating-point calculations. This is much faster on modern hardware than the old way of using integer multiplies for this.
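The MUL-then-FMA trick above can be checked with a short Python sketch. Assumptions to note: fma_emulated is a hypothetical helper that stands in for the hardware FMA by computing exactly with rationals and rounding once, two_product is just a name for the two-step recipe, and the claim only holds when nothing overflows or underflows.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Stand-in for a hardware FMA: exact a*b + c, rounded once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_product(a, b):
    """Return (c, d) such that c + d equals a*b exactly, barring over/underflow."""
    c = a * b                    # MUL A,B -> C   (rounded product)
    d = fma_emulated(a, b, -c)   # FMA A,B,-C -> D (the rounding error of C)
    return c, d

a, b = 0.1, 0.3
c, d = two_product(a, b)
exact = Fraction(a) * Fraction(b)            # exact product of the two doubles
recovered = Fraction(c) + Fraction(d)        # C + D without any rounding

print("c =", c)
print("d =", d)
print("c + d equals the exact product:", recovered == exact)
```

Note that c and d have to be kept as a pair. Adding them back together in ordinary double arithmetic would just round the sum to c again.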
Re: FMA FTW
Perhaps we are talking at cross purposes. If you are issuing FMAs at carefully chosen moments to manipulate the least significant bits of your floating point values, you are writing the very small proportion of numerical programming that is capable of delivering exact results. I'm well aware that this can be done if you have assembly-level control over the rounding of your primitive operations. You are also probably writing a support library for the use of someone who will not have to care about these things.
However, the vast majority of numerical algorithms deliver results whose imprecision depends on the input data and is considerably greater than the machine precision. If one of *these* algorithms gives qualitatively different results when you switch hardware or let your high-level language compiler optimise a little, the cause is almost certainly an ill-conditioned problem rather than the hardware or the compiler.
The authors of such code still have *some* sanity expectations from the underlying arithmetic, but typically don't have specific requirements on the accuracy of operations. Rather, it is stuff like "(a+b)/2 should not be outside the range [a,b]" or "if a>b then I can divide by a-b without getting a divide by zero exception". There were some arithmetics in the 50s and 60s that failed at this level but IEEE took some care to eliminate them.
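Those two sanity expectations are easy to spot-check in Python, which uses IEEE doubles on mainstream platforms. This is only a sketch on a handful of hand-picked values, not a proof, and the helper names are invented for the example. One known caveat: a + b can overflow to infinity for very large inputs, so the midpoint check is restricted to values where it cannot.

```python
import math
import sys

def midpoint_in_range(a, b):
    """Check that (a+b)/2 lies within [min(a,b), max(a,b)]."""
    lo, hi = min(a, b), max(a, b)
    return lo <= (a + b) / 2 <= hi

def safe_to_divide_by_difference(a, b):
    """If a > b, then a - b must be a nonzero number we can divide by."""
    return a > b and (a - b) != 0.0

# Hand-picked pairs, including neighbouring doubles and subnormals.
pairs = [
    (0.1, 0.2),
    (1.0, 1.0000000000000002),   # adjacent doubles
    (1e-320, 2e-320),            # both subnormal
    (-3.5, 7.25),
]
midpoints_ok = all(midpoint_in_range(x, y) for x, y in pairs)

# Two distinct values so close to zero that their difference is subnormal.
tiny = sys.float_info.min        # smallest normal double
a, b = 1.5 * tiny, tiny
difference_ok = safe_to_divide_by_difference(a, b)
quotient_finite = math.isfinite(1.0 / (a - b))

print(midpoints_ok, difference_ok, quotient_finite)
```

The second check relies on gradual underflow (subnormal numbers); on arithmetic that flushes small differences to zero, a > b would not guarantee a nonzero a - b.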
Re: FMA FTW
The FMA result could only be *exact* if your functional unit had twice as many bits as your data storage registers. Hasn't the Intel tradition been to have 25% more bits? (80 bits vs 64 and so on.)
What I'd like is a fused multiply sum into an expanded "hidden" register. Gimmie the dot product of these two ten dimensional vectors and then round down the results when you store it, etc.
Did I hear a rumor about "moving the VRM on-chip" ?
A few days ago, there were rumours that Haswell would have more say in how its input rail power is regulated, beyond the VID pins of today. Some even suggested that the VRM would move into the CPU package. Does anyone know further details? This would be more of a revolution than skin-tone enhancements...
it's all handled in hardware combined with firmware
Is he talking about ACPI? If so, Windows-only power management then.