hear that
It's the cost of vSphere on a per-core basis going down again with 22 cores/socket.
The most important info to me in this article was: 22 cores, and socket compatible with V3 systems. That is awesome.
Intel today officially pulls the wraps off its mildly delayed Xeon E5 v4 server processors. These chips follow up 2014’s Xeon E5 v3 parts, which used a 22nm process size and the Haswell micro-architecture. Intel shrunk Haswell to 14nm, and after some tinkering, codenamed the resulting design Broadwell. Server and workstation …
Will it rotate my tires, and check my oil??
Lots of nice features to be sure, but what does it take to fully utilize them in current products? Looks like there is a bunch of work ahead for the kernel/VM programmers to get up to speed, and even then it will need a bunch of testing to make sure it works OK. For normal user programs (Solitaire, anyone?), the cards will just shuffle faster.
Of course those mining bitcoins hopefully will consume less power in their quest to make zillions.
Quote: "while teasing developers with goodies like posted interrupts, working TSX,"
Surely that should be
"while teasing developers with goodies like posted interrupts, allegedly working TSX,"
Pretty much every Intel chip product of the last decade (and probably longer) has had multiple errata, I suspect most of them found after release. I think claiming TSX is working is a bit premature until it's seen in the wild for a while.
They are both at it. I don't remember which one was first with the idea.
It totally screws up low latency floating point DSP work. I have a 'standard' dual core Intel that performs better than sooper-dooper quad cores of both Intel and AMD, with similar clock speeds and identical OS.
Clock speed is far more important for anything I do than the number of cores or any of the other fiddles.
Sun found that out the hard way with Niagara (and later T3). Looked nice on paper, but in the real world (except for some quite specific workloads) it turned out not such a great idea. T4 was the first one that actually worked fairly well.
"...Sun found that out the hard way with Niagara (and later T3). Looked nice on paper, but in the real world (except for some quite specific workloads) it turned out not such a great idea ...."
The Niagara T1-T3 were niche processors. In that niche they excelled. An article said that a Niagara T1 CPU running at 1.2 GHz with 8 cores was 50x faster than an Intel 2.4 GHz dual core server. No typo, 50x!!! The workload was a web server, serving many lightweight clients.
Today the SPARC M7 is typically 2-3x faster than the fastest x86 CPU and POWER8, and in some cases up to 11x faster. As the SPARC M7 can encrypt data almost for free, encryption costs 2-3%. Whereas on x86 performance will typically be halved or worse when using encryption, leaving hardly any horsepower over to do useful work.
Here are 25ish benchmarks where SPARC M7 is 2-3x faster, such as SPEC2006, Hadoop, SAP, Neural Networks, Specjbb2005, etc etc:
https://blogs.oracle.com/BestPerf/
(Funny thing is that if you look a bit on that site, you will see that SPARC M7 is twice as fast at SPECjEnterprise2010 as the stated record.)
There is Oracle marketing spewing their lies. SPARC M7 goes around claiming performance superiority on a per socket level....They will compare their socket to an Intel or POWER. They won't mention the SPARC M7 socket consists of 32 cores while they will compare it to an Intel server with 10 cores or a POWER8 server with 6 cores. Oracle marketing continuing to purposely misstate the facts.
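To put rough numbers on the per-socket versus per-core point, here is a back-of-envelope sketch. It only uses the figures quoted in this thread (2-3x per socket, 32 cores versus 10 or 6) and it assumes throughput scales linearly with core count, which is a simplification and not something any benchmark guarantees. The function name is made up for illustration.

```python
def per_core_ratio(socket_speedup, cores_a, cores_b):
    """Per-core speedup of chip A over chip B, given A's per-socket speedup.

    Assumes (hypothetically) that throughput scales linearly with cores,
    so per-core throughput is simply socket throughput / core count.
    """
    if cores_a < 1 or cores_b < 1:
        raise ValueError("core counts must be positive")
    return socket_speedup * cores_b / cores_a


if __name__ == "__main__":
    # Figures quoted in the thread: 32-core socket vs 10-core and 6-core parts.
    for speedup in (2.0, 3.0):
        vs_10 = per_core_ratio(speedup, 32, 10)
        vs_6 = per_core_ratio(speedup, 32, 6)
        print(f"{speedup}x per socket -> {vs_10:.3f}x per core (vs 10 cores), "
              f"{vs_6:.3f}x per core (vs 6 cores)")
```

A result below 1.0 means the 32-core part would be slower per core under that linear-scaling assumption, even while being faster per socket. Whether per socket or per core is the fairer comparison depends on how you are licensed and what you are buying.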
Processor design hit a MHz barrier years ago, at approx. 4GHz.
If you can't make your workload multicore then you are never going to go faster on electronic semiconductor hardware.
Put your effort into finding ways to use those extra cores, because otherwise you will not get more work done per unit time until there is an all-new type of hardware in town.
"So the fastest is 3.5GHz?"
No, the fastest base clock is 3.5GHz. The fastest core is much better than that - I think at the moment it's about 4.3GHz in the "old" Xeon range. You do have to sacrifice cores and use Turboboost but clock speed isn't dead completely.
I think IBM have a 6GHz chip available, but you may have to change architectures to use it :)
I agree that some workloads require super-duper clock speeds on a single thread. I have such a workload; we would love an 8-10GHz chip for crunching it. It can't be run in parallel (well, it can, but that slows it down). I want a fast-clocked CPU for one big job we run that takes 30-40 secs every 3 minutes. Cutting that time in half would be worth paying money for.
My knowledge of CPU design is third only to my knowledge of popular beat music and fashion (are flares still in?) - Could you design a multi core CPU with perhaps one of the cores running at silly speeds and the other 21 cores at the normal 3.4GHz (or whatever)? Not a troll, simply wondering aloud and hoping a bloke at Intel comes back and says "Look what we have just for you!"
Thanks
Turboboost is based on thermal envelope for the whole package so yes, it would be theoretically possible to do, and Turboboost does already dynamically change things depending on current workload so in your case while "the process" runs one core could run at 4.2 then when it finishes all six could run at 3.6.
Of course, you could take the easy route and build a server with one 22 core and one 6 core then use various gubbins like NUMA and core affinity to tie virtual machines to the right cores. Or take the really easy route and just buy two servers :)
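For the core affinity part, a minimal sketch of pinning a process to chosen cores from Python. It assumes Linux, since `os.sched_setaffinity` and `os.sched_getaffinity` are not available on every platform, and the helper name `pin_to_cores` is made up for illustration. It does not touch clock speeds or Turbo Boost at all; it only restricts where the scheduler may run the process.

```python
import os


def pin_to_cores(cores, pid=0):
    """Restrict a process to the given CPU cores and return the new mask.

    pid=0 means the calling process. Linux-only: relies on
    os.sched_setaffinity / os.sched_getaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed on this platform")
    os.sched_setaffinity(pid, set(cores))
    return os.sched_getaffinity(pid)
```

Something like `pin_to_cores({0})` at the start of "the process" would keep it on core 0. Which core numbers belong to which socket is machine specific, so check the topology before hard-coding anything.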
Not going to happen in semiconductors.
The ~4 GHz limit is due to the physics of how the clock is distributed around the chip.
As process size shrinks, the smaller physical distance between gates reduces latency (linearly), however interference increases (inverse square law) and thus Bad Things happen.
If your workload really can't be done in parallel then you're stuck.
However, that is very unlikely to be genuinely true. Very few workloads are totally serial, and so you can usually find some sections that are independent.
If you find it runs noticeably slower when running in parallel then your architecture for doing it is almost certainly incorrect, and is blocking threads way too often.
At worst it should be slightly slower due to thread context switch.
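The "find some sections that are independent" argument can be sketched with Amdahl's law, the usual rule of thumb for this: speedup = 1 / ((1 - p) + p / n), where p is the fraction of the job that can run in parallel and n is the number of cores. The version below ignores threading overhead, and the fractions in the demo are made-up inputs, not measurements of anyone's workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup from Amdahl's law, ignoring threading overhead."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, tried on a 22-core part.
    for p in (0.0, 0.25, 0.5, 0.9):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 22):.2f}x on 22 cores")
```

By this formula, halving a job's run time on 22 cores needs p = 11/21, so a bit over half of the work has to be parallelisable. A totally serial job (p = 0) gets exactly 1x no matter how many cores you throw at it.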
Surprising how little extra performance they were able to get from that process shrink.
Only 4 extra cores for the same power draw? I guess some of the new features take up a LOT of silicon, or Intel threw billions of bucks at a lame process node.
Still, looks like a nice bit of kit, I want one to run Crysis.