Monday: Intel touts 28-core desktop CPU. Tuesday: AMD turns Threadripper up to 32

AMD this week promised to ship 32-core Ryzen Threadripper 2 processors in the third quarter of 2018 – one day after Intel bragged about a forthcoming 28-core part. On Monday, Intel, the dominant CPU maker in the worlds of desktop and server systems, touted an upcoming 28-core Core X-series aimed at workstations, gamers, and …


  1. Nate Amsden Silver badge

    where's the innovation?

    Both of em are just tweaking at most server cpus to run on workstations. I have a dual socket HP opteron workstation maybe from 2009. Bought it refurb from HP maybe 2012. Upgraded the cpus from 4 core to 6 core about two years ago(12 total cores), after finally finding cpus that were a decent price. The cpus were specifically for HP blades. I just discarded the blade heatsinks and reused what the workstation already had. Nothin new here. I don't use it for much anymore but it's still a pretty solid system.

    I've seen people claim the new Ryzen chips have forced intel to compete more. I don't really see that either myself. Ryzen fell far short of my own personal expectations on power usage anyway (not that intel is much better now). Sad to see seemingly everyone running into manufacturing walls relative to the past.

    Where AMD forced intel to innovate was when intel came out with the core series architecture.

    1. DougS Silver badge

      Re: where's the innovation?

      What sort of 'innovation' did you expect? Just because you want something doesn't mean it is going to come (whatever it is you think they should have 'innovated')

      1. Nate Amsden Silver badge

        Re: where's the innovation?

        I wasn't expecting any myself, I was commenting on the article:

        "[..]what matters is that someone is putting pressure on monopoly giant Intel, forcing it to innovate in the desktop "

    2. sharkando

      Re: where's the innovation?

      Where is the innovation?? The innovation is Infinity Fabric. Innovation is the 12nm node. Innovation is in the design of the intelligent decoder and branch prediction. Innovation is in providing 128 PCIe lanes. Innovation is making it all affordable.

      There's not much innovation from Intel unfortunately, given that they still use a monolithic design that ends up being unaffordable, to the point that the 28 core demo you speak about was actually a scam using a Xeon Platinum Skylake cooled with a $1000 Sub-Zero chiller.

    3. rav

      Re: where's the innovation?

      Not everybody is satisfied driving a Prius. Some folks do want a Maserati or Shelby Cobra.

      As for me, I will build one for computer chess. I can't wait to get my hands on TR2.

    4. rav

      Re: where's the innovation?

      Actually the innovation is 4 - 8 core 16 thread dies rather than a single massive and very expensive die.

      The innovation is also infinity fabric. Then there will be 7nm TR3!!!

      TR1 is now about $500. I expect TR2 to be released at less than $1200.

  2. TReko

    Gimme speed

    I don't want more cores on a workstation, I want fewer cores that I can clock higher.

    Most tasks are hard to parallelise - Gimme a 10GHz CPU.

    1. VeganVegan

      Re: Gimme speed

      That’s the case for most users, but I have genetic analyses that are highly parallel, so more cores scales better than more clock speed.

      For example, one of my jobs would typically take a month or so, using all 24 threads of a 12 core machine, running 24/7. I would love to have a 48 thread machine, because it should almost halve the time of the run. It is much harder, and gets into impossible territory, to scale clock speed in a similar way: A 2.5 GHz chip could possibly be sped up to 5GHz, but 10 or 20GHz? I do appreciate higher clock speeds, it’s just that more cores gives me more bang for the buck.

      When you get down to the basics, a GPU is a massively parallel chip, because much of the graphics task can be nicely parallelized. One can imagine (hope?) that other common tasks can better take advantage of getting parallelized to the extent possible. Modern OS’s already generate many threads, to take advantage of the cores available.

      1. JeffyPoooh Silver badge

        Re: Gimme speed

        Vegan^2 noted, "...one of my jobs would typically take a month or so..."

        Unless it's already been done, then there's very likely an order of magnitude (or maybe three) in optimizing the code.

        Hand-tuned assembler written by somebody that really understands exactly what they're doing can be stupidly fast. Even if it's just a few LoC in the innermost loop.

        1. Richard 12 Silver badge

          Re: Gimme speed


          Honestly, no. Modern compilers are really, really good.

          Hand-tuned assembler is an outdated concept. It's incredibly expensive - months or years - and risks the results being wrong. It is better to spend the time doing runs and making the simulation more accurate.

          Making it run faster is more generally done by two methods: Make it more efficiently parallel, and make it do less - figure out which parts of the simulation aren't actually necessary, and omit them.

          Eg making sure it does things in the best order (not waiting on memory or other tasks), finding early-exit cases etc.

          Domain knowledge is the best way to optimise. Only compiler writers should be looking at assembler.

          1. Joe Werner

            Re: Gimme speed

            Plus you have to factor in the amount of time it costs for the rewrite. Honestly: Unless you run it several times you don't bother. If it can be solved "quickly enough" it is good enough, and if it can be sped up by parallelised code that's an order of magnitude you have. And that is "for free" (depends on the OS, really easy under any *nix-like system, i.e. Linux or MacOS) if you just have to link against openBLAS instead of your bog-standard BLAS in case you are doing a lot of linear algebra stuff. Or if you have stuff that is embarrassingly parallel (i.e. just several processes that do not need to communicate).

            One other thing: make sure that stuff is using the correct layout in arrays, there is some speed-up just from changing between column-major and row-major ordering (which is language dependent).

            Don't bother about the rest. That's what the compiler should do and optimised libraries take care of.
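A minimal Python sketch of the array-layout point above, assuming a 2-D array stored in one flat buffer. The helper names (`row_major_offset`, `col_major_offset`, `strides`) are made up for illustration. The idea is only that walking the array in the same order as its storage touches consecutive offsets, while the other order jumps around.

```python
def row_major_offset(r, c, nrows, ncols):
    """Flat offset of element (r, c) in row-major (C-style) storage."""
    return r * ncols + c


def col_major_offset(r, c, nrows, ncols):
    """Flat offset of element (r, c) in column-major (Fortran-style) storage."""
    return c * nrows + r


def strides(offsets):
    """Distance between successive offsets in a traversal."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


nrows, ncols = 3, 4

# Row-by-row traversal: the inner loop runs over columns.
row_walk = [(r, c) for r in range(nrows) for c in range(ncols)]

in_row_major = [row_major_offset(r, c, nrows, ncols) for r, c in row_walk]
in_col_major = [col_major_offset(r, c, nrows, ncols) for r, c in row_walk]

print(strides(in_row_major))  # every step is +1: consecutive memory
print(strides(in_col_major))  # steps of +nrows, with jumps back between rows
```

On real hardware the second pattern tends to miss the cache far more often, which is where the speed difference comes from.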

          2. Ken Hagan Gold badge

            Re: Gimme speed

            ^ Richard 12

            I'd argue that assembler is the out-dated concept, not just the hand-coded variety. Modern processors don't really execute instructions anymore, they simply move data from place to place and sometimes the data is changed en route. The limiting factor on speed is nearly always how long it takes to shuffle the required data through the required list of places it has to visit, and making sure that when two pieces of data have to meet up in a place, that they do so at the same time.

            Assembly language isn't a particularly good way of expressing the requirements, but out-of-order processing allows the hardware to convert a sequence of instructions into a data-flow, on-the-fly. So we end up with people writing lists of instructions in a high-level language, which the compiler tries to turn into a data-flow but then has to write out as another sequence of (assembly) instructions, which the CPU tries to turn back into a data-flow for the most efficient execution.

            Maybe one day we'll figure out how to express non-embarrassingly-parallel algorithms directly. I'm not holding my breath, though. The academics have been looking for such languages for all of my life and no-one has found one. (I think the commercial incentives are now such that any successful solution to this problem would go mainstream within 2-3 years.)

            1. big_D Silver badge

              Re: Gimme speed

              On the other hand, GRC's Spectre checker (Inspectre) was written (quickly) in assembler and weighs in at 110KB, 96KB of which is the Windows requirement for a high definition icon.

              Some things can be written better in assembler, but very complex tasks are much easier to debug in high level languages. You can still optimize that code as well.

              BUT some jobs just take time. We don't know how well optimized that 1 month job is, it could be that they have got the runtime down to "just" 1 month. Speculating on optimizing it, just because it doesn't finish in milliseconds, without understanding what it is doing is tilting at windmills.

              I've worked on projects before, where analyses or reports took days or weeks to run. They had been well optimized over the years and the runtimes had come down with successive hardware generations. At some point, you can't optimize the algorithm any further and more processing power (whether raw speed or parallel processing) is the only way forward.

              1. Anonymous Coward

                Re: Gimme speed

                "I've worked on projects before, where analyses or reports took days or weeks to run."

                A new large data analysis worked - but took 11 hours to run. It already used doped vector arrays to allow easy insertion/sorting. Next day the time was brought down to 30 minutes by using a binary chop search to find things in the arrays.

                While the program was written in C - both doped vector arrays and binary chop searches were techniques I learned when writing in assembler many years ago. They both require an appreciation of low level data structures and memory allocation.
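For illustration, a small Python sketch of the binary chop idea using the standard `bisect` module. The data and the `contains` helper are invented here, and this is not the poster's C code. A linear scan may compare against every entry per lookup, while a binary chop on a sorted array needs roughly log2(n) comparisons.

```python
import bisect


def contains(sorted_items, target):
    """Binary chop: True if target is present in the sorted list."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


# Keep the array sorted as items arrive, so lookups can always use the chop.
items = []
for value in [42, 7, 19, 3, 88, 61]:
    bisect.insort(items, value)

print(items)                # [3, 7, 19, 42, 61, 88]
print(contains(items, 19))  # True
print(contains(items, 20))  # False
```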

                1. big_D Silver badge

                  Re: Gimme speed

                  @AC I also worked on an early e-retail site. It collapsed whenever the eBay newsletter went out. The MySQL database would seize up and the query to get the menu for the homepage would take over a minute to run. They had 4 front end servers and a MySQL database.

                  I looked at the code and quickly worked out that it was written for humans to understand, not so that a computer could execute it quickly. After changing the execution of IF statements from negative to positive and changing the WHERE clauses to work optimally on the data (different indexes and starting at the highest common denominator instead of the lowest, which was more understandable for a human), the query time dropped from over 1 minute under load to around 12ms.

                  The next time the newsletter came out, the servers were doing well, going from 50 users per server and collapsing to 250 users per server and still plenty of headroom.

                  That was something that could be optimized and showed significant results. But, as I said above, without knowing how much the original 1 month problem has been optimized, it is pointless to speculate about further optimization. If it originally took 3 months and they are now down to 1 month after optimization, you are at the limits of what the hardware can achieve. Maybe more cores or faster storage and memory are needed?

          3. tfb Silver badge

            Re: Gimme speed

            I believe many compilers also risk the result being wrong, at least in some cases. I read a recent thing on an optimization commonly done by C compilers which turns code with behaviour defined by the spec into code with behaviour not defined by it. Certainly we worry a lot about checking that code gives bit-reproducible answers at high optimization settings, which it does not always do, even in Fortran which has been much more carefully designed for optimization than C.

            1. Anonymous Coward

              Re: Gimme speed

              "I read a recent thing on an optimization commonly done by C compilers which turns code with behaviour defined by the spec into code with behaviour not defined by it."

              Had an application failing with a new version of the compiler. Turned out that the optimisation now recognised that several functions had the same calling parameters and the same code.

              It then generated one instance of the code - no problem. It also conflated all the different function entry points too - so they all had the same memory address. This caused the problem - the program differentiated the different calls elsewhere by their entry point addresses.

              1. tfb Silver badge

                Re: Gimme speed

                That's a nice example! If this was C, I wonder if the spec even says whether two functions which have different textual definitions but which are clones of each other must have different addresses?

                1. gnasher729 Silver badge

                  Re: Gimme speed

                  The compiler may put both functions at the same address, but &f1 == &f2 must be false. The compiler could for example put a few nops in front of the function, to have a few different addresses available, but always call the function without any nops.

                  1. tfb Silver badge

                    Re: Gimme speed


                    1. Lee D Silver badge

                      Re: Gimme speed

                      Few things are CPU limited that can't work better with a bit of rejigging and some parallel processing (eg. GPU processing).

                      But things plateaued really quickly because they hit physical boundaries.

                      Nothing stopping people making a core without a consistent clock across it. It's perfectly viable, theoretically. But it would mean architecture changes, most likely. Or its performance for synchronous tasks would just fall back to "waiting for everything" and you would see no speed gain.

                      Heat and chip size are limiting... you need a very tiny, very hot chip, which is really bad for materials that you want to cool, where you just want everything to be spread out and cool. It's like putting a soldering iron bit on your motherboard, basically. Just because it's small doesn't mean you can stop it destroying itself / its surroundings by blowing a fan near it.

                      I think we'd see much bigger gains, anyway, from things like memory that's closer to the chip without relying on tiny local caches to keep the CPU fed (isn't that the problem with things like Rowhammer, etc. too?). If we could bring the RAM into the CPU, and things like persistent RAM, then you'll probably see greater performance increases as the 3GHz CPU will always be kept busy as opposed to a 5GHz CPU that's constantly waiting on the RAM for data.

                      To be honest, I'm at the point where - despite as a kid looking at a 4.77MHz chip and being unable to imagine the speed of 1GHz, and then achieving it in only a few years - I look at the top-of-the-line chip frequencies and don't see them changing anywhere near as much in the next decade or so.

                      With virtualisation, parallelisation, etc. however it won't matter much for almost any "ordinary" workload. And HPC is moving towards GPGPU, custom chips etc. anyway. We'll see a quantum computer before we'll see a 10GHz home machine.

                      I think I'd rather my servers had 100 cores idle at 3GHz than anything else anyway. VM running slow? Add another half-dozen cores and some more RAM into it. Pretty much normal stuff (SQL, etc.) will scale just fine.

                      The problem there is the licensing is going to become insane unless revised (but I run Windows Server Datacenter anyway, so I don't particularly care for most things!).

                      It will lead to the point, though, where one server could in theory allocate 10 cores per client (to things like terminal services, etc.) and be just as fast as anything you could do locally, and at that point you might see a push towards thin-stuff again. Until the next fad-cycle, of course.

                      1. Martin an gof Silver badge

                        Re: Gimme speed

                        Nothing stopping people making a core without a consistent clock across it. It's perfectly viable, theoretically. But it would mean architecture changes, most likely.

                        What, do you mean like the AMULET?


                  2. Ken Hagan Gold badge

                    Re: &f1 == &f2

                    Are we using the correct tense here?

                    I think I first encountered a discussion of this point about 20 years ago, in the context of C++ templates producing *many* byte-level-identical functions and then being pretty much obliged (for sanity's sake) to eliminate all but one as a linker optimisation. Once identified, the problem was easily fixed because the compiler can see whether function addresses are ever used as a proxy for identity. I can't say I've heard anyone mention it in the intervening decade or two.

              2. Gene Cash Silver badge

                Re: Gimme speed

                > the program differentiated the different calls elsewhere by their entry point addresses.

                Wait, what? Why the hell would it do that?

              3. Richard 12 Silver badge

                Re: Gimme speed

                @AC with the funky function calls...

                If that was C or C++, you were relying on Undefined Behaviour.

                Compilers are free to demonize your nasal passages if you do that.

            2. big_D Silver badge

              Re: Gimme speed

              @tfb optimization has always been a problem. We had a demo mainframe delivered and the sales guy gave us a tape with source code for our VAX cluster. He told us how wonderful his mainframe was, and how fast. We should compile the code with all optimization on the VAX and let it run and run the same code on his mainframe. We should call him back in a week, once the mainframe was finished, the VAX would need a month!

              There was a note waiting for him by the time he had returned to the office (those were the days before mobile phones). The VAX was finished.

              It had taken the source code, analysed it and come to the conclusion: No input, a lot of calculations, no output = nothing to do. The program created a huge multi-dimensional array, filled it with random numbers, performed some calculations on the random numbers and dumped the array, when it was finished. The mainframe dutifully compiled it and executed it, the VAX made a small .exe that finished in a fraction of a second.

            3. Anonymous Coward


              A (standards) conforming compiler is not permitted to change the behaviour of a strictly-conforming program (one that follows the rules).

              The problem is that a lot of C code contains instances of undefined behaviour. These are often benign at low (or no) optimization levels, but do cause some real surprises at higher optimization levels. The programmer is basically required to honour a contract specified within the standard, and the optimizer assumes that is the case.

              The important thing here is it is the code that is broken, not the compiler. Unfortunately, it is fairly easy to introduce undefined behaviour into the code accidentally - and there is no requirement for the compiler to issue a diagnostic.

          4. Anonymous Coward

            Re: Gimme speed

            Massively parallel computation is the best way forwards. The Condor CPU-sharing system is a very good exemplar of this; a researcher at a certain university which uses Condor said that a simulation run that on his own research cluster would have taken six months to run completed on the early experimental Condor system over one normal weekend. The University has, of course, expanded their Condor implementation hugely since those early days.

          5. Claptrap314 Bronze badge

            Re: Gimme speed

            "Only compiler writers"--well, them and low-level drivers. And folks doing validation of the processors. :P

            1. onefang Silver badge

              Re: Gimme speed

              '"Only compiler writers"--well, them and low-level drivers. And folks doing validation of the processors.'

              And people working with tiny microcontrollers.

          6. JeffyPoooh Silver badge

            Re: Gimme speed

            Richard12 offered, "...Hand-tuned assembler is....incredibly expensive - months or years..."

            You've failed to read to the end of my post, where I specifically suggested, "Even if it's just a few LoC in the innermost loop."

            Modern compilers can be good, but if a run takes a full month then it seems clear that he's almost certainly running his program as a crappy and inefficient high level script.

            Room for improvement is inevitable.

        2. Anonymous Coward

          Re: Gimme speed

          Admittedly, there's never any harm in pulling out some profiling tools to see what can be done, but I doubt that the potential speed benefits from unrolling a loop or two would ever outweigh the costs of implementing, testing and maintaining the changes, especially when the odds are good that the heavy lifting is being done by an industry standard library - and it's quite probably running on a souped up GPU via Cuda/OpenCL or similar.

          Writing good code is hard. Writing good assembler is much harder. Writing good parallelisable assembler is several orders of magnitude harder still! And proving that what you've written is functionally correct is an absolute nightmare.

          If you think you can do a better job than the geniuses who write the compilers, or the academics who wrote the libraries, then have at it! But, y'know, you're probably not - and the time you spend hacking away at something which is Known Good is probably best spent on something else.

          I've been reminded of this recently, when trying to address some performance issues in a large lump of legacy code. It's the kind of code which makes your eyes bleed, having organically "evolved" over the last ten years with the aid of a large number of developers with highly varying abilities. And in the last few years, people have just tended to hack in changes wherever it was easiest.

          The result is fundamentally unmaintainable: it's virtually impossible to get a clear view of the overall business logic, or of all the special cases which are being handled, especially as many are implicit or conflated with other special cases. As a result, even small changes can act like a crazed chaos butterfly, causing issues in places you'd swear couldn't be impacted.

          Thankfully, we now have buy in from the business to bite the bullet and make a start on cleaning things up. But realistically, it may take weeks or even months before we can be confident that the new code is functionally equivalent!

          (blah blah unit tests. blah blah specifications. blah blah test plans. As ever, the Holy Unbalanced Tripod rule comes into play: you can have it delivered quickly, you can have it well tested, and you can have it at low cost. But at best, you can only ever get two of the three...)

      2. Korev Silver badge

        Re: Gimme speed

        That’s the case for most users, but I have genetic analyses that are highly parallel, so more cores scales better than more clock speed.

        I'm not familiar with your situation, but why don't you run on a cluster? If you're doing something like sequence alignment to a genome then it's pretty easy to scale the number of jobs up to the number of files or file pairs (for PE reads).

    2. Ken Hagan Gold badge

      Re: Gimme speed

      "Gimme a 10GHz CPU."

      CPU frequencies have hardly moved in over ten years. The wavelength of light at 10GHz is smaller than the die size of aforesaid CPUs. It is quite plausible that you will not live long enough to see a 10GHz part in normal commercial channels. (And no, I have no idea how old you are.)

    3. Anonymous Coward

      Re: Gimme speed

      "Most tasks are hard to parallelise"

      Most problems are inherently parallel, but writing parallel code "is hard" ...

    4. HPCJohn

      Re: Gimme speed

      Not possible. Heat dissipation goes up with the square of the frequency.

      A 10Ghz chip would be as hot as the surface of the sun, or something like that.

      1. DropBear Silver badge

        Re: Gimme speed

        Unacceptable! If nine women can deliver in one month the same baby that one woman can in nine months, there's no reason we shouldn't expect CPUs to get with the program too and start getting much faster again!

      2. TechnicalBen Silver badge

        Re: 10Ghz as hot as the sun?

        Kit hit 7Ghz this week... granted on Liquid Nitrogen, but still, 7Ghz! We were hitting 3 or 4 on LN2 in the past, now it's run on air or water cooling easily.

        10Ghz is possible, may take a Looooong time though.

        1. Tom 7 Silver badge

          Re: 10Ghz as hot as the sun?

          Last time I checked out a rig like that the rig cost far more than buying another computer and provided less performance increase. Pretty coloured tubes though.

      3. Claptrap314 Bronze badge

        Re: Gimme speed

        For a fixed technology. A long time ago, we were facing this problem with bipolar processes (ie: transistors.) Folks were looking for alternatives to MOSFET more than a decade ago because they saw this coming. Depressing that nothing has been found so far.

      4. Anonymous Coward

        Heat dissipation goes up with the square of the frequency

        It does - for the same device geometry.

        Reduce the size (and hence things like capacitance) and the power requirement goes down about the same as the reduction in the transistor surface area.

    5. Michael 47

      Re: Gimme speed

      Sadly, given the way processors work at the moment, it is physically impossible to go above about 5GHz, because the pulse literally can't propagate fast enough. If we assume the clock pulse can propagate at the speed of light, at 5GHz the pulse can propagate about:

      (1/5e9)*3e8 = 0.06

      or about 6cm, which is roughly the size of the CPU, so if you clock it much faster than that the next tick will happen before the previous one has even reached some of the components in the CPU. I wouldn't say never, because they continue to amaze me with the innovations they make, but in their current configuration I don't believe we will ever see a CPU that can be clocked much above 5GHz
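The arithmetic above, as a short Python sketch. It keeps the post's simplifying assumption that the signal travels at roughly the vacuum speed of light (3e8 m/s), and the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8  # rounded, as in the post above


def distance_per_cycle_cm(clock_hz, signal_speed=SPEED_OF_LIGHT_M_PER_S):
    """Distance a signal can cover in one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return period_s * signal_speed * 100.0


for ghz in (3, 5, 10):
    print(f"{ghz} GHz: {distance_per_cycle_cm(ghz * 1e9):.1f} cm per cycle")
```

Passing a lower `signal_speed` shows how the distance shrinks further if signals on the die move slower than light in a vacuum.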

      1. UncleNick

        Re: Gimme speed

        "but in their current configuration I don't believe we will ever see a CPU that can be clocked much above 5GHz"

        You might want to Google up the overclocking world records...

      2. Claptrap314 Bronze badge

        Re: Gimme speed

        You might want to look into clock distribution methodologies before making that claim. You might even want to dig into the details of pipelining. I have no doubt that any of the majors could ship a chip with 20GHz clock speed in a quarter or so. These "new" chips would have pipe lines that were four times as long as current ones, however. Also, their performance would almost certainly degrade a bit.

        Clock speed has been effectively meaningless since AMD introduced the K5. It's performance that matters, everything else is marketing. Unfortunately, performance is a per-job thing. All of which is WAY too complicated for consumer marketing. So, people talk about clock speed as if it matters.

        What is a clock cycle inside a microprocessor? It's the frequency of the latch sampling at the end of a chain of unlatched gates. If you want to increase cycle speed, you can simply reduce the number of gates between the latches. (To a point.)

    6. HPCJohn

      Re: Gimme speed

      Probably not that relevant to this discussion, but regarding performance and compilers you should look at the Julia language for scientific and technical computing. Looks like Python, runs like C. It is as fast as C in many instances. It uses multiple dispatch.

      And as this is a UK based website, worth flagging up that the Julia conference this year comes to London in August.

    7. Daniel von Asmuth Bronze badge

      Re: Gimme speed

      A 10 GHz CPU? Try to revive the old Alpha AXP architecture. Forget about complex instructions and speculative executing or hyperthreading; say hello to even longer pipelines. Use photonic chip interconnections. Focus on low latency and smile when you see fewer TFLOPS in benchmarks, and expect huge cooling requirements.

      Amdahl's Law will tell you that most programs will receive less than a 100-fold speed-up if you run them on 100 cores (instead of 1), but tasks can be parallelised to some degree.
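A quick Python sketch of Amdahl's Law as referenced above: with a fraction p of the work parallelisable and n cores, the speed-up is 1 / ((1 - p) + p / n). The function name and the sample fractions are chosen here purely for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: overall speed-up when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.50, 0.90, 0.99):
    print(f"{p:.0%} parallel on 100 cores: {amdahl_speedup(p, 100):.1f}x")
```

Even at 99% parallel the 100-core speed-up is only about 50x, which is the "less than 100-fold" point.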

    8. rav

      Re: Gimme speed

      More cores allow you to browse, play music, use Xcel,

      You are right most folks just may not need 32 threads. So what!!!! That does not keep them from wanting 32 cores!!!

      Besides Intel has for years been touting their leadership with multicore performance. Now they are on the other end of that stick and some folks just do not like that!!!

  3. Rashkae

    Intel was fudging

    They took an existing Xeon part and needed a -10C chiller and a 1000W power supply just to overclock it to 5Ghz.

    AMD showed production sample silicon that was air cooled.

    1. Piro

      Re: Intel was fudging

      I believe it was a 1600W PSU.

      The whole Intel demo was an utter joke. Cooler + PC consuming north of 2kW in a very desperate move.

      This doesn't lead to a consumer product.

      1. DougS Silver badge

        Re: Intel was fudging

        Intel was fudging, but the press ate it up. That's all they cared about, because stock analysts will have seen Intel's demo and think "Intel is still comfortably ahead of AMD" instead of "AMD is hot on Intel's heels, we should lower our price targets on Intel!"



Biting the hand that feeds IT © 1998–2019