Oracle reveals 32-core, 10 BEEELLION-transistor SPARC M7

Oracle has revealed details of its next-generation SPARC CPU, the M7. As John Fowler, Oracle's executive veep of systems, predicted when chatting to The Reg last month, the company took the wraps off the M7 at last week's Hot Chips CPU-fest and filled it with goodies to make Oracle software go faster. Under the hood of the CPU …

  1. Anonymous Coward

    "32 of them in harness might just be dangerously close to Skynet."

    Not a chance. Just introduce it to Johnny Drop Tables.

    1. Tokoloshe

      Though that's assuming there's an entity on earth wealthy enough to licence 1,024 cores of Oracle Enterprise software with all the optional trimmings in the first place.

      Dr Evil Ellison gets it for free, obviously...

    2. Anonymous Coward

      Presumably they come connected in a ring so that you can fit the boat anchor chain through it.

      I can't believe that people are still buying this stuff. Solaris was marked 'disinvest' a decade ago in most places.

      1. DougMac

        Oracle just made sure to sift out anybody that wasn't willing to pay them billions and billions of $$.

        Those that are still *heavily* invested in Solaris are still going strong on SPARC/Solaris; they just weeded out the small to mid-sized shops that were never going to hand over that kind of money.

  2. SplitBrain

    Nice!

    The most dense CPU ever created. Oracle is doing things with SPARC that Sun alone would not have been capable of - good to see, from this former Sun Shiner. Never thought I would say that!

    1. Anonymous Coward

      Re: Nice!

      "The most dense CPU ever created"

      It is indeed nice to see, but it doesn't exist yet. Oracle says that it might exist in 2015.

      If they can actually deliver it then yes, it will be quite a chip.

      However, the time scale they're talking about still gives Intel and IBM plenty of time to roll out their own road maps, and one wonders how the M7 will compare with those parts when it finally arrives.

      1. Anonymous Coward

        Re: Nice!

        Not really. POWER8 is just hitting the market now and IBM launches major versions on a three-year cycle, so P8 is all you'll get for nearly three years. As for Intel, they have hardly any focus on anything above two sockets these days - they're too busy fighting off the challenge from low-power ARM.

        1. Mad Mike

          Re: Nice!

          Who knows what the die size will be, but cooling something like that is going to be a challenge. Even with die shrinks and lower voltages, it's going to consume a lot of power, and all that heat has to be drawn away somehow.

          1. Mad Mike

            Re: Nice!

            "Who knows what the wafer size is, but cooling something like that is going to be a challenge. Even with die shrinks and lower voltages, it's going to consume a lot of power and all that heat has to be drawn away somewhere."

            I really wonder sometimes. How does a comment about trying to cool something like this get a thumbs down? Power and cooling are among the biggest issues processor designers have to face!!

            1. Paul_Murphy

              Re: Nice!

              No idea, but maybe these servers will have liquid cooling as standard or something.

              Which is about time, IMHO.

              1. Mad Mike

                Re: Nice!

                Ah, liquid cooling!! Back to the old days.........

                Not sure how much liquid cooling can help though. As the die gets bigger (and even at this density, it's going to be pretty big), it becomes very difficult to get the heat out of the center of the chip. I've often wondered whether they'll start producing chips that have cooling channels through them, rather than just around (or on top of) them. That would help a lot, but is fraught with difficulties. It might even allow them to cool (as in chill) the chip as well, with a suitable refrigerant.

            2. Roo

              Re: Nice!

              "I really wonder sometimes. How does a comment about trying to cool something like this get a thumbs down?"

              That's easy: the down-voters are ignorant fanbois and shills. They really don't give a toss about the tech; all they care about is burying bad news under a mountain of downvotes. The Itanic fanbois pulled the same trick - a few architectures got buried as a result, but in the real world the Itanic still ended up as an overpriced, inefficient and underperforming boat anchor. The only winners were the shills who got rich in the process (e.g. Steve Milunovich); of course, none of them actually had to use an Itanic to earn a living...

          2. samlebon2306

            Re: Nice!

            "but cooling something like that is going to be a challenge"

            Maybe they need to submerge it in mineral oil.

        2. PowerMan@thinksis

          Re: Nice!

          Using your explanation of the Power8 roadmap: they began shipping Power8 in June '14. If they start shipping Power8+ in 18 months (it's reasonable to expect IBM to stick with the entry-level roll-out, as they did with Power8), that would put them in the Nov/Dec '15 timeframe. Oracle's "2015 rollout" could mean January or December. I think the point is valid that they are behind and will fall further behind.

          With regard to Intel - not sure what planet you are on, but I see Intel heavily focused on 4-socket servers: Ivy Bridge EP (E5) v2 for 2 sockets and EX (E7) v2 for 4 sockets and above. I am not fully briefed on Haswell and Broadwell, but it's reasonable to expect one of them will deliver a 4-socket solution. It's possible Intel has figured out that enterprise customers don't like rapid change in the chipsets that run heavy-duty workloads, and would rather have a reliable chipset than the latest and greatest every chip release.

          What you cite is what Intel is battling in general: trying to go after the enterprise space while continuing to own the 2-socket space and defending against ARM, both in 2-socket servers and in the mobile/portable space. Google looking at Power, and Apple considering a change plus its continued growth with the iPhone/iPad, put pressure on every chip manufacturer.

        3. kkreu

          Re: Nice!

          The SPARC M7 seems nice, but if you start to break it down it still falls short of IBM's Power8 processors. It may have more cores per chip (which can affect software cost), but it has less L3 cache per core. The IBM S824 24-core server has an L3 bandwidth of 5,407GB/s (~5.28TB/s) and an L2 bandwidth of 4,055GB/s (~3.96TB/s). The S824 is ~230% faster for L3 and ~691% faster for L2.

          They also say nothing about their I/O or memory performance, while the S824 has a total memory bandwidth of 384GB/s and an I/O bandwidth of 192GB/s.
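
          For what it's worth, reading "~X% faster" as (S824 - M7) / M7 lets you back out the implied M7 figures - my arithmetic from the percentages above, not published Oracle numbers:

            # implied M7 bandwidths, if "X% faster" means (s824 - m7) / m7
            s824_l3, s824_l2 = 5407, 4055                # GB/s, the S824 figures above
            print("implied M7 L3:", round(s824_l3 / (1 + 2.30)), "GB/s")  # ~1638
            print("implied M7 L2:", round(s824_l2 / (1 + 6.91)), "GB/s")  # ~513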

    2. Jim 59

      Re: Nice!

      Awesome. 10 billion transistors in the headline, but the story doesn't repeat that claim - is it true, Reg?

      1. Richard Boyce

        Re: Nice!

        That you can have so many transistors on a chip and reliably get chips that work is mind-boggling. The purity and quality control must be stupendous.

    3. PowerMan@thinksis

      Re: Nice! -- NOT!

      The most dense CPU ever? A quick Google search will show you this is not the most dense CPU ever created - Intel, Nvidia, Azul and more have far higher core density. Oracle isn't doing things Sun was not capable of; moreover, I would argue they are pulling old plays out of the Sun playbook in a desperate attempt to remain relevant. One example is the similarity the M7 has to the Rock processor cancelled in 2009/2010. Rock was a 16 core CPU design made up of 4 clusters of 4 cores each. Very similar to the M7 which has 8 clusters of 4 cores each - coincidence? Hmmm!

      1. Mad Mike

        Re: Nice! -- NOT!

        "One example is the similarity the M7 has to the Rock processor cancelled in 2009/2010. Rock was a 16 core CPU design made up of 4 clusters of 4 cores each. Very similar to the M7 which has 8 clusters of 4 cores each - coincidence? Hmmm!"

        There's not necessarily any issue with using ideas from the past, brought up to speed with the latest technology. However, the design of this chip demonstrates one of the biggest problems for designers these days: interconnects. Any-to-any interconnects are always going to be best, but they become impractical as the number of endpoints rises, so interconnect technology is likely to become one of the biggest drivers of processor/core speed. This used to be seen as mostly a problem in big servers (Power, Integrity etc.) with many processors, but as core numbers increase it is becoming a problem between cores as well.
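
        To put rough numbers on that, the link count for a full mesh grows as n*(n-1)/2 - the textbook formula, nothing vendor-specific:

          # links needed for an any-to-any (full mesh) interconnect: n*(n-1)/2
          for n in (8, 32):
              print(n, "endpoints need", n * (n - 1) // 2, "links")
          # 8 endpoints need 28 links; 32 endpoints need 496 links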

        1. Magellan

          Re: Nice! -- NOT!

          Rock was much more than sixteen cores in four-core clusters. Originally, Sun did not call the units in the core clusters cores; it referred to the core cluster itself as a core. The core cluster in Rock was four integer pipelines and one shared floating-point pipeline, and the four integer pipelines shared an instruction fetch unit and L1 caches. There was a dislike of calling the cluster multiple cores because, at that time, all CPU cores contained an instruction fetch unit, a dedicated L1 cache, and an FPU. It was only later, when marketing decided a high core count suggested advanced engineering, that the individual integer pipelines were called cores. This was consistent with the marketing of the various UltraSPARC T processors, which did not have a one-to-one ratio of IUs to FPUs.

          Rock's advanced features included hidden hardware helper threads to prefetch data (the "Hardware Scout"), the ability to simultaneously run both branches of a code branch ("Execute Ahead"), "reverse hyperthreading" ("Scalable Simultaneous Threading", which turned the four integer pipelines and their paired floating-point pipeline into a single virtual core for HPC workloads), and transactional memory.

          Rock's four core clusters shared an L2 cache. There was no on-chip L3 cache.

          Rock had an in-order pipeline, but could execute out of order via Execute Ahead. If I recall, each Rock integer pipeline had four hardware threads, two for executing code (allowing Execute Ahead), and two for the Hardware Scout (to feed the two execution threads). Only two threads were visible to the operating system. Rock was interesting because it used threading to gain ILP.

          It appears that, of the advanced Rock features, the M7 has only transactional memory, although Solaris has used a software-based version of scout threading (called Dynamic Helper Threading) since the UltraSPARC IV+ days, and this was expanded in the various UltraSPARC T series.

  3. Mad Mike

    Cache size

    Does anybody else think that 64MB of cache seems tiny for 32 cores and 8 threads a core?
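
    For scale, here's a quick back-of-envelope split of that cache across every hardware thread:

      # split the quoted 64MB of cache across all hardware threads
      cores, threads_per_core = 32, 8
      per_thread_kib = 64 * 1024 // (cores * threads_per_core)
      print(per_thread_kib, "KiB of shared cache per thread")  # 256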

    1. Roo

      Re: Cache size

      "Does anybody else think that 64MB of cache seems tiny for 32 cores and 8 threads a core?"

      Totally inadequate at that kind of clock rate; they are banking (sic) on the latency being hidden by threading. It'll be interesting to see how one of those chips stacks up against a Xeon Phi.
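
      The usual back-of-envelope for latency hiding: if a thread does C cycles of useful work between misses and each miss stalls for L cycles, a core needs roughly 1 + L/C threads to stay busy. With made-up numbers (nothing here is a published M7 figure):

        # textbook latency-hiding model: threads needed ~ 1 + stall/work
        work_cycles = 50    # useful cycles between misses (illustrative)
        stall_cycles = 300  # cycles lost per cache miss (illustrative)
        print(1 + stall_cycles / work_cycles, "threads per core to stay busy")  # 7.0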

      1. Mad Mike

        Re: Cache size

        It's an interesting move, given the way they released T processors before and then reduced the core count to produce an M processor. Are they not going to produce M and T versions of this one? If they are, the T version should have something like double the cores!!

        As to the latency being hidden by threading... one of the primary purposes of the caching is to prevent cache thrashing when running lots of threads, so the threading should make it even worse!! The early T chips showed that admirably.

        1. Anonymous Coward

          Re: Cache size

          "Are they not going to produce M and T versions of this one?"

          Seemingly not:

          http://www.enterprisetech.com/2014/08/13/oracle-cranks-cores-32-sparc-m7-chip/

          "Oracle will be discontinuing the T Series chips, Fowler tells EnterpriseTech, and building future Sparc machines on the M7 processors solely."

          1. Anonymous Coward

            Re: Cache size

            The M and T processors are already partly merged with the T5 and M10 servers which now share the T-series processor capabilities. It makes sense for Oracle to merge the designs completely and differentiate with T and M-series servers which focus on different capabilities.

            1. Mad Mike

              Re: Cache size

              If, as said here, they're merging the T and M chips, I wonder if they'll offer sub-capacity versions with fewer than 32 cores? Maybe reuse some of the chips with failed cores? Starting a server range at a single processor with 32 cores (and presumably an appropriate cost) is not really that viable and could lose a lot of good business. Unless, of course, they're only interested in people who want single boxes that size and bigger? Maybe push smaller users onto x86. One of the 'benefits' of the smaller T-series servers was that a small one could be purchased quite cheaply. I assume a single-processor M7 server won't be that cheap?

            2. fch

              M/T processor vs. systems ... [ was: Re: Cache size ]

              There's T-series/M-series CPUs - which are all Oracle SPARC.

              Then there's T-series systems - which are all Oracle, using Oracle T4/T5 CPUs in the systems of the same name.

              And there are systems colloquially termed "M-Series".

              Of which only the M5/M6 (and M7 to come, unless Oracle chooses to rename the system before launch) are Oracle, and use Oracle SPARC CPUs of the same name.

              The older Mx000 and current M10 systems, though, are designed by Fujitsu and use Fujitsu's SPARC64-series CPUs (in the M10 series, the SPARC64-X - the "commercial spawn" of the current K Super). At Hot Chips, Fujitsu also presented the SPARC64-XI - to go into the post-K Super, and possibly later into (an update of) the M10 series of systems.

              No one quote me on all these names and numbers, please - refer to the vendors' marketeeting departments for the canonical incomprehensible advice instead, and to their legal departments for even more incomprehensible guidance on trademark usage.

            3. Casper

              Re: Cache size

              There is no relation between the Fujitsu M10 systems (which have a SPARC64 CPU) and the Sun/Oracle Mx/Tx chips. What changed with the M10 is that it now also uses the sun4v architecture, i.e. a SPARC system with a hypervisor. This makes the systems look more similar from an admin perspective.

  4. Jim 59

    Multi-core

    Multi-core is great for parallel tasks, obviously. I can encrypt a huge file on my 8-core laptop and the machine doesn't slow down at all; I can happily continue to do other stuff. In single-core days it would have reduced the whole machine to a crawl.

    But there is a downside. Many tasks can't be parallelized by present software. For example, that encryption above. It only gets one core, so only gets about 12% of the PC's compute power. In an ideal world it would take 6 or 7 cores, run mongo-fast, and leave me with 1 or 2 cores to read El Reg and play Tetris.

    1. Chemist

      Re: Multi-core

      "Multi-core is great for parallel tasks"

      But when it works - oh, yes. The quad-core i7 laptop I'm writing this on is bloody great with programs like ffmpeg, which will transcode video with all 4 (8 with hyperthreading) cores running at ~85-90% and still stay responsive for less demanding jobs. Gets rather hot though!

    2. Alan Brown Silver badge

      Re: Multi-core

      "Many tasks can't be parallelized by present software. For example, that encryption above. It only gets one core, so only gets about 12% of the PC's compute power."

      Allow me to introduce you to my friends

      pigz - http://zlib.net/pigz/

      pbzip2 - http://compression.ca/pbzip2/

      There are other multithreaded archivers but these are the most useful in a *nix house.

      Most 7zip and xz code has multithread support built in.

      1. fch

        Re: Multi-core

        Both gzip and bzip2 parallelize well only for compression - because that's a "blocked" operation, i.e. a fixed-size input block is transformed into a hopefully-(much-)smaller output chunk, and the chunks are then concatenated into the output stream. Since the output is a stream, though, there's no "seek index table" at the beginning, and hence one cannot parallelize the reverse operation in the same way: you only know where the next block starts once you've done the decompression and know how far the "current" one extends. While one can "offload" some side-tasks, the main decompression job is single-threaded in both of the above-mentioned implementations.

        One can, though, obviously compress/decompress multiple streams (files) at the same time. That's what ZFS uses, for example - every data block is compressed separately, and hence compression/decompression on ZFS nicely scales with the number of CPU cores.

        [ moral: better to use a compressing filesystem than to compress files in a 1980s filesystem? ]
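
        For illustration, a minimal Python sketch of that blocked scheme using the standard zlib module (the chunked container here is made up for the example; pigz's real on-disk format differs):

          import zlib
          from multiprocessing import Pool

          BLOCK = 1 << 20  # compress independent 1 MiB input blocks

          def compress_block(block):
              # each block is self-contained, so the blocks compress in parallel
              return zlib.compress(block)

          def parallel_compress(data):
              blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
              with Pool() as pool:
                  return pool.map(compress_block, blocks)

          if __name__ == "__main__":
              chunks = parallel_compress(b"fairly compressible text " * 400000)
              # decompression is the hard direction: a plain concatenated stream
              # has no index, so you only learn where chunk N ends by decoding it
              data = b"".join(zlib.decompress(c) for c in chunks)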

        1. Michael Wojcik Silver badge

          Re: Multi-core

          Both gzip and bzip2 parallelize well only for compression - because that's a "blocked" operation

          True, but encryption[1] can also be done in parallel blocks, for example using a block cipher with GCM (Galois/Counter Mode) combining.

          One can, though, obviously compress/decompress multiple streams (files) at the same time

          Yes, and clearly that's the solution for large archives: build them from multiple compression streams. With many corpora you can get close to the same overall compression ratio even if you partition the input in various ways, for example by interleaving (one stream for every Nth block, for some block size, then interleaving the outputs the same way when decompressing). There are other possibilities that can improve compression ratios for typical jobs, even beyond what a good compressor (e.g. PPMd) would achieve if simply run over the entire corpus as a single byte stream.

          [1] I know you went on to talk about decompression, but the OP mentioned encryption in this context.
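
          For illustration, a minimal sketch of per-chunk parallel encryption in Python, using the third-party cryptography package's AESGCM (my choice of library - the post above doesn't prescribe one). Each chunk gets its own nonce, so chunks can be sealed independently and in parallel:

            import os
            from multiprocessing import Pool
            from cryptography.hazmat.primitives.ciphers.aead import AESGCM

            CHUNK = 1 << 20  # encrypt independent 1 MiB chunks

            def encrypt_chunk(args):
                key, index, chunk = args
                nonce = os.urandom(12)  # fresh nonce per chunk; store it with the output
                return index, nonce, AESGCM(key).encrypt(nonce, chunk, None)

            def parallel_encrypt(key, data):
                jobs = [(key, i, data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
                with Pool() as pool:
                    return pool.map(encrypt_chunk, jobs)

            if __name__ == "__main__":
                key = AESGCM.generate_key(bit_length=128)
                sealed = parallel_encrypt(key, os.urandom(8 * CHUNK))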

      2. Charlie Clark Silver badge

        Re: Multi-core

        Allow me to introduce you to my friends

        Although the programs sound nice, I'm not sure they're a suitable answer to the original post, which was about encryption, not compression.

    3. eldakka

      Re: Multi-core

      You are aware that this is a server-oriented processor, are you not?

      And that you're not likely to care about the performance of gzip/bzip2/zip or whatever?

      What you ARE likely to be doing is running 20-30 JVMs with multi-gigabyte heaps, each handling hundreds if not thousands of user tasks simultaneously.

      Or running a whacking great database on it (it is from ORACLE, after all) doing thousands of simultaneous, independent queries (selects, inserts, etc.).

      You don't need to be able to extract instruction-level (or task level) parallelism from a SINGLE process (e.g. transcoding video, compressing) or from single tasks when you are running dozens, hundreds, THOUSANDS of SEPARATE independent processes/tasks simultaneously. As tends to happen on servers, which is what this chip is aimed at.

      1. Mad Mike

        Re: Multi-core

        @eldakka.

        Very true to an extent, but there are plenty of systems in existence today that are heavily single-threaded (or use low numbers of threads). Parallelism is coming more and more, but isn't fully there yet. Also, just because something is parallel doesn't mean it doesn't care about latency and the other issues that parallelism can cause. Don't forget that some things are naturally parallel, such as OLTP systems; however, other workloads are naturally not parallel, and trying to turn them parallel causes (in some cases) a very significant overhead. It's getting better all the time, but parallelism isn't the answer to everything and causes its own problems as well.

  5. John Smith 19 Gold badge

    A SPARC thread without Matt Bryant....

    It's just, you know, unexpected.

    Takes a bit of getting used to.

    1. Anonymous Coward

      Re: A SPARC thread without Matt Bryant....

      What happened to him? Any RIFs at HP/IBM lately?

      1. PowerMan@thinksis

        Re: A SPARC thread without Matt Bryant....

        LOL - he never worked at IBM, AFAIK - I just checked the public IBM directory and don't see him. Plus, if he did, he was a bitter employee :) He always claimed to work at a non-vendor customer site.

    2. SplitBrain

      Re: A SPARC thread without Matt Bryant....

      Don't see much of Bryant on Unix-related matters these days; the Reg (and the world's tech press) has sweet F'all to report on HP-UX and Itanium.

      He still frequently espouses his (mostly) right-wing views on other threads though, so it's not as if he has disappeared.....

  6. Captain Server Pants

    This explains IBM's $3 billion systems invest FUD

    It also explains IBM's desperation to unload its chip fabs. TSMC is far ahead of IBM. Supposedly Power8 yields are low and they're using 2-chip modules to get to 12 cores per socket. Performance isn't good because of the loss of single-chip cache coherence, so they went to a giant off-chip shared L4 cache. Sparc M7 seems like a big step ahead.

    1. PowerMan@thinksis

      Re: This explains IBM's $3 billion systems invest FUD

      Um, not true. Your comment is FUD. Each socket in the current Power8 scale-out servers is packaged with up to 2 x 6-core chip modules. IBM did this on Power5 and Power6 and now with Power8 servers, so it's nothing new. Moreover, what's wrong with it if it performs? Your comment about "Performance isn't good" is off base. What do you base it on? I would point you to the benchmarks where a 24-core S824 matches a 4-socket, 60-core Ivy Bridge EX v2 server in SAPS & users, and outperforms it in SPECint, SPECfp, SPECjbb and more. Those are just benchmarks; my customers are seeing the performance and more.

      "So they went to giant off chip(s) shared L4 cache." Really? Adding technology and innovating is now a gimmick? By this explanation having L3....L2 and even L1 cache are all gimmicks. Just main memory and cpu for you. Come on, are you a bit jaded by your SPARC love? It's ok to say your Ford SPARC is the best ever but don't lie about my Chevy Power :) Power8's L1 (D+I) are 2X greater than x86 and 4X+2X over SPARC T5. L2 cache is 2X over x86 and 4X over SPARC T5. L3 is 2.5X over x86 and 12X over SPARC T5. Neither x86 or SPARC have L4 while Power8 has 128 MB per socket. This gives Power an advantage to get data closer to the core so it may fit entirely in a lower cache line.

      1. Mad Mike

        Re: This explains IBM's $3 billion systems invest FUD

        As has always been said, it's the whole path you need to consider. Getting cores faster is no good unless you can keep the data coming in faster as well: faster memory, faster I/O, faster interconnects, etc. There's plenty of innovation going on all over the place. Cache sizes are way bigger on Power chips at the moment, and they've opened up the architecture as well, inviting other companies to create accelerators and the like that sit directly on the processor interconnects.

        Oracle are heading down the 'accelerator in silicon' route much faster than others. Not that others haven't done it, but it seems to be a much higher-priority drive at Oracle. You can see the attraction: their hardware is perfectly tuned to their software and gets advantages other hardware can't give. At the same time, they refuse to code their software to use accelerators etc. present in other brands of hardware. It's all about lock-in, and from Oracle's perspective it's a win-win. However, it's only their version of SPARC they can do it with, and their Intel/AMD deployments won't enjoy the same advantages unless they can persuade Intel/AMD to play ball with them :-)

        As to Itanium... it rather seems to have fallen off the coupon...

      2. Captain Server Pants

        Re: This explains IBM's $3 billion systems invest FUD

        "Each socket in the current Power8 Scale-out server is package with up to 2 x 6 core chip modules. IBM has done this on Power5, Power6 and now with Power8 servers so nothing new. Moreover, whats wrong with it if it performs? Your comment about "Performance not good" is off base. What do you base it on?"

        You've clearly drunk the IBM Kool-Aid. It's entirely possible Power8 is a good step up from P7+ and "your customers" are seeing nice performance improvements. My post said nothing about comparing P8 to the prior generation of Power.

        Do you realize 2x6=12? When Power8 was previewed at Hot Chips 25 (last year, 2013) it was presented as a single-die 12-core chip. Here's the Reg article:

        http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/

        To date, no IBM system offers a single-die 12-core Power8. When you introduce something as one thing and then release it as another, it's known as FUD in the IT business; in retail it's called "bait and switch". Despite your rant.

        Do you realize the 6 core chips in the 2 chip modules (remember 2x6=12) contain inactive cores? Why would this be the case? Maybe IBM fabs cannot manufacture the single part in high enough yields. Will Oracle follow IBM's lead and introduce 2x16 core (with deactivated cores) SPARC chip modules because it's a better solution? Sorry no, it's the opposite.

        1. Mad Mike

          Re: This explains IBM's $3 billion systems invest FUD

          @Captain Server Pants.

          Interestingly, it's not as clear-cut as you say; it depends on exactly what is working and what is not on the chip. If you only have, say, half the cores but all the cache is working, that will help quite a lot. If some of the cache isn't working, that's not so good!! It just depends on what has failed. Not sure why you're picking on IBM for this either, as just about everybody does it: AMD, Intel etc. have done it in the past. Indeed, in earlier incarnations of the T chips, Oracle/Sun used to sell processors with fewer than the normal number of cores. Now, I'm not saying whether they were simply deactivated or had failed - I don't know. However, it's likely at least some had failed. At this sort of density you're always going to get some failures, and selling them at the lower end is quite a reasonable way of utilising them.

          As you have said, all the current IBM Power8 servers use 2-chip modules, but they are all the lower-end servers, and this has been the case for years. It's only when you go up the server line that you get the full chips being used, and for very good reason: it uses up the 'slightly' faulty chips and allows manufacturing issues to be ironed out early. That's why they launch the low end first.

          Regardless of the rights and wrongs of how it's done, the proof is in the pudding - performance per buck. If making it a 6+6 gives better performance per buck, then that's just fine.

          P.S.

          I strongly suspect that if Oracle attempts to manufacture the SPARC chip as mentioned, we'll see lower-end systems with less than the full core count 'activated'. Attempt to manufacture chips at this density and core count (not to mention the accelerators etc.) and some failures will occur. You either throw them away and absorb the cost, or do something else with them at the low end!!

          1. Captain Server Pants

            Re: This explains IBM's $3 billion systems invest FUD

            @Mad Mike

            Yes, I do admit to an extreme characterization here; your points are all well taken. Many vendors - Oracle, Intel, IBM - use multi-chip modules and different activation schemes for capacity on demand and other reasons. It's a totally legitimate practice and they all do it.

            IBM Power and Oracle's per-core pricing put the final shade on the Sunshiners. I was never one of them, because SPARC wasn't a good investment for a long time. In my earliest days we used Oracle on Sun SPARC, in the early/mid '90s. After that it was HP and IBM (Power4 and Power5) until around 2008. Since then I've been in a Microsoft-stack Dell server shop, so I'm not religious about any one thing. Oracle makes the best database, though SQL Server is equal/close in many ways.

            Bottom line: there are three chip-manufacturing trains competing through 10nm/7nm/5nm - Intel, TSMC, and Samsung. It seems to me IBM needs to get on board one of those trains ASAP if it wants to keep up.

        2. Freddellmeister

          Re: This explains IBM's $3 billion systems invest FUD

          Captain Server Pants writes:

          "Do you realize the 6 core chips in the 2 chip modules (remember 2x6=12) contain inactive cores? Why would this be the case? Maybe IBM fabs cannot manufacture the single part in high enough yields. Will Oracle follow IBM's lead and introduce 2x16 core (with deactivated cores) SPARC chip modules because it's a better solution? Sorry no, it's the opposite."

          If you check the IBM POWER8 S824 Redbook, you'll notice that all 4 I/O buses are wired from each socket. So what might look like a way to use broken chips - the 2x6 dual-chip socket design - in fact increases the I/O performance by 100% compared to a 1x12-core design.

    2. Roo

      Re: This explains IBM's $3 billion systems invest FUD

      "Performance not good because of loss of single chip cache coherence so they went to giant off chip(s) shared L4 cache."

      The POWER8 has 512kbytes of dedicated L2 *per core*. That is backed by a further 96Mbytes of shared L3 on the same die, and up to another 128Mbytes of L4.

      By contrast the M7 has 256kbytes of shared L2 for each 4 cores, and 64Mbytes of shared L3 per die.

      "Sparc M7 seems like a big step ahead."

      The M7 has less cache, and the L2 cache has 4x the number of cores using it. Even if you ignore the L4 cache, the M7's caching scheme is in fact a step backwards for people who value single-thread performance.
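
      Working those figures out per core (and assuming the 12-core POWER8 die discussed elsewhere in the thread):

        # cache per core, from the figures quoted above (12-core POWER8 die)
        print("POWER8:", 512, "KB dedicated L2,", 96 * 1024 // 12, "KB L3 per core")
        print("M7:", 256 // 4, "KB shared L2,", 64 * 1024 // 32, "KB L3 per core")
        # POWER8: 512 KB L2, 8192 KB L3 -- M7: 64 KB L2, 2048 KB L3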

      1. Mad Mike

        Re: This explains IBM's $3 billion systems invest FUD

        @Roo.

        "Even if you ignore the L4 cache, the M7's caching scheme is in fact a step backwards for people who value single-thread performance."

        Not just single-thread performance, but multi-thread as well. One of the primary uses of large caches is to avoid cache thrashing in the event of many threads (or partitions) hitting the same core over time and causing cache to be constantly refreshed from memory. The greater the multi-threading and the greater the partitioning, the more cache you need.

        1. Roo

          Re: This explains IBM's $3 billion systems invest FUD

          "Not just single-thread performance, but multi-thread as well."

          My gut says you're right, but there have been some pretty stunning massive-thread-count success stories - GPUs, for instance. They tend to operate well below peak, have relatively tiny caches and suck data through a fat but very long straw, yet they dominate the Green500 list nonetheless.

          I still prefer working on machines that can sustain a high percentage of peak performance on a single thread. The Pentium Pro 200 (256kb L2 @ core clock) was a fine example of that style of core; it worked miracles on gnarly dusty-deck code. :)
