Cray turns cluster crank with ScaleMP

Does your performance in the datacenter suffer because you don’t have enough memory to really get the job done? Do you have apps that don’t perform well on clusters, or don’t parallelize at all? If this describes you or a loved one, read on, because Cray thinks it has the solution for you. Cray, with partner ScaleMP, recently …

COMMENTS

This topic is closed for new posts.
  1. Lars

    And still

    There are those who claim Linux will not scale. Not that it really matters, as those who buy that sort of equipment know what they want.

    1. Roo

      Re: And still

      I guess those guys who refuse to learn the lesson will continue to take a big hit in the profit margin. :)

      1. zooooooom

        Re: And still

        Implementing cache coherency over IB doesn't scream scaling. Not unless someone publishes the NUMA ratio to back it up.

        1. danolds

          Re: And still

          I see what you mean, but I remember the days when we had SMP systems using buses and crossbars with flat access times (i.e. non-NUMA). The system I'm most familiar with, the Sun E10000, had a 16x16 crossbar with 1.3 GB/sec bandwidth at about 550 ns latency. The E10k scaled like a scorched weasel (which is pretty good) when compared with the systems of that era.

          Today, you can get 56Gb FDR InfiniBand that offers 6.8 GB/sec bandwidth at around 700 ns latency. That's not too bad at all, and should be enough to ensure reasonably linear scalability on a good portion of workloads.

          In any case, an SMP system should be able to scale better than a cluster running message passing. While sitting here writing out this reply, I can't think of any application that would scale better on an MPP system vs. SMP, but I could be wrong.

          1. ToddR

            Re: And still

            "In any case, an SMP system should be able to scale better than a cluster running message passing. While sitting here writing out this reply, I can't think of any application that would scale better on an MPP system vs. SMP, but I could be wrong."

            Sorry, you are wrong. There is always an overhead (cache coherency) which the cluster does not have, so classical MPI code like CFD will "always" scale better than on any/all ccNUMA systems. ScaleMP does well with CFD because in hardware it's inherently a cluster. Where it doesn't do well is in mixed workloads, e.g. CFD and FEA structural analysis on the same system.

          2. Anonymous Coward

            Re: And still

            The market for big SMP UNIX boxes has shrunk, but meet the great-grandson of the Sun E10K: the SPARC Enterprise M9000. It has 64 sockets of SPARC64 running at 3 GHz (which gives you 256 cores, 512 threads), 4 TB of memory, and a crossbar interconnect rated at 737 GB/s. It also comes with 288 I/O slots. Available from either Oracle or Fujitsu. It will smoke that Cray box, but it's not cheap.

            1. Anonymous Coward

              Re: And still

              I guess the M6-32 is the grandson of the M9000 then.

              http://www.oracle.com/us/products/servers-storage/servers/sparc/oracle-sparc/m6-32/overview/index.html

              32 TB of memory + 32 CPUs at 3.6 GHz (384 cores, 3,072 processing threads) with a 3,072 GB/sec interconnect. And you can run a single Solaris image on it. Talk about scalability...

          3. Roo

            Re: And still

            "In any case, a SMP system should be able to scale better than a cluster running message passing. While sitting here writing out this reply, I can't think of any application that would scale better in an MPP system vs. SMP, but I could be wrong."

            The data/sync signals travel the same distance regardless of whether the code explicitly (message passing) or implicitly (shared memory) initiates the transfer. There is no physical reason why message passing code should be slower than shared memory code on the same hardware; it's all down to the implementation.

            For the record, these days shared memory systems are usually implemented on top of hardware that does message passing (mainstream examples from the x86 world: HyperTransport, QuickPath).

            In my time I have implemented lightweight message passing APIs that sit on shared memory and converted shared memory code to make use of them. Benchmarking revealed that there was no performance penalty over the original code on the same hardware. The message passing code was portable, while the shared memory code was pretty much tied to the machine it was developed on; the message passing code could also tackle problems that exceeded the available address space, so it was far more scalable than the original code.
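
            A minimal sketch of the sort of thing I mean (illustrative only, not the original code): a single-producer, single-consumer message queue built on nothing but shared memory and C11 atomics. "Send" and "receive" are ordinary stores and loads plus a release/acquire pair, which is why there's no inherent penalty.

            #include <pthread.h>
            #include <stdatomic.h>
            #include <stdbool.h>
            #include <stdint.h>
            #include <stdio.h>

            #define QSIZE 256  /* ring capacity, must be a power of two */

            /* A message queue that is nothing but a shared-memory ring buffer. */
            struct spsc_queue {
                _Atomic uint32_t head;     /* advanced by the consumer */
                _Atomic uint32_t tail;     /* advanced by the producer */
                uint64_t slots[QSIZE];     /* message payloads */
            };

            /* "Send" = store the payload, then publish it with a release
               on the tail index. */
            static bool mp_send(struct spsc_queue *q, uint64_t msg)
            {
                uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
                uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
                if (tail - head == QSIZE)
                    return false;                        /* queue full */
                q->slots[tail & (QSIZE - 1)] = msg;
                atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
                return true;
            }

            /* "Receive" = the matching acquire load, then free the slot. */
            static bool mp_recv(struct spsc_queue *q, uint64_t *msg)
            {
                uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
                uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
                if (head == tail)
                    return false;                        /* queue empty */
                *msg = q->slots[head & (QSIZE - 1)];
                atomic_store_explicit(&q->head, head + 1, memory_order_release);
                return true;
            }

            static struct spsc_queue q;

            static void *producer(void *unused)
            {
                (void)unused;
                for (uint64_t i = 0; i < 1000; i++)
                    while (!mp_send(&q, i))
                        ;                                /* spin until space frees */
                return NULL;
            }

            int main(void)
            {
                pthread_t t;
                pthread_create(&t, NULL, producer, NULL);
                uint64_t sum = 0, msg;
                for (int received = 0; received < 1000; )
                    if (mp_recv(&q, &msg)) {
                        sum += msg;
                        received++;
                    }
                pthread_join(t, NULL);
                printf("1000 messages, sum = %llu\n",
                       (unsigned long long)sum);         /* expect 499500 */
                return 0;
            }

            Point the same send/recv API at a NIC instead of a ring buffer and the calling code doesn't change, which is where the portability came from.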

            Personally I think the biggest benefit of message passing is that it formalizes the interaction between processes, which can be leveraged to produce more robust and fault tolerant code. I like fault tolerant code because when you run it on a seriously big network of machines you can almost guarantee that a node will fail somewhere. Making an existing piece of shared memory code survive losing a chunk of its address space and a processor is usually very difficult, if not impossible.

            Don't get me wrong. For some problems I really wouldn't bother with message passing, but they are in the minority. :)

      2. ToddR

        Re: And still

        What are you trying to say?

    2. itzman

      Re: And still

      Frankly, if the whole of that memory looked like a bloody fast DISK, it would still make stuff immensely faster.

      Summing huge series or working over large arrays of data is not generally CPU bound. The actual algorithms work in small RAM areas; it's collecting and storing the data to work on that screws things up.
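
      A toy sketch of what I mean (numbers will vary by box, but the shape won't): one add per eight bytes read, so the loop runs at whatever speed RAM can stream the array, not at what the CPU could compute.

      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      int main(void)
      {
          size_t n = (size_t)1 << 27;          /* 128M doubles = 1 GB */
          double *a = malloc(n * sizeof *a);
          if (!a)
              return 1;
          for (size_t i = 0; i < n; i++)
              a[i] = 1.0;                      /* touch every page first */

          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          double sum = 0.0;
          for (size_t i = 0; i < n; i++)
              sum += a[i];                     /* one add per 8 bytes read */
          clock_gettime(CLOCK_MONOTONIC, &t1);

          double secs = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_nsec - t0.tv_nsec) / 1e9;
          printf("sum = %.0f, %.2f GB/s\n", sum,
                 n * sizeof *a / secs / 1e9);  /* reports memory throughput */
          free(a);
          return 0;
      }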

      1. danolds

        Re: And still

        Yeah, very good point. And this solution should be a hell of a lot faster on those types of workloads vs. message passing on clusters. But we'll need to see some numbers to really know for sure.

        I expect to see this kind of technology filter down into the enterprise fairly soon. ScaleMP is part of the secret sauce in SAP's HANA in-memory analytics product set.
