Is that a 64-bit ARM Warrior in your pocket? No, it's MIPS64 • The Register Forums

Tuesday 2nd September 2014 14:26 GMT Richard 33

Proprietary?

I wonder if like everything else coming from Imagination, you'll have to offer your first born children as sacrifices even to get access to basic programming details?

Tuesday 2nd September 2014 14:52 GMT ForthIsNotDead

Funky!

"Where rival ARM's shift from 32-bit ARMv7 to 64-bit ARMv8-a involved rewriting chunks of its instruction set and forcing some low-level engineers to learn a new assembly language, MIPS64 is basically MIPS32 with instructions for using 64-bit-wide data, and it runs MIPS32 code without a mode switch."

I instantly thought of the Data General Eagle and came across all warm and fuzzy.

1. Tuesday 2nd September 2014 20:11 GMT Roo
  
  Re: Funky!
  
  "I instantly thought of the Data General Eagle and came across all warm and fuzzy."
  
  You have to love Data General for naming a product "SuperNova" - a derivative of which was allegedly the fastest minicomputer for a decade (source: Wikipedia :P) ...
  
  1. Wednesday 3rd September 2014 02:16 GMT alwarming
    
    Re: Funky!
    
    DG-UX was a pioneer in NUMA too!
    
Tuesday 2nd September 2014 15:06 GMT cmannett85

"has the simultaneous multithreading (SMT) ... this technology essentially turns each physical core into two or four virtual cores. A hardware scheduler interleaves the virtual CPU threads into the processor's execution queues"

So, not simultaneous then - or have I misunderstood something?

1. Tuesday 2nd September 2014 15:11 GMT localzuk
  
  Pseudo-simultaneous. The hardware scheduler is better at scheduling tasks than end users, so you see a speed up but in effect you're just getting better performance out of similar hardware.
  
  1. Tuesday 2nd September 2014 18:45 GMT BlueGreen
    
    > The hardware scheduler is better at scheduling tasks than end users
    
    unless I'm being stupid, it's an instruction scheduler, not a task scheduler, and the 'scheduling' is very basic such as instruction interleaving (as the article says).
    
    > but in effect you're just getting better performance out of similar hardware.
    
    Hmm. I've heard this before but it didn't pan out IME. I kicked off some heavy processing work[*]. I ran it repeatedly, steadily upped the number of parallel threads that I allowed for that query, and it scaled linearly up to the number of physical cores. Once it started using virtual cores the rise stopped, and as more virtual cores were used, performance slowly fell. However, that may have been an atypical workload. Perhaps if it had been cache-bound rather than memory-bound it may have done better. It was running on a huge dataset. Dunno.
    
    [*] happens it was in a DB but all memory resident so the disk was never touched.
    
    1. Tuesday 2nd September 2014 20:28 GMT Anonymous Coward
      
      Maybe you're not being stupid
      
      But you are slightly misinformed. SMT usually refers to the ability of the hardware to present multiple threads of execution to the OS so that the hardware (think ALUs and stuff like that) is kept busy even if one thread blocks (due to slow memory). So: Pseudo-simultaneous, yes. Yielding more performance out of only slightly increased HW cost, also yes.
      
      1. Wednesday 3rd September 2014 01:44 GMT Anonymous Coward
        
        Re: Maybe you're not being stupid
        
        For some more detail, consider the picture in the article. While it doesn't go into all the details about the width of the blocks, it still shows most of the relevant information. Most of the computation work happens in the blocks after the Instruction Issue Unit. From the image, you can see that each of these units is separate and doesn't interact with the others. But each of them is able to do a substantial amount of work. This allows the CPU to be running two integer instructions, a floating-point instruction, a branch and a memory request all simultaneously. With a traditional architecture, we'd issue one instruction to the set of compute blocks every cycle and the others would sit idle until they got an instruction. Instead, SMT tries to keep all of those blocks busy by issuing an additional set of instructions from an unrelated program.
        
        In this case, we're looking at a dual-issue processor, which means that it can fetch, decode and issue two instructions at the same time. Thus, the entire processor is actually capable of running two threads simultaneously, but without having to duplicate all of the heavy/expensive compute hardware.
        
        This also leads to an explanation of why your database wouldn't scale past the number of full cores on the machine. Since a database is heavy on memory access, those queues are going to be well saturated by a single thread. Adding additional threads of the same type of task won't work well, but you could have easily fit a program that needed lots of arithmetic onto the processor without affecting your database performance substantially.
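
A toy model of the argument above, as a minimal sketch rather than a description of the actual Warrior pipeline. The unit counts follow the comment (two integer units, one floating-point, one branch, one memory), while the four-cycle memory occupancy, the fixed thread priority and the instruction mixes `DB` and `ARITH` are all assumptions made up for illustration.

```python
from collections import deque

# All numbers below are assumptions chosen for illustration only.
UNIT_COUNT = {"int": 2, "fp": 1, "branch": 1, "mem": 1}   # units per kind
OCCUPANCY = {"int": 1, "fp": 1, "branch": 1, "mem": 4}    # cycles a unit stays busy
ISSUE_WIDTH = 2                                            # dual-issue


def run(threads):
    """Return the cycle at which all threads have drained and all units are idle."""
    queues = [deque(t) for t in threads]
    busy_until = {kind: [0] * n for kind, n in UNIT_COUNT.items()}
    cycle = 0
    finish = 0
    while any(queues):
        issued = 0
        for q in queues:  # earlier threads get priority (a simplification)
            while q and issued < ISSUE_WIDTH:
                kind = q[0]
                slots = busy_until[kind]
                free = [i for i, t in enumerate(slots) if t <= cycle]
                if not free:
                    break  # in-order issue: a stalled head blocks its thread
                slots[free[0]] = cycle + OCCUPANCY[kind]
                finish = max(finish, slots[free[0]])
                q.popleft()
                issued += 1
        cycle += 1
    return finish


# Hypothetical instruction mixes: a memory-heavy thread and an arithmetic one.
DB = ["mem", "int"] * 4
ARITH = ["int", "fp"] * 4

if __name__ == "__main__":
    print("memory-heavy alone:   ", run([DB]))
    print("arithmetic alone:     ", run([ARITH]))
    print("two memory-heavy:     ", run([DB, DB]))
    print("memory-heavy + arith: ", run([DB, ARITH]))
```

In this model the memory-heavy thread takes 16 cycles alone and the arithmetic thread 4. Two memory-heavy threads sharing the core take 32 cycles, no better than running them one after the other, while the memory-heavy and arithmetic threads together finish in 16 instead of 20. The real hardware is far more complicated, so only the direction of the effect should be taken from this, not the numbers.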
        
2. Thursday 4th September 2014 00:29 GMT diodesign
  
  "So, not simultaneous then - or have I misunderstood something?"
  
  I suppose I should have made clear that in a four-virtual-core setup (four hardware threads), they feed into two execution queues that do all the hard work simultaneously. A two-virtual-core setup feeds into one. The hardware scheduler keeps the queues topped up so something's always happening, in theory.
  
  C.
  
  (Posting on my day off hence no Reg badge; I cba logging into work.)
  
Tuesday 2nd September 2014 15:53 GMT simon_brooke

384 cores in one memory pool?

So, am I counting this right? 64 clusters of six cores each per node, adding up to 384 cores sharing an address space? That has the potential to make massively parallel array-processing stuff easy to do on relatively low-cost hardware.

Nice.

1. Tuesday 2nd September 2014 18:46 GMT BlueGreen
  
  Re: 384 cores in one memory pool?
  
  is it coherent? If so, how the $%^&*( do they manage to do it?
  
  1. Tuesday 2nd September 2014 20:14 GMT Roo
    
    Re: 384 cores in one memory pool?
    
    "is it coherent? If so, how the $%^&*( do they manage to do it?"
    
    Here's my value-free unresearched cynical speculation : Magic Smoke.
    
Tuesday 2nd September 2014 16:50 GMT MyffyW

Daft Question

I wonder if you could run NT4 on a 64-bit MIPS?

Yes I know.... why would you want to.... it's just I have a sort of nostalgia for the days when M$ supported various flavours of processor.

Tuesday 2nd September 2014 22:00 GMT Anonymous Coward

192 cores (GPU) orderable today, £200 ? Review soon?

"massively parallel, array processing stuff easy to do on relatively low cost hardware."

Email from Maplin today announcing this tiny (5"x 5"x 1") 192-core beast for £200:

"The NVIDIA Jetson TK1 development kit unlocks the power of the GPU for embedded applications. Built around the revolutionary Tegra K1 SOC, it uses the same Kepler computing core designed into supercomputers around the world. It is a fully functional CUDA platform that will allow you to quickly develop and deploy compute-intensive systems for computer vision, robotics, and medicine.

NVIDIA provides the BSP and software stack, including CUDA, OpenGL 4.4, and the NVIDIA VisionWorks toolkit. With a complete suite of development and profiling tools, out-of-the-box support for cameras and other peripherals, you have everything you need to realize the future of embedded.

[snip]

NVIDIA Kepler GPU with 192 CUDA cores

NVIDIA 4-Plus-1 quad-core ARM Cortex A15 CPU

2 GB x16 memory with 64 bit width

16 GB 4.51 eMMC memory"

[continues]

http://www.zotac.com/uk/z-zone/nvidia-jetson-tk1

http://www.maplin.co.uk/p/zotac-jetson-tk1-developer-kit-a30ny

http://www.linuxuser.co.uk/reviews/zotac-nvidia-jetson-tk1-review

I don't know if I want one, but some folks might.

1. Thursday 4th September 2014 02:05 GMT Anonymous Coward
  
  Re: 192 cores (GPU) orderable today, £200 ? Review soon?
  
  Already on my wishlist for when I completely finish my server. I want to use it for a ground based drone sensor array. That's my "fun day" project although the workstation [CUDA city!] and server are part of this too.
  
Wednesday 3rd September 2014 06:32 GMT chuckufarley

If only...

...We could get more ARM and MIPS systems in the desktop space. They are powerful enough for the majority of workloads even if the choice of software would be a little limited.

Wednesday 3rd September 2014 06:40 GMT MacroRodent

RISC, not IRONIC

From article: "Ironically, MIPS and the new ARMv8-a (PDF) instruction sets are conveniently similar: for instance, they both have a fixed register that always contains a zero value, they both have tons of general purpose registers, each instruction is the same width, the program counter is not directly accessible, and so on."

I don't see anything ironic here. These are the features that actually distinguished RISC processors from CISC in the first place. Every real RISC architecture implements at least some of these, especially the fixed-width instruction format and the large number of general-purpose registers.

1. Wednesday 3rd September 2014 08:41 GMT Torben Mogensen
  
  Re: RISC, not IRONIC
  
  While it is true that these features are what is generally seen to distinguish RISC from CISC, the original MIPS design has a large part in that definition: It was (alongside the Berkeley RISC processor, which is the forefather of SPARC) basically what defined the concept.
  
  I have long thought that ARM should have moved the PC out of the numbered registers when they moved the status register to a separate, unnumbered register. While you save a few instructions by not having to make separate instructions for saving/loading the PC, PC-relative loads, etc., most instructions that work on general registers are meaningless to use with the PC. And in all but the simplest pipelined implementations, it complicates the hardware to make special cases for R15 (the PC). So this move is hardly surprising. I'm less sure about the always-0 register. I think it would be better to avoid this (gaining an extra register), and make a few extra instructions for the cases where it would be useful, e.g., comparing to zero.
  
  And while code density is less of an issue now than ten years ago, I think ARM should have designed a mixed 16/32-bit instruction format. For simplicity, you could require 32-bit alignment of 32-bit instructions, so you would always use 16-bit instructions in pairs, and branch targets could likewise be 32-bit aligned. For example, a 32-bit word starting with two one-bits could signify that the remaining 30 bits encode two 15-bit instructions where all other combinations of the two first bits encode 32-bit instructions.
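
The hypothetical format described above can be sketched in a few lines. This illustrates the commenter's proposal only and is not any real ARM encoding. The function names and the choice to put the first 15-bit instruction in the upper half are assumptions.

```python
SHORT_MASK = 0x7FFF  # 15 bits


def pack_pair(first, second):
    """Pack two 15-bit instructions into one 32-bit word tagged with 0b11."""
    if not (0 <= first <= SHORT_MASK and 0 <= second <= SHORT_MASK):
        raise ValueError("short instructions must fit in 15 bits")
    return (0b11 << 30) | (first << 15) | second


def split_word(word):
    """Decode one aligned 32-bit word into its instruction(s)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit word")
    if word >> 30 == 0b11:
        # Top two bits set: the remaining 30 bits hold two 15-bit instructions.
        return [("short", (word >> 15) & SHORT_MASK), ("short", word & SHORT_MASK)]
    # Any other top-bit combination: the whole word is one 32-bit instruction.
    return [("long", word)]


if __name__ == "__main__":
    print(split_word(pack_pair(0x1234, 0x0ABC)))
    print(split_word(0x80000001))
```

Because every word is 32-bit aligned and self-describing, a decoder never has to look at a neighbouring word to find an instruction boundary, which is the simplification the proposal is after. The cost is that a quarter of the 32-bit encoding space goes to the short pairs.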
  
  1. Wednesday 3rd September 2014 09:37 GMT MacroRodent
    
    Re: RISC, not IRONIC
    
    "And while code density is less of an issue now than ten years ago, [...]"
    
    I recall compiling some programs for MIPS and some other CPUs back in the 1990s, and the MIPS exes usually turned out to be around twice as large as the i386 or VAX ones. But this was not a big deal even back then.
    
    1. Wednesday 3rd September 2014 10:16 GMT Wilseus
      
      Re: RISC, not IRONIC
      
      Coming from a background in ARM, MIPS and i386, I think that while the code size is not much of an issue in some ways, it's very much an issue when you take instruction cache into account. If the MIPS code size is twice the size of i386 then compared to the Intel chip, the instruction cache size is effectively halved.
      
      I realise this is veering wildly off topic, but I'd be interested to see how 32-bit ARM compares to MIPS in this respect, given its implicit shift instructions and multiple load and stores.
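
The arithmetic behind the "effectively halved" point, as a rough sketch. The 16 KiB cache and 64 KiB program size are made-up figures, and the 2x ratio is the anecdotal one from the comment above, not a measurement.

```python
def resident_fraction(cache_bytes, code_bytes):
    """Fraction of a program's code that can sit in the instruction cache at once."""
    return min(1.0, cache_bytes / code_bytes)


# Assumed figures for illustration only.
CACHE = 16 * 1024          # 16 KiB instruction cache
I386_CODE = 64 * 1024      # hypothetical program size on i386
MIPS_CODE = 2 * I386_CODE  # "around twice as large", per the comment above

if __name__ == "__main__":
    print("i386:", resident_fraction(CACHE, I386_CODE))
    print("MIPS:", resident_fraction(CACHE, MIPS_CODE))
```

With these numbers a quarter of the i386 build fits in the cache at once, against an eighth of the MIPS build, so the same cache covers half as much of the program.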
      
      1. Wednesday 3rd September 2014 18:32 GMT Anonymous Coward
        
        Re: RISC, not IRONIC
        
        " I'd be interested to see how 32-bit ARM compares to MIPS in this respect, given its implicit shift instructions and multiple load and stores."
        
        And ARM's predicated instructions (dropped in the 64bit, rather unavoidably, too much state to carry around), and their Thumb instructions for high code density.
        
        High code density is not just good for using less memory for a given task, it's also good for getting more performance out of a limited bandwidth memory system delivering the instructions.
        
        Are you aware of the CoreMark low-level benchmarks? Might be worth a look if you're not.
        
        Lots of factors to look at before an informed decision can be made.
        
        Does "Nobody ever got sacked for specifying ARM?" apply yet?
        
        
        Thursday 4th September 2014 13:45 GMT Wilseus
        
        Re: RISC, not IRONIC
        
        "And ARM's predicated instructions (dropped in the 64bit, rather unavoidably, too much state to carry around)"
        
        Yes, I deliberately didn't mention those because I'm not sure they actually help with code density.
        
        I haven't investigated CoreMark, no. I'll have a look.
        
        
        Saturday 6th September 2014 22:38 GMT Roo
        
        Re: RISC, not IRONIC
        
        "Yes, I deliberately didn't mention those because I'm not sure they actually help with code density."
        
        I got the impression (years ago, in the days when people were speculating about EPIC) predicated instructions were about improving performance rather than coding density.
        
    2. Friday 5th September 2014 21:31 GMT Morten Bjoernsvik
      
      Re: RISC, not IRONIC
      
      Risc64 has been around since 1994 with the introduction of R8000.
      
      I was in university at that time, and was one of the lucky sods to run my code on the state-of-the-art Silicon Graphics Power Challenge Array. Wow, it flew! Around 8 times faster than the VAX I previously used. R8000 was the processor that skyrocketed SGI into the HPC space.
      
      Unfortunately the R10k and followers were not as good. Mainly because the complex superscalar pipeline was hard to upclock. I recall all the problems with getting the R10k above 195 MHz. By that time the AMD Opteron had arrived and the dark ages of MIPS had started.
      
  2. Wednesday 3rd September 2014 20:44 GMT Anonymous Coward
    
    Re: RISC, not IRONIC
    
    The "always zero" register is surprisingly useful, both as a source ("read zero") and a destination ("discard result"). However, comparing to zero isn't one of its main uses as ARM's A64 also has a "compare immediate", which seems to be the compilers' preference for comparing with zero. Note, however, that CMP (compare) is just a synonym for SUBS (subtract and set flags) with the destination register WZR/XZR. You can change any flag-setting operation into a kind of compare by choosing WZR/XZR as the destination.
    
    Also, although the register number 31 usually means WZR/XZR, with some instructions it means SP, the stack pointer, which likewise is no longer a general-purpose register. Therefore, if you wanted 32 general-purpose registers you would have to add quite a lot of extra instructions.
    
    As it is, the A64 encoding is a thing of great beauty. You hardly even need an assembler! It's like going back to the days of the 6502 when you could program with a hex editor, only occasionally referring to the single-page table of instruction encodings! ... Yeah, I exaggerate somewhat - the 5-bit register fields don't match the 4-bit fields of a hex editor - but A64 is really neat.
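
To make the CMP/SUBS point concrete, here is a small sketch that builds the 64-bit "SUBS (immediate)" word by hand. The field layout follows the A64 add/subtract (immediate) encoding as best understood here, with no shift applied. The helper names are invented for the example, so check the result against the architecture manual or an assembler before relying on it.

```python
XZR = 31  # register number 31 reads as the zero register in this destination field


def subs_imm64(rd, rn, imm12):
    """Encode SUBS Xd, Xn, #imm12 (64-bit, unshifted immediate)."""
    if not (0 <= rd <= 31 and 0 <= rn <= 31 and 0 <= imm12 <= 0xFFF):
        raise ValueError("field out of range")
    # 0xF1000000 sets sf=1 (64-bit), op=1 (subtract), S=1 (set flags)
    # plus the fixed add/subtract-immediate opcode bits.
    return 0xF1000000 | (imm12 << 10) | (rn << 5) | rd


def cmp_imm64(rn, imm12):
    """CMP Xn, #imm12 is SUBS with the result discarded into XZR."""
    return subs_imm64(XZR, rn, imm12)


if __name__ == "__main__":
    print(f"subs x1, x0, #1 -> {subs_imm64(1, 0, 1):08x}")
    print(f"cmp  x0, #0     -> {cmp_imm64(0, 0):08x}")
```

The two words differ only in the low five bits, the destination field, which is the sense in which CMP is "just a synonym" for SUBS.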
    
Wednesday 3rd September 2014 11:18 GMT Anonymous Coward

Is this the same MIPS...

... that provided the R4000/R5000 CPUs inside the Silicon Graphics Iris/Indigo/Indigo2 workstations and (IIRC) the Nintendo 64 ??

1. Wednesday 3rd September 2014 13:36 GMT chuckufarley
  
  Re: Is this the same MIPS...
  
  Yes, but I don't think it was Nintendo. I think it was Sony.
  
  1. Thursday 4th September 2014 13:41 GMT Wilseus
    
    Re: Is this the same MIPS...
    
    It was both. The N64 and PlayStations 1 and 2 all had CPUs based on the MIPS family.
    