Zombie Moore's Law shows hardware is eating software

After being pronounced dead this past February - in Nature, no less - Moore’s Law seems to be having a very weird afterlife. Within the space of the last thirty days we've seen: Intel announce some next-generation CPUs that aren’t very much faster than the last generation of CPUs; Intel delay, again, the release of some of …

Silver badge

You can already write code to design a chip

VHDL has been used to do this for some time, and it can be fed into simulators (i.e. interpreted) or compilers. It can't do everything, and doesn't produce the most efficient design, but for doing stuff that's more run of the mill than designing an A10 or 24 core Hololens chip, it works well at a much more reasonable budget.

The expensive part of a chip, especially in a leading edge project, might not even be the design team but rather the mask set, the cost of which can now exceed $10 million.

8
1
Silver badge

Re: You can already write code to design a chip

True, but then VHDL sucks donkey balls when it comes to ease of use, cost and helpfulness of tool chains, and generally getting stuff working quickly. It's a dense and very pedantic language originally built by a US DoD committee to standardise the building of ASICs.

It might be great for those who spend a lot of time using it, and obviously it (along with others like Verilog and simpler ones like ABEL, etc.) is based on parallelism, which is natural to hardware but not to procedural languages, but it is so far from something that you could easily get students with a casual interest using.

4
3
Alien

Re: You can already write code to design a chip

Unified determinism carries its own limits. Von Neumann topology is just one among many.

1
0
Anonymous Coward

Re: You can already write code to design a chip

The vast majority of ASICs, including processors based on ARM architecture, will be designed using VHDL or Verilog.

There are also tools available on the (EDA) market that will create HDL for a processor based on a specified instruction set requirement.

0
0

Moore's law is misquoted too much. The original observation is about price per computation power over time, which still seems to be holding well. The misquotation is usually price per gate, based on silicon area, or some other manufacturing derived metric. But the original macroscopic "whole compute" level figure is still tracking well. Note that "compute" covers hardware, software, or next-gen-whateverware, whatever that may be.

15
0

//The original observation is about price per computation power over time, which still seems to be holding well. The misquotation is usually price per gate, based on silicon area, or some other manufacturing derived metric.//

Actually I think you have that wrong. The original observation was about the rate of increase of components per integrated circuit; this was developed and modified over time to become the observation about computation cost.

At least, if you trust the Wikipedia article, and my memory of other sources.

https://en.wikipedia.org/wiki/Moore%27s_law

7
0
Anonymous Coward

https://drive.google.com/file/d/0By83v5TWkGjvQkpBcXJKT1I1TTA/view

You're both wrong ;)

0
0

"You're both wrong"

Actually I'd say it looks like we're both right. The original paper you link to mentions falling cost and rising component count - right in the subheading. But the paper overall I'd say emphasizes miniaturisation and increasing component density as the primary factor.

2
0
WTF?

I guess the whole 'Software Defined' (storage, networking) isn't happening then?

Why bother with custom ASICs when you can just use off-the-shelf hardware that is plenty fast enough for the job?

Hololens as an example vs. Atom? Really? A Casio watch is more powerful than an Atom. But, also, for single task workloads a custom processor can be useful since it can reduce power and cost by only having the components needed.

> Apple’s new A10 chip, powering iPhone 7, is as one of the fastest CPUs ever.

'is as'?

Also, how inaccurate. Let's compare an A10 vs. a modern Intel CPU.

http://browser.primatelabs.com/processor-benchmarks#2 vs. http://browser.primatelabs.com/ios-benchmarks

The A10 is slower on single core workloads by a large margin, and the multicore result from the A10 is around the same result as the single core result on the Intel. Now turn on multicore on the Intel...

To get the (around) 6x performance increase on the A10 you'd need to bolt another 6 of them together, consuming considerably more space.

Don't mistake me. A10 is good for a mobile CPU, but nowhere near the 'fastest cpu'.

24
0
Anonymous Coward

"I guess the whole 'Software Defined' (storage, networking) isn't happening then?

Why bother with custom ASICs when you can just use off-the-shelf hardware that is plenty fast enough for the job?"

That's not really the point of software defined networking (not sure about storage, not my area). The commodity hardware you use for a software defined network still has the custom ASICs needed for fast packet forwarding; you've just taken the high level functionality (such as building and maintaining the routing tables that govern this forwarding) and abstracted it away to a separate network controller box which does its work in software. So you end up paying your "commodity" network equipment manufacturer $$$ for the fast switching hardware, but only pay $$ for the control hardware/software elsewhere - as opposed to $$$$$ for hardware and $$$$$ for software if you buy it as an integrated unit with full proprietary lock down from the likes of Cisco. That's the theory, at least. You're certainly not going to be replacing your network switch with an off-the-shelf x86 box with a bunch of interface cards, if that's what you were thinking (well, it might work for a lab setup, but for anything serious? No... just... no).
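The split described above can be sketched in a few lines of Python. This is only an illustration of the control-plane/data-plane idea, not any real SDN protocol or vendor API: the `Controller` and `Switch` classes, their methods and the port numbers are all invented for the example, and the lookup loop stands in for what a real switch does in dedicated forwarding hardware.

```python
import ipaddress


class Controller:
    """Control plane: decides routes in software (hypothetical class)."""

    def __init__(self):
        self.routes = {}

    def add_route(self, prefix, port):
        self.routes[ipaddress.ip_network(prefix)] = port

    def compile_table(self):
        # Longest prefixes first, so the first match is the most specific.
        return sorted(self.routes.items(),
                      key=lambda item: item[0].prefixlen,
                      reverse=True)


class Switch:
    """Data plane: only does lookups against a table it was handed."""

    def __init__(self):
        self.table = []

    def install(self, table):
        self.table = list(table)

    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        for network, port in self.table:
            if addr in network:
                return port
        return None  # no matching route: drop


controller = Controller()
controller.add_route("0.0.0.0/0", 0)     # default route
controller.add_route("10.0.0.0/8", 1)
controller.add_route("10.1.0.0/16", 2)

switch = Switch()
switch.install(controller.compile_table())

for dst in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
    print(dst, "-> port", switch.forward(dst))
```

The controller can be as slow and clever as it likes, because it only runs when routes change; the per-packet work is the simple lookup, which is the part that stays in fast hardware.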

The rest of your post is fair enough though, so have an upvote on me.

5
0

Also there have been FPGA boards for the Pi before and the BeagleBoard. ModMyPi used to sell the BeagleBoard one, but it no longer seems to be listed and I can't remember what its name was now.

I nearly bought a BB and the FPGA board for a project, but it seemed that you could only bash a byte at a time between the two processors. If I ever find an FPGA dev board that shares RAM with a "regular" CPU (ARM/Intel), I'd love it.

0
0

Sounds like you want a Zynq-based board, like a zedboard ?

Dual ARM Cortex-A9 and a bucketload of FPGA fabric, all nicely coupled and with a tolerable toolchain for both sides.

Or, if you don't need that much compute, just compile up a microblaze CPU or two in a standard FPGA. The gates that you need for a 32-bitter aren't expensive any more. You'll not get the brute speed of a pair of GHz Cortex-A9s, but it's 400 MIPS of a tolerable RISC machine.

You can then bolt as much of your own logic as you like, very, very close to the CPU(s).

(Other FPGA architectures exist, I'm just in a Xilinx mode at the moment)

6
0
Silver badge

Papilio.cc

Are you thinking of the gadget factory's papilio FPGA boards ?

0
0

Build what you need

"but it seemed that you could only bash a byte at a time between the two processors. If I ever find an FPGA dev board that shares RAM with a "regular" CPU (ARM/Intel), I'd love it."

FPGAs don't have "interfaces" per se. The point of an FPGA is that you have hardware inside and you create your own specific hardware from that. So if you want to use an FPGA with an ARM CPU, decide what interface you want to use and then implement that in the FPGA. If you want to use shared RAM then fine: most FPGAs include memory, so you then need to implement your shared access in the FPGA hardware and handle address collisions etc. For those coming from a software environment, VHDL and Verilog can be a strange concept. It's not a program: unless specified there is no flow, and everything has the potential to happen simultaneously.
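To make the "no flow" point concrete for software people, here is a small Python sketch. It is not HDL and not how a real simulator works; it only mimics clocked registers, where every next value is computed from the old state and then all registers update at once. The function names and the registers `a` and `b` are made up for illustration.

```python
def sequential_swap(a, b):
    """Software semantics: statements run one after another."""
    a = b      # a is overwritten first ...
    b = a      # ... so b just reads back the value a was given
    return a, b


def clock_edge(state, rules):
    """Hardware-like semantics: every rule reads the OLD state,
    then all registers take their new values together."""
    return {name: rule(state) for name, rule in rules.items()}


# Two registers wired to each other, roughly like the non-blocking
# assignments `a <= b; b <= a;` in a clocked HDL process.
rules = {
    "a": lambda s: s["b"],
    "b": lambda s: s["a"],
}

print(sequential_swap(1, 2))                 # (2, 2)
print(clock_edge({"a": 1, "b": 2}, rules))   # {'a': 2, 'b': 1}
```

The sequential version loses a value because order matters; the clocked version swaps cleanly because there is no order, only wiring.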

1
0
Silver badge

There's also Altera's version in the form of the Cyclone V SE. You can share memory or talk to the CPU via the AMBA interface.

There's even a cheap(ish) Dev board in the form of the DE0 Nano SOC.

http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=941

2
0
Anonymous Coward

The TE0722 might be what you're looking for - a low-end Zynq, still with dual Cortex-A9

http://www.trenz-electronic.de/products/fpga-boards/trenz-electronic/te0722-zynq.html

This doesn't have DDR so not enough RAM to run Linux, but FreeRTOS should fit in the 256KB of on-chip RAM (which can be stretched by a couple of tricks like putting FPGA block RAMs on the processor memory bus, and accessing code from flash via the 512KB cache).

To run Linux, a Parallella board is still a nice low-cost option.

0
0

There's also Altera's version in the form of the Cyclone V SE. You can share memory or talk to the CPU via the AMBA interface.

There's even a cheap(ish) Dev board in the form of the DE0 Nano SOC

Hmmm, I spent ages looking at this board, but the block diagram (link below) shows that the RAM is only available on the HPS side, and I couldn't find any information regarding accessing the RAM from the FPGA side. It was a while ago though.

http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=163&No=941&PartNo=2

I've just redownloaded the user manual, and it lists the DDR3 under "Peripherals connected to Hard Processor System", but what I clearly didn't notice last time, is that the table also lists a bunch of FPGA pins, so that might actually be the board I'm looking for. Thanks :)

0
0
Gold badge
Go

What's really changed is the development tools

From a time when most custom chips were laid out with a set of coloured pencils and graph paper.

I'd suggest access to good tools was what made ARM doable by a very small team.

What a team of the same size could do today would be much larger.

But if this new hardware has its own instruction set you'll have to write a code generator for your favorite tool chain (and languages) to support it.

The fact you can do this (beyond knocking up some in house assembler) may be one of Unix's lasting contributions.

3
0
Silver badge

Re: What's really changed is the development tools

Er, you have read the history of the ARM processor? From the Wikipedia page:

"A visit to the Western Design Center in Phoenix, where the 6502 was being updated by what was effectively a single-person company, showed Acorn engineers Steve Furber and Sophie Wilson they did not need massive resources and state-of-the-art research and development facilities."

7
0
Silver badge

Re: What's really changed is the development tools

showed Acorn engineers Steve Furber and Sophie Wilson they did not need massive resources and state-of-the-art research and development facilities."

But they certainly set about finding the best designers, they head hunted much the best chip designer from where I worked at the time.

0
0
Silver badge
Windows

Lots of handwaving and metaphoring in this article

Implementing functions using dedicated circuits means the functions can be computed polynomially faster than if they were done by a state machine: TRUE.

Implementing dedicated state machines in hardware means the state machine can run polynomially faster than if a software-defined state machine were implemented by a generic hardware-defined state machine: TRUE

These optimizations can now be done as tools and component libraries reach maturity and small fabrication runs become economically viable for a bespoke solution: TRUE.

Go for it!

(I remember having lots of fun doing a multiplier and various other circuits on an FPGA using a user-friendly graphical editor back in 1992 on a small host system for exercises. That was also when the first series of articles about "software-defined hardware" appeared. A prime number generator on a chip was being talked about in BYTE as I remember...)

2
0
Anonymous Coward

Re: Lots of handwaving and metaphoring in this article

It isn't a new trend. Back in 1987 I was working on the functional design of an ASIC to be closely coupled to a SPARC processor to offload arithmetic functions that were too time consuming to do in software. Going back further to 1982 I designed hardware that used a maths co-processor coupled to a Z80 processor which needed more computation horsepower to do curve fitting.

Also I "think" the ARM core was available in ASIC libraries from at least the early nineties. Since then there have been numerous implementations of hardware assisted ARM based systems, so this really isn't anything new.

3
0

So, about Intel having bought Altera...

One might have thought that Intel buying a massive FPGA company, giving them access to programmable hardware and a toolchain to drive it, might have been worth a mention in this article?

Some of Intel's CPUs and some FPGA fabric on the same die (or maybe in the same package, along with some stacked RAM), all on Intel's spiffy process, should be interesting. Expensive, no doubt, but interesting.

4
0
IJD

ASIC vs FPGA vs CPU

There's a continuous space of power vs. flexibility -- to do the same amount of processing and in the same process node, at one end a fixed-function ASIC is by far the lowest power and die size but inflexible, an FPGA is more flexible but higher power and die size (typically ~10x), a CPU is completely flexible but much higher power and die size again (typically ~100x).

Cost would follow the same trend if volumes were similar, but they're not, and the very high NRE cost of the latest process nodes tilts the costs towards FPGAs and CPUs unless your volumes are very high. Saying that doing the same job would cost $100 in a CPU or $10 in an FPGA or $1 in an ASIC is true, if you want tens of millions of them -- bear in mind that the total NRE (design and mask) for even a small ASIC in the latest process nodes is at least tens of millions of dollars, or more than a hundred million for a more complex one, so you need to sell a lot of chips to get this back.
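The volume argument is just break-even arithmetic, which a few lines of Python can show. The per-unit costs ($100, $10, $1) are the ones quoted above; the NRE figures are assumptions picked for illustration (the ASIC one from the "tens of millions" range mentioned, the FPGA one a pure guess, and the CPU option treated as having no NRE at all, which ignores software development cost).

```python
def total_cost(nre, unit_cost, volume):
    """Total cost = one-off NRE plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume above which option A (higher NRE, lower unit cost)
    is cheaper than option B. Returns None if A never catches up."""
    if unit_a >= unit_b:
        return None
    return (nre_a - nre_b) / (unit_b - unit_a)


# Unit costs from the post; every NRE value here is an assumption.
CPU = {"nre": 0, "unit": 100.0}             # assumed: no hardware NRE
FPGA = {"nre": 1_000_000, "unit": 10.0}     # assumed design effort
ASIC = {"nre": 30_000_000, "unit": 1.0}     # assumed, "tens of millions"

asic_vs_fpga = break_even_volume(ASIC["nre"], ASIC["unit"],
                                 FPGA["nre"], FPGA["unit"])
asic_vs_cpu = break_even_volume(ASIC["nre"], ASIC["unit"],
                                CPU["nre"], CPU["unit"])

print(f"ASIC beats FPGA above ~{asic_vs_fpga:,.0f} units")
print(f"ASIC beats CPU above ~{asic_vs_cpu:,.0f} units")
```

Raising the assumed ASIC NRE towards a hundred million pushes the ASIC-versus-FPGA crossover past ten million units, which is the regime the post is describing.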

So for most cases CPUs or FPGAs make more sense; advanced process node ASICs (or custom CPUs with custom hardware accelerators) only really make sense where the need to get lower power is absolutely imperative and the TAM justifies the cost. One interesting trend driven from this is that such designs move back into the companies who make the end product like Apple (vertical integration), because getting lower power (or higher speed) by doing your own 10nm chip makes sense if you can clean up the market selling a $600 product with $300 gross margin, but not if you're selling a $60 chip with $30 gross margin at the same volume.

3
0
Silver badge

Re: ASIC vs FPGA vs CPU

FPGA is used for low volume or where per unit cost or power consumption is irrelevant. ASICs are almost always modelled as an FPGA first.

Your Verilog / VHDL can be run as a simulation with monitoring on a PC/workstation before downloading to the actual FPGA. With an FPGA you do not have software-defined hardware, but reconfigurable hardware modules and look-up tables that implement a hardware design. So of course it's massively parallel; it's not a CPU unless that's in your hardware description/design.

I've only used the Xilinx FPGAs and tools though, not Altera.

CPUs and FPGA/ASICs are complementary. Some things only need hardware, no CPU. Some things are more easily realised as programs on a CPU, hence CPU plus FPGA. A port or shared RAM can be used, or a hardware CPU core on the FPGA, or the FPGA defined to create a simple CPU (6502, Z80, PIC), or an SoC with CPU cores, an FPGA area and conventional SoC i/o GPU. This allows field upgrades, whereas the ASIC or conventional SoC can only have the CPU / GPU firmware/programs changed.

1
0
Silver badge

Re: ASIC vs FPGA vs CPU Development

Continued ...

Verilog and VHDL are NOT programming languages, they are Hardware description languages that are then translated into an FPGA configuration file or an ASIC specification. The same source can produce either and optionally include a CPU design (even if the FPGA has no CPU core), which is then separately provided with microcode, firmware, programs etc. The ASIC version would have a real CPU core, if one was defined.

0
0
Silver badge
Pint

Re: ASIC vs FPGA vs CPU

Development Environment.

Menu.

Save As...

...Software

...Hardware

We ran into this years ago. Different rules for hardware projects and software projects. I told them that it's a spectrum, and they blinked in mindless stupor.

0
0
Silver badge

The whole problem with Von Neumann machine

... is power required for accessing the memory where program and data are stored. Compared to the power budget of actual computations, it used to be small in the previous century. No more - currently it is orders of magnitude higher than power used for actual computation. Additionally, the latency of getting the data out of memory has not improved much in the past decades, compared to increasing CPU computing power. Even worse, since increasing parallelism has become the only viable choice for increased software speed, the synchronization of data in memory (that is, completed memory writes and cache synchronization between cores) has become critical to computing performance. There is little that can be done while we are still saddled with inefficient DRAM. However, FPGAs or ASICs also need to read and store data somewhere - even if the program is hard-wired. Of course for small programs there is nothing wrong with small amounts of SRAM, but things are different if you look to deploy these devices into a wider environment, with large amounts of data flowing around. Which means they will hit the memory limit too (actually I am pretty certain they are hitting it already). When much faster and cheaper (both in terms of money and power budget) alternatives to DRAM become commercially available, the tables might turn again.

Still, it pays (and will continue to pay) to know both the hardware and software sides of programming, so kudos for the article.

6
0
Anonymous Coward

Re: The whole problem with Von Neumann machine

Part of the problem is simply the speed of electricity. Computing has gotten so fast that in a single CPU cycle an electron can only travel, say, a few inches. Not to mention those CPUs get pretty hot when they're at full throttle (again, a sheer physical thing that's architecture-agnostic at this point). Which means you have conflicting issues. You need to get the memory close to the CPU to reduce the travel time, but that heat matter means they can't be too close, either.
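The "few inches per cycle" figure is easy to check. Strictly it is the signal that propagates, not individual electrons, and in vacuum the upper bound is the speed of light; the 0.5 velocity factor used below for a signal on a real interconnect is only an assumed round number.

```python
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum (exact by definition)
METRES_PER_INCH = 0.0254


def distance_per_cycle_m(clock_hz, velocity_factor=1.0):
    """Distance a signal covers in one clock period, in metres."""
    return C_VACUUM_M_PER_S * velocity_factor / clock_hz


for ghz in (1, 3, 5):
    best_case = distance_per_cycle_m(ghz * 1e9)
    # Assumption: signals on real wiring travel at about half of c.
    on_wire = distance_per_cycle_m(ghz * 1e9, velocity_factor=0.5)
    print(f"{ghz} GHz: {best_case / METRES_PER_INCH:.1f} in (vacuum), "
          f"{on_wire / METRES_PER_INCH:.1f} in (assumed 0.5c)")
```

At 3 GHz the vacuum bound is just under 10 cm per cycle, so a round trip to memory a few centimetres away already eats a noticeable part of a cycle before any actual access time is counted.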

0
0
WTF?

I Call BS

So future "Patch Tuesdays" will involve a global shipment of chips? ... Surely the whole point of general purpose CPUs is that the minutiae of the actual behaviour of the systems which they run is abstracted away from the hardware and can be updated/modified as required.

I call BS on this whole concept: it's fine for specialised stuff as it's currently being used for, but will we ever see a "MS Word" chip? Hell no: that line of thought misses the entire point of having software.

0
7
Silver badge

Re: I Call BS

You're not getting it. Software will not be going away. What will be happening is that progressively more work will be offloaded to hardware, at least some of which can be soft-configured (which is the whole point of FPGAs). "Patch Tuesday" will contain updated soft configurations as well as traditional code. There's also the matter of the driver stack that connects the software to the hardware.

10
0
Silver badge

Re: I Call BS

"Patch Tuesday" will contain updated soft configurations

It already does. Also available for Linux.

0
0

Re: I Call BS

Well that's not "hardware eating software" then... That's just more software which happens to require a special chip to run. Meh.

0
0

Re: I Call BS

You're not getting it. ...

Absolutely. We've already got FPUs and GPUs. My work involves massive number crunching, generally via low-level machine-optimised libraries like the BLAS, FFTW, LAPACK, etc. While there are ongoing projects to port these to GPUs, I would love to see some of this implemented in hardware on dedicated co-processors.

1
0
Anonymous Coward

Re: I Call BS

Well, new patches for AMD/Radeon and Nvidia these days are exactly this.

Lots of work now gets offloaded to the massively parallel hardware in the GPUs (that are several orders of magnitude larger than any current CPU in any metric taken - sadly power consumption and heat output also), with the express purpose of freeing the CPU to run other code, or of allowing processing at a level no modern CPU could handle in the same time-frame.

And, for instance, video rendering and encoding used to be coded entirely for CPUs, but now some graphics cards can handle the encoding several times faster and offload the work for storage, freeing the CPU.

0
0

Nothing wrong with the chips.

It's shitty lazy code that's the problem.

Instead of throwing more hardware at the problem, maybe clean up and optimise the code a little?

18
3
jzl

Re: Nothing wrong with the chips.

It's shitty lazy code that's the problem.

No, it's not that simple. Code is a product. It is paid for with money.

Modern code is produced - feature for feature - for a fraction of the price of code 30 years ago. The reason for this is that development tools have become unbelievably productive. There's a trade-off in terms of performance on the underlying hardware, sure, but the way to improve raw metal performance of the code would be to forgo some of the tools that make developers so productive.

Besides, although it's widely said it's not completely true. Modern high FPS animated UIs are intrinsically compute intensive, as are many cloud based data workloads. Web browsers, too, are surprisingly compute heavy - layout and render of modern HTML is non-trivial, and that's even without taking Javascript into consideration.

Not to mention that there's a continual drive to improve tooling, particularly at the language level. Look at Javascript: modern browsers execute it orders of magnitude more efficiently than the very first Javascript enabled browsers.

12
0
Silver badge

Re: Nothing wrong with the chips.

"to forgo some of the tools that make developers so productive."

You mean make shit developers just about employable. All a proper dev needs to do his job is

- Compiler/interpreter

- Editor

- Debugger/tracer

- Profiler

- Disassembler (optional)

All of the above have been around since at least the 70s, most a lot earlier so they don't all need to be mashed together in an IDE with lots of cutesy graphics that requires 100 meg of memory just to boot either.

3
3
Silver badge
Windows

Re: Nothing wrong with the chips.

The great thing is you actually forget about the most important thing:

A language adapted to the problem space.

> since at least the 70s

Yeah no oldsy. Try debugging today's applications in 64 KB mainframe RAM.

4
0
Silver badge

Re: Nothing wrong with the chips.

"The reason for this is that development tools have become unbelievably productive."

Which enables features to be added easily. Making decisions as to which features should be included is extra work. If it's easier to just put them in anyway you get bloat and its associated performance costs.

"Besides, although it's widely said it's not completely true. Modern high FPS animated UIs are intrinsically compute intensive, as are many cloud based data workloads. Web browsers, too, are surprisingly compute heavy - layout and render of modern HTML is non-trivial, and that's even without taking Javascript into consideration."

In other words it's Shiny that's the problem.

8
1
Silver badge

Re: Nothing wrong with the chips.

@boltar - so you're coding exclusively in assembler and hitting the hardware directly are you? It's the compiler, hardware abstraction layers and library code that slows a modern program down compared to days of yore. All of those are good things in terms of productivity.

Even ripping those out you still have the basic problem that a CPU is designed to execute a stream of instructions, one at a time. There are assorted techniques used to make this as fast as possible, but it's still effectively a sequential process. Hardware is good at tasks that can be either pipelined or run in parallel (or both). If the workload is suitable then hardware can implement it thousands of times faster than the best written code.
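A rough cycle-count model shows where a figure like "thousands of times" can come from. It assumes every stage takes exactly one cycle, ignores memory stalls and clock-speed differences, and uses made-up numbers for the item count, stage count and lane count, so treat it as back-of-envelope only.

```python
def sequential_cycles(items, stages):
    """One item at a time, each passing through every stage in turn."""
    return items * stages


def pipelined_cycles(items, stages):
    """A full pipeline finishes one item per cycle after the fill time."""
    if items == 0:
        return 0
    return stages + items - 1


def parallel_pipelined_cycles(items, stages, lanes):
    """Several identical pipelines side by side, work split evenly."""
    per_lane = -(-items // lanes)   # ceiling division
    return pipelined_cycles(per_lane, stages)


ITEMS, STAGES, LANES = 1_000_000, 10, 100   # illustrative values only

seq = sequential_cycles(ITEMS, STAGES)
pipe = pipelined_cycles(ITEMS, STAGES)
both = parallel_pipelined_cycles(ITEMS, STAGES, LANES)

print(f"sequential: {seq:,} cycles")
print(f"pipelined:  {pipe:,} cycles ({seq / pipe:.1f}x)")
print(f"{LANES} lanes:  {both:,} cycles ({seq / both:.1f}x)")
```

The speedup approaches stages times lanes, which is why it only materialises when the workload really does split into independent items with a fixed sequence of steps.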

4
0
Silver badge

Re: Nothing wrong with the chips.

"A language adapted to the problem sapce."

Yes, if only people actually took that approach. 30 years ago it was C is the answer to everything. 20 years ago it was C++, 10 years ago it was Java. Now every spotty dev just out of college thinks all you need to learn is Javascript or Python.

"Yeah no oldsy. Try debugging today's applications in 64 KB mainframe RAM."

Whooooosh......

Why do you think I mentioned the memory footprint of modern IDEs? *sigh*

0
1
Silver badge

Re: Nothing wrong with the chips.

"@boltar - so you're coding exclusively in assembler and hitting the hardware directly are you? "

That's right, that's why I said compiler/interpreter on my THIRD line! Learn to read. And if the Pavlovian trigger for you was "disassembler" and you don't know why you might need one for other languages then I suggest you go and educate yourself as to why.

1
1
jzl

Re: Nothing wrong with the chips.

Tools like node.js? Tools like unity? Tools like NHibernate? Tools like ActiveX? Tools like JQuery? Tools like Entity Framework?

And they may not need an IDE with cutesy graphics, but software development isn't a contest in theoretical purity, it's a race for productivity.

A modern "cutesy" IDE contains many features which make development very much faster and more productive.

I speak from direct, long standing and - if I may say so - very successful professional experience.

2
0
jzl

Re: Nothing wrong with the chips.

In other words it's Shiny that's the problem.

I'm involved in a large scale financial enterprise system (in-house for a large investment bank). It consists of a user-configurable highly responsive UI that allows rapid drilldown of massive datasets, configurable side-by-side charting and customisable dashboards.

It's fast, but it needs modern hardware.

None of it is there for "shiny". I'm not paid for shiny. It's there to provide subtle, powerful analysis of complex data. The data visualisation available through modern UI capabilities is not something I could code by hand from scratch, and it's not something I could shove through a 486-DX.

And it's certainly not something a team of our size (four developers) could write without access to some powerful but high level libraries.

4
0

Re: Nothing wrong with the chips.

"No, it's not that simple. Code is a product. It is paid for with money."

So after all that...it's still the code that's the problem!

0
1
Silver badge

Re: Nothing wrong with the chips.

All a "proper dev" needs is a switch. The machine code is inputted, bit by bit, by toggling the switch.

3
0

Re: Nothing wrong with the chips.

@Steve Todd

"so you're coding exclusively in assembler and hitting the hardware directly are you? "

Yes, yes I am. Even if boltar isn't...

Even there, though, new instructions get added that perform at hardware/firmware level what used to be a routine, e.g. the checksum instruction.

"the basic problem that a CPU is designed to execute a stream of instructions, one at a time"

No, that's not how they're designed anymore.

2
0
Silver badge

Re: Nothing wrong with the chips.

All a "proper dev" needs is a magnetized needle and a steady hand.

(XKCD 378)

4
0

Re: Nothing wrong with the chips.

Actually, try getting the same amount of Work out of today's "Modern" languages as that 64KB Mainframe used to perform.

In point of fact, some of the applications developed for those 64KB Mainframes are still running today because A) they still work & B) No one wants to spend the Millions of dollars and years of effort that even Trying to replicate that software in a "Modern" language would cost.

I've worked on a couple of those "Modernization" projects and the Performance of the resulting software was godawful. Even when it "Worked", it usually took at least twice as long to perform the same amount of work and had "issues" that the developers Promised would be Fixed "any day now".

Then there were the "Enhancements" and "Extra Features" that the project managers just couldn't resist adding in. Suddenly, something that performed a critical job reasonably well did a whole Bunch of things, all of them Poorly.

Of course, in one particular instance, the Poor Performance turned out to be Designed In.

I didn't understand Why some of the design choices were made for the project until I discovered that the Consulting Firm doing the Development also had a contract to Run the finished application as a Service Bureau which would get paid for CPU time and Database Storage on a Cost Plus basis.

Suddenly, the poor performance and lousy data design made a Lot more sense if you knew it would be driving the company's Profits.

1
0


Biting the hand that feeds IT © 1998–2018