We'll have to build chips which implement better internal logic and design, rather than taking the easy route of shrinking everything...
The general chair of the SC13 supercomputing conference thinks the semiconductor industry has reached a tipping point more radical – and uncertain – than it has gone through in decades. "We've reached the end of a technological era where we had a very stable technology," Bill Gropp, Thomas M. Siebel Chair in Computer Science …
"We'll have to build chips which implement better internal logic and design, rather than taking the easy route of shrinking everything..."
The various chip design journals & papers are worth a read, you should read them so you don't have to take my next statement on trust. The chip design bods have zillions of neat tricks above & beyond sitting on their duffs waiting for the process guys to pull the next rabbit out of the hat.
Now back to HPC...
Essentially for 'real' HPC applications time is money - which is why they throw a lot of money at making stuff go faster. With that in mind...
1) When you start running programs on physically large systems (multi-core, multi-processor, multi-rack, multi-site etc), you will eventually want to aggregate all those results of those sub-calculations together, and that is where latency kicks you in the balls.
2) At present the lower bound on latency is set at the speed of light (C). So far we haven't found a way to increase the value of C.
3) So if we want to improve latency, but we can't increase the speed of the signals by tweaking the speed of light, we are left with reducing the distance travelled by the signals (ie: shrinking stuff).
The CMOS guys have actually done a very good job. Even Intel seem to have got to the point where the only real tweak they have left is upping the cache size, and at this point they seem to have run into the latency wall there too as far as CMOS goes (increasing cache size increases latency of the cache).
hmm.. software 'engineer' are you ?
Unlike the quack discipline of 'software engineering', electronic engineers are actually pretty good at what they do.
On the other hand, if chip design was handled by 'computer scientists' we'd all be sitting reading this on the world's shittiest computer, while one of the fuckwits explained how they were making the next model from the ground up because the chips 'didn't quite do what they wanted to do', and this time would be using the one true methodology* to build it.
*extreme, agile, whatever other shite buzzwords crap the muppets have made up to use for the next few projects.
" rather than taking the easy route of shrinking everything."
Just waiting for the next technology node? Pfft.
This seems to be a post from someone without a clue about microelectronics (front-end or backend) design , library/IP development or EDA software.
Computer scientists are dweebs. Software engineers are not.
Some of us started with electronics too.
Actually one way out of this mess is an ANALOGUE cell that can handle more than one bit - imagine 8 logic levels between one and 8 (0--5V) handled by a single switching element. 8 bit adder would be a snap..as would an 8 bit comparator
Not all software engineers are quacks. And while EEs do a good job, we are using the world's shittiest arch anyway, x86. Hopefully this tech switch will open up real opportunities for the tech world to get rid of that junk...
Dear stu 4,
Until you started swearing, I found reading your message in a Sheldon Cooper voice worked quite well for me.
Software quality is often something you could contrast sharply with say, the quality of bridge engineering. I think if you assume software engineers are all idiots, you have cast yourself perilously close to the label you are so happy slapping on other people.
I feel that a more enlightened, less aggressive approach to the subject might simply be to realise that the economic decision by managers of what is "good enough" dictates the entire culture of an engineering discipline. The balance between cost and product quality is determined by people trying to get a feel for what customers will accept.
Software that crashes occasionally but allows productive work to be done is somewhat tolerated. Bridges that fall down occasionally killing hundreds of people are frowned upon.
> If chip design was handled by 'computer scientists' we'd all be sitting reading this on the worlds shittiest computer,
If you'd replaced 'computer scientists' with 'average programmer' I'd agree. It's the computer scientists who pursue rigour and make the systematic improvements; it's they who design the formal frameworks that are ignored by crappy programmer types. I should say this is partly also a management thing, said management being 90% crap themselves, and the minority that aren't realise that 90% of the public is totally willing to accept crap, so guess what's produced.
Like turtles; it's crap all the way down.
Speaking as a developer & very occasional software engineer, Amen to that. Have an upvote !
"2) At present the lower bound on latency is set at the speed of light (C). So far we haven't found a way to increase the value of C."
Actually a lot of the time stuff propagates at about 1/3 c on PCB tracks and normal wiring. However, by impedance matching the lines (which can be done on PCBs as well) you get close to c transmission speeds, and (AFAIK) that can be done at the chip level as well.
"Some of us started with electronics too.
Actually one way out of this mess is an ANALOGUE cell that can handle more than one bit - imagine 8 logic levels between one and 8 (0--5V) handled by a single switching element. 8 bit adder would be a snap..as would an 8 bit comparator"
Right - so you have big voltage swings -> more power loss driving signals, oh and a lower signal/noise ratio.
You might be better off sticking with an analogue computer - they exist. :)
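To put rough numbers on the multi-level cell idea, here is a minimal sketch. The function names are made up for illustration, and it assumes evenly spaced levels across the 0-5V range mentioned above. It shows how many bits a cell with a given number of levels can hold, and how close together the levels end up, which is where the signal/noise objection comes from.

```python
import math


def bits_per_cell(levels: int) -> float:
    """Bits of information one cell can hold if it distinguishes `levels` states."""
    return math.log2(levels)


def level_spacing_volts(levels: int, v_min: float = 0.0, v_max: float = 5.0) -> float:
    """Gap between adjacent levels, assuming they are evenly spaced (an assumption)."""
    return (v_max - v_min) / (levels - 1)


for levels in (2, 8, 256):
    print(
        f"{levels:>3} levels: {bits_per_cell(levels):.0f} bits per cell, "
        f"{level_spacing_volts(levels) * 1000:.1f} mV between levels"
    )
```

On these assumptions, 8 levels give 3 bits per cell rather than 8, and packing a full 8 bits into one cell would need 256 levels about 20 mV apart.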
"Software that crashes occasionally but allows productive work to be done is somewhat tolerated. Bridges that fall down occasionally killing hundreds of people are frowned upon."
As you were typing this, I was having a conversation about similarities between the process of bridge engineering, and the process of safety critical system engineering. Were you there?
We were wondering, with the introduction of shiny new techniques and tools which management want to replace proven (but allegedly costly and inefficient) engineering tools and methods, how long before the industry in question sees a Tacoma Narrows or a Millennium Bridge.
Is there a regulatory authority for bridge design and implementation? The 787 saga and the current Toyota court case in the USA seem to indicate that parts of the transport industry are in need of better regulation.
Quality is a horrible word. Pirsig got it right.
Your example doesn't.
Quality is a variable - whatever you are 'engineering'.
However to most 'software engineers' it is a constant: either on or off. They simply lack the understanding or technical skills to apply it variably. If there is one single 'gift' that makes a good engineer it is this thorough understanding of the concept of quality.
It is this that allows engineers to build a car that costs 10k or one that costs 100k, even though they do the same job.
To most software engineers (and I admit I am brushing you all with the shitty stick, but in my 20 years in the industry I can count the number of real engineer thinkers on one hand) the idea of building a piece of software that is not 'as good as it can be' is an anathema. They do not even understand what is wrong with this thinking. Their 'cars' will always cost 100k given the leeway - hence the need for someone with sense (the 'manager' in your example) to intervene.
It is not 'managers' who should make this call - it is the absolute corner stone of what makes a good engineer.
I used to work with a doctorate of IT - I told him that as an EEE, when tasked with building an audio amplifier for example, I would take the specs, and see how I could best design something that was within the price range, and broadly met the specs - I'd select components that existed, and make approximations and trade offs. So, for example, having calculated that I needed a 2.1k ohm resistor I would choose a 2.2k ohm, since it is commonly available, etc. I would end up having built something which met the requirements. In the real sense of the word - the quality was ideal for the solution.
My Dr friend maintained that this was not 'as good as it could be', and applying his software engineering to it, he would have designed and built a 2.1k ohm resistor, etc. It was only because of price that I had gone down this route, not good engineering. He maintained that his amp would be better, if more expensive.
At this point I felt like an atheist trying to convert a Christian, so gave up.
"My Dr friend maintained that this was not 'as good as it could be', and applying his software engineering to it, he would have designed and built a 2.1k ohm resistor, etc. It was only because of price that I had gone down this route, not good engineering. He maintained that his amp would be better, if more expensive."
Unfortunately "near-as-dammit" doesn't always map well into the digital domain, but you make a useful point that software devs should be making compromises, and as it turns out a lot do. I have met very few people who see themselves as 'software engineers' or 'computer scientists'. :)
For the software guys to catch up
Haven't you heard, we're all moving to the cloud, we don't need fast computers just billions of crap ones.
Rumours of the end of Moore are somewhat exaggerated when we see month by month advances in size/power/energy/storage/connectivity/bandwidth/price. There are many, many interesting and promising areas of step-change technology innovations round the corner.
I don't recognise the premis of the article. If you look narrowly at one specific aspect of development you will find a tech dead-end but we have long since moved on from the single notion of cramming more transistors on a lump of sand.
1) It's "premise"
2) It's not the end of Moore, but of his observation
3) What Moore observes is clear and has nothing to do with "advances in size/power/energy/storage/connectivity/bandwidth/price" wishy-washyness: "the number of transistors on integrated circuits doubles approximately every two years" (and the price of the end product will be the same)
4) THOSE TIMES ARE OVER. The economic vagaries of going to XUV already say as much.
5) Deal with it.
> but we have long since moved on from the single notion of cramming more transistors on a lump of sand.
LOLNO. Still waiting to run Windows properly on multicore.
Yes, really ...
We had 2.2GHz CPU 1600 x 1200 pixel LCD laptops in April 2002. That's nearly 12 years ago.
So if Moore's Law meant doubling every 2 years (people often say 18 months, and long ago used to say 1 year), then for LAPTOPS we'd have had:
2004: 1 core, 4GHz, 1G RAM
2006: 2 cores, 4GHz, 2G RAM
2008: 4 cores, 4GHz, 4G RAM
2010: 8 cores, 4GHz, 8G RAM
2012: 16 cores, 4GHz, 16G RAM
2014: 32 cores, 4GHz, 32G RAM
Nope, Moore's "Law" was only an observation. Like the grain of rice on square one of a chess board, it was never going to hold true for long. Growth has barely even been linear.
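For anyone who wants to play with the doubling period, the laptop list above can be reproduced with a minimal sketch like this. `doubling_projection` is a made-up helper, and the 2004 starting point of 1 core / 1G RAM is simply taken from the list, not from any real roadmap.

```python
def doubling_projection(start_year, start_value, end_year, period_years=2):
    """Value per year if `start_value` doubles once every `period_years` years."""
    return {
        year: start_value * 2 ** ((year - start_year) // period_years)
        for year in range(start_year, end_year + 1, period_years)
    }


# Cores (or GB of RAM) from the list above: 1 in 2004, doubling every 2 years.
for year, value in doubling_projection(2004, 1, 2014).items():
    print(f"{year}: {value} cores, {value}G RAM")
```

Changing `period_years` to 1 shows how much steeper the curve would be under the older "every year" version of the claim.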
HDD native speed, HDD capacity, copper network and WiFi speed, and RAM speed haven't exceeded linear growth; in most cases the improvements are declining.
Yes, or "premiss".
We already know how to improve the performance of computers by several hundred percent - possibly by some orders of magnitude. It doesn't involve any technologies we don't already have. Nor does it require any major changes so far as the users are concerned. Indeed, for them, the improvements will be pretty transparent - except for the screamingly fast performance they will see.
What is this change? Not new hardware, just properly designed and written software.
It's time to toss the bloated, inefficient existing software - with its mess of interdependencies, incompatibilities and patched patches - and start teaching people to write clean code with low overheads that does no more than is required of it. At present the world of software development works on the same basis that NASA used for its moonshots: waste anything but time. In this case, time to market.
So we have software tools that put programmer productivity before runtime performance, resource requirements and size - on the basis that technology will provide whatever is necessary to run this stuff. That's fine while the curve is still on its upward climb and hardware is getting cheaper all the time. But all "S" curves reach their limits, eventually. Sooner or later the hardware won't be getting faster every year and then we'll start to see push-back from users who won't accept the Minimum Hardware requirements and will look for software that runs on their existing systems.
We already get this on smartphones and tablets, where a typical Android app weighs in at a few megabytes, compared with the hundreds of MB needed for a PC (or Linux) based package.
You never know: the root-and-branch reworking needed to remove all the cruft that existing software has accumulated over the decades might even give rise to more secure designs and possibly even less buggy code (and will definitely obviate all the workarounds built in for backwards compatibility). It's unlikely that the corporate behemoths will want to play, since this attacks their fundamental existence. But that might just be another advantage.
Surely you aren't suggesting we get rid of the cheap overseas programmers (I shall name no names) and actually PAY for programmers who know what the hell they are doing?
That's just crazy talk!
Why next, you'll insist that all sorts of industries improve their quality control! My god man, do you know what that will cost? Fish and chips will cost £100 and dogs will sleep with cats and the world will come to an end!
Plenty of expensive programmers that aren't "overseas" were and still are writing lazy code.
I agree, but I suspect things have gone way beyond the point where it is possible to persuade companies to reverse this trend. Not least because the root and branch reworking required would also have to lead to a mass culling of the management class and, indeed, of the programmers.
And does anyone actually teach efficient software development anymore?
Whilst that works for some things, it doesn't work for others. E.g. the massively compute intensive stuff done in matrix algebra is already coded as tight as it can go.
And huge fractions of bloatware are never executed at all. Or very seldom.
And sometimes shrinking code INCREASES execution time. E.g. JMP STANDARD_THING or CALL STANDARD_THING involves an extra jump or a call. Inline coding does not, and doesn't risk emptying a prefetched pipeline.
What is needed is thorough code review and analysis to spot where time is really being wasted.
Unfortunately, unless the all new and improved version that's now much faster and more secure comes with new features you're off to a losing start. Try convincing your manager or the finance team:
a.) That the new version will be the same as the old version, heck it might even have fewer features.
b.) That the new version will cost (assume relatively tiny app) 20 man months to develop, document and test.
c.) That whilst the new version has no new features it will be faster and more secure.
 A robust design will need to be well documented if it's to be a reliable platform for future versions.
 A robust design will still need extensive testing shurely, it's not going to be bug free and arguably as a new software baseline could be more buggy initially than the predecessor version. Better the devil you know.
 Well it was faster and more secure until the requirements creep started...
"And does anyone actually teach efficient software development anymore?"
You have a very valid point. When I went to college, we were taught multiple software design methodologies, such as JSP (Jackson Structured Programming) and Warnier-Orr. However, I've never met anyone at Microsoft who had ever heard of such a thing. Not JSP, but simply the concept of structured software design. Every single person I met there with a BS or above had no clue about doing anything except stupid tricks that didn't work on a real project.
JSP is like a hot chainsaw through soft butter when it comes to slicing and dicing stream data. I'd get asked, "how do you do that?" And I'd show them. And I'd get blank looks from people with glassy eyes.
The next "frontier" is software, and it's a frontier that has never kept up with hardware. What's the latest development? Everything runs as a scripting language so it's all "open." Stupid. But at some point we'll see a real OS for high performance computing, and the kernel, etc., will be really small.
Manager will hear nothing between you saying new version and him asking himself how much can they sell the crap for.
> And does anyone actually teach efficient software development anymore?
Not that I'm aware of, nor do I feel the need for it. I believe that past a certain point (beyond apprentice level), it's the individual developer's responsibility to educate themselves further. There are plenty of good books out there; mix that in with a little hand-dirtying and you're doing fine.
Does anyone have a problem with this?
"the massively compute intensive stuff done in matrix algebra is already coded as tight as it can go."
Is it, really?
I suspect that in many cases the techniques involved have been lost.
Earlier this year I was asked to look at a piece of 'slow' code from some "performance experts", doing a non-conventional filtering job on a dataset of a GB or so. They'd clearly never heard of "loop unrolling" (and for reasons not relevant here, the compiler wouldn't have been able to see it either).
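For anyone who hasn't met it, loop unrolling looks roughly like the sketch below. This is not the code from that job: the sum-of-squares loop and the function names are invented for illustration. The point is the shape of the transformation (do four elements per pass, then mop up the leftovers). In an interpreted language like Python you shouldn't expect the speedup that the same rewrite can give in compiled code.

```python
def sum_squares_rolled(data):
    """Straightforward loop: one element per iteration."""
    total = 0
    for x in data:
        total += x * x
    return total


def sum_squares_unrolled(data):
    """Same result, unrolled by a factor of 4 with a cleanup loop for the remainder."""
    n = len(data)
    limit = n - (n % 4)  # largest multiple of 4 that fits in n
    total = 0
    i = 0
    while i < limit:
        # Four elements per pass: fewer loop-counter updates and branch checks.
        total += (
            data[i] * data[i]
            + data[i + 1] * data[i + 1]
            + data[i + 2] * data[i + 2]
            + data[i + 3] * data[i + 3]
        )
        i += 4
    while i < n:
        # Leftover 0 to 3 elements that did not fill a block of four.
        total += data[i] * data[i]
        i += 1
    return total


sample = list(range(10))
print(sum_squares_rolled(sample), sum_squares_unrolled(sample))
```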
For getting the ultimate out of any given hardware when doing matrix algebra (or filtering or...), fifteen years or so ago an answer might have been KAP - Kuck and Associates Preprocessor, which included all kinds of clever techniques going above and beyond what compilers of the day were able to use, to optimise the performance of particular source codes on particular hardware. Marvellous stuff.
Intel bought Kuck and Associates, and in due course KAP vanished from the market.
(How long before the same happens to VxWorks and Simics, also purchased by Intel? Will anyone notice?)
"Intel bought Kuck and Associates, and in due course KAP vanished from the market."
Presumably to incorporate it into their compilers.
You are aware Intel supply compilers, right?
> I suspect that in many cases the techniques involved have been lost.
Don't be dumb. These are well documented techniques in dozens of books on Amazon et al. 'Lost' my arse. <http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Optimize-Options.html> and look for "strip mining" for a start. Or here <http://en.wikipedia.org/wiki/Compiler_optimisation> for a whole Dan Brown temple-worth of lost techniques preserved in the web equivalent of the Vatican archives, known to a sacred few who can google. FFS good thing you posted anon.
Your "performance experts" clearly aren't if you know more than them, and wow, a whole gig of data - my works db is approaching 100X that and others here will manage much larger.
> But at some point we'll see a real OS for high performance computing, and the kernel, etc., will be really small
Aah, you mean a Transputer, with the OS kernel in hardware (microcode actually)?.
"You are aware Intel supply compilers, right?"
That's correct, I am indeed aware that Intel supply compilers, and associated tools, and now other readers are too. Some of this software they wrote themselves, some they acquired from Compaq (who acquired it with DEC). That covers x86 and, to a lesser extent, something irrelevant called IA64. Other stuff (VxWorks, Simics) was formerly independent of Intel.
KAP also used to be available for non-Intel architectures.
It isn't any more.
VxWorks (RTOS) and Simics (chip+system simulator) were and are available for non-x86 architectures. Let's hope it stays that way now Intel own them.
I was working on ECL in the late 80s - power hungry but around 100 times faster than the CMOS available. I did some simulations of a CPU based on a 600 transistor CPU by MJ Shute. IIRC it had more bang per watt than the available CMOS options at the time - the RAM would have been a problem though!
An earlier post suggests software needs to catch up.
I think we need to be more flexible and efficient. My laptop and the 4 other machines in my house are utilised at around 2% of available CPU, and that's running the occasional network wide rendering or collapsing star simulation - if I get a successful one I don't know if my use will increase or decrease.
I'm expecting a parallella board some time but I have more than enough oomph to run the offices and workstations of most of the small companies I've worked for and still have some spare.
The one thing that would improve every office I've worked in by two or so orders of magnitude is open data standards. Just being able to read in someone's address and invoice structure correctly 99.9% of the time and bill them similarly would reduce PC and dull office positions at a stroke.
We could do it on 286 machines with EDI and kermit on 2400 baud modems. The same operations today take up whole departments printing out PDFs and Word documents to type in ....
"printing out PDFs and Word documents to type in ...."
One of the few things I do that reliably starts the fan running on my 5 year old laptop is OCR'ing a PDF (usually of an ancient manual - remember manuals?). With a few hundred pages to process, it will get there but sometimes I worry about the CPU temperature.
Other than that it's mostly idle most of the time.
The only thing that gets my humble dual core celeron in a sweat is actually moving images. I can swerve the CPU usage right up by grabbing a window and stirring it round the screen..and as for full screen videos..well..especially flash ones. YUK.
But that is of course something a co processor can handle so much better.
If circuit densities don't increase much from now on, there are still massive gains to be made in cheapening manufacturing processes, and substituting readily available materials for expensive ones etc. As others have said software can be radically improved and de-bloated (I'm writing this on a blisteringly fast Lubuntu desktop running on a multicore CPU supporting multiple virtual OSs at 97% hardware speed ).
The major change will be in adoption - when 80% as opposed to 2% of the population finish education knowing how to program; people who won't tolerate illiteracy and incontinence concerning the handling of data in their workplaces.
Some of the really interesting scientific developments in the next 50 years are likely to be in biotech and nanotech anyway. Figuring out how to make solar panels as a really cheap and durable add-on surface coating to everything exposed to outside light such as tiles and building cladding materials for example. Figuring out how to grow oil from algae in desert areas under bioplastic polytunnels with closed-circuit (i.e. zero system water loss) irrigation. It's unlikely operating these systems will need faster computers, and if designing them does, putting more of the faster computers being made into the cloud enables these clusters to serve more uses than application dedicated machines.
Surely giving us, say, 1024 CPU cores would increase performance quite a bit.
Ever heard of a gpu?
That's basically how tech like CUDA works on nVidia / similar devices ... I did also see (on here) that nVidia are working their way into the HPC arena through a partnership with Dell.
Software efficiency is definitely the more urgent way to go though. As a software developer myself I find that whilst my code could be really efficient, it spends most of its time waiting for the platform it sits on to do stuff, using a fraction of the CPU and RAM in the gaps between all that bloat.
It's time for a standard platform / framework that simply works instead of abstracting away the problems it causes.
My thinking being that 5th gen languages will open the door for programmers to simply "talk to the computer" but the real work needs to happen in lower level optimization beneath the surface.
Developers generally do what they're asked to do.
Given the usual project variables of time, cost, resource and quality, take a guess at what is usually bottom of the average corporate priority list? (Actual priority not stated priority btw)
At the moment, hardware is generally good enough to accommodate mediocre quality software for most applications. If that changes then commercial pressures will shift towards better quality code.
Currently, there's very little recognition or reward for elegant and efficient code in most places.
Also, leave off the software engineers. It's a discipline that's what, fifty years old? It's also in some cases the most complex engineering task there is.
It's the old saw : "I can do quality, quick or cheap. Choose two".
It's always the last two that are chosen.
Now that CMOS, the baby that no one wanted, is all grown up and ready to retire, maybe it is time to take one of the other babies out of the deep freeze and grow it for a while. Sure, ECL is power hungry, and MOSFETs are touchy. But these types of problems are perfect for engineers to make a career out of. I would happily take a computer that was twice the size, and used twice the power but was ten times as fast. Time to dust off some of that old tech that most of us sharpened our claws on and give it the life it deserved.
"I would happily take a computer that was twice the size, and used twice the power but was ten times as fast."
I think they could probably whip up one that was 4 times the size, used 20 times the power and was 10 times as fast and 100 times more expensive using what they've already got if you have the money. Just make sure you keep the application's working set small - say <64Kwords of Data & <64Kwords of Instructions - so you don't get hit by memory access penalties too often... You may get away with making the words very wide - but making good use of that in software is tricky. That's kinda what GPUs are good at anyway. :)
ECL would be slower simply because you can't achieve the density - and you pay latency penalties as a result.
My guess is people will achieve some amazing results with alternative technologies, but I believe these guys when they tell me CMOS is nigh on tapped out (as far as *cost effective* chips go - specialist apps may well tolerate higher costs of course).
... it removes the pressure to risk one's company on attempting to bring a different technology to maturity.
"The early adopters may not be the ones to succeed financially," he said. "The early adopters may be the ones that do the trailblazing and die."
Given the inexorable drive to move the entire industry onto systems based on commodity components, in the near future he's going to be damned lucky to find any company who is willing to spend the money required to bring any next generation tech to market.
Assume we are at a CMOS plateau. Over the last decade the metric has shifted from MIPS/$ to MIPS/Watt, and newer computers are dramatically more efficient. But this means we can get more MIPS/liter at an acceptable power density. Sure, we may need to get more innovative with cooling architecture, but engineers know how to do this.
But why? Well, because cramming more circuitry into a smaller volume reduces the interconnect length, and this reduces latency. If I can reduce a roomful of computers to a single rack, my longest runs go from 20 meters to two meters and latency goes from 100ns down to 10ns. (Speed of light in fiber is 20cm/ns.)
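The arithmetic behind those numbers is easy to check with a minimal sketch. The helper name is made up, the 20cm/ns figure for fiber is the one quoted above, and the vacuum figure is the usual value of c, included only for comparison. It covers one-way propagation delay only and ignores switching and serialisation.

```python
FIBER_CM_PER_NS = 20.0         # figure quoted above for light in fiber
VACUUM_CM_PER_NS = 29.9792458  # speed of light in vacuum, for comparison


def one_way_latency_ns(distance_m, speed_cm_per_ns=FIBER_CM_PER_NS):
    """Propagation delay in nanoseconds over `distance_m` metres."""
    return distance_m * 100.0 / speed_cm_per_ns


for metres in (20, 2):
    print(
        f"{metres:>2} m: {one_way_latency_ns(metres):.0f} ns in fiber, "
        f"{one_way_latency_ns(metres, VACUUM_CM_PER_NS):.1f} ns in vacuum"
    )
```

So a tenfold reduction in the longest run buys a tenfold reduction in the propagation part of the latency, whatever the medium.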
Today's devices are almost all single layer: one side of one die. A wafer is patterned and sliced into dice, and each die is encapsulated in plastic, with leads. This is primarily because they are air cooled, so they must be far apart in the third dimension. But it's physically possible to stack tens or hundreds of layers if you can figure out how to remove all the heat.
Look inside a modern enterprise server: what do you see?
Printed circuit boards surrounded by air.
But power supplies, disks, and fans do not need low-latency connections, and the only reason for all that empty airspace is to provide cooling. If we move the disks and power supplies elsewhere, and use a cooling system that is more efficient than brute-force air cooling, we can increase the computing density by several orders of magnitude. My guess is that we can easily achieve a factor of 1000 improvement even if there is no additional improvement due to "Moore's law" in its traditional sense. If we do get this factor of 1000, that is equivalent to 20 years of Moore's law.
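The "factor of 1000 is about 20 years" claim follows from 1000 being roughly 2^10, i.e. ten doublings. A minimal sketch, assuming the two-year doubling period (the function name is invented for illustration):

```python
import math


def equivalent_years(factor, doubling_period_years=2.0):
    """Years of steady doubling needed to reach an improvement of `factor`."""
    return math.log2(factor) * doubling_period_years


print(f"1000x is about {equivalent_years(1000):.1f} years at one doubling per 2 years")
print(f"1000x is about {equivalent_years(1000, 1.5):.1f} years at one doubling per 18 months")
```

With the 18-month period that people often quote instead, the same factor works out to roughly 15 years.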
In this scenario, a computer is a dense collection of computing elements surrounded by cooling elements, power supply elements, and storage elements. Cooling is almost certainly provided by a transfer fluid such as water or freon.
And there you have PERCS. If you look at an IBM 9125-F2C (Power7 775 cluster), they are very dense, are water cooled (CPU, I/O Hub, memory and power components) with integrated electro/optical network interconnects eliminating external switches, and storage moved into very dense arrays of disks in separate racks.
When my workplace moved from Power 6 575 clusters (which were themselves quite dense), it kept to approximately the same power budget, increased the compute power by between three and five times, and doubled the disk capacity, all in about one third of the floor footprint of the older systems. And to cap it all, they actually cool the ambient temperature of the machine room.
But these systems proved to be too expensive for most customers, and IBM was a bit ambitious about the delivery timetable. Take this with a contraction in the finances of many companies, and IBM failed to sell enough of them to keep them in volume production. But they are very impressive pieces of hardware.
Replacing them with a 'next' generation of machines is going to be hard.