Amazon's Cluster Compute Instances officially sounded the death knell for grid computing efforts that once held promise as the "next big thing". Cluster Compute Instances take multiple x64 instances and link them together using 10 Gigabit Ethernet interfaces and switches. The EC2 virtual server slices function just like any other …
"There are also many more deployment options when you are targeting clouds than when targeting your own data center.
With the exception of very specific privacy and security issues - which can arguably be addressed anyway - there are fewer and fewer reasons why any organization would want or need to run their own massive server farm."
I would argue that any company with a serious ongoing need for HPC would rather target a cloud *in* its own data centre. Privacy and security issues cannot just be glossed over with "can arguably be addressed". There are plenty of organisations out there that would potentially use HPC - financial institutions, for example - that cannot just squirt data around the globe to old-mate's cloud, because regulators don't allow it. Users of HPC are also unlikely to want their IP floating around in someone else's cloud. Companies like to be masters of their own destiny, which is why they run their own server farms.
Then we have the practicality of all of this - a lot of HPC workloads don't perform well in virtual environments. I can name Matlab as one shining example of something that runs rapidly on bare iron and has its own inbuilt grid functionality, yet runs like shit on virtualised hardware. As soon as you virtualise you add overhead and speed bumps, and much as vendors like to spout "typical slowdowns in the region of only 5-10%", I have witnessed intensive CPU<->memory tasks (essentially what HPC is) suffer such a slowdown that a job takes almost twice as long on virtualised hardware. Virtualisation is great for many things, but HPC just isn't one of them - unless you're a company that cannot afford to run a server farm.
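The slowdown is easy to measure for yourself: time the same CPU<->memory-intensive kernel on bare iron and on a virtualised instance and compare. A toy sketch of such a microbenchmark (the pure-Python loop below is interpreter-bound and only illustrative; a real comparison would use the STREAM benchmark or a compiled kernel, and the array size here is an arbitrary choice):

```python
# Toy CPU<->memory microbenchmark to run on both bare iron and a
# virtualised instance. Illustrative only: a pure-Python loop is dominated
# by interpreter overhead, so treat the numbers as relative, not absolute.
import array
import time

def triad(n=1_000_000, scalar=3.0):
    """STREAM-style triad: a[i] = b[i] + scalar * c[i]."""
    b = array.array("d", range(n))
    c = array.array("d", range(n))
    a = array.array("d", [0.0] * n)
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = time.perf_counter() - start
    # Bytes touched per iteration: read b and c, write a, 8 bytes per double
    mb_per_s = (3 * 8 * n) / elapsed / 1e6
    return elapsed, mb_per_s

elapsed, rate = triad()
print(f"triad took {elapsed:.3f} s, effective rate {rate:.0f} MB/s")
```

Run the same script in both environments and the ratio of the two times is your real slowdown, whatever the vendor brochure says.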
Do I sense a rather large P.O. in the future?
Are you implying that Amazon already have enough hardware to support the world's HPC needs?
Cloud computing is an example of computer scientists developing a software solution to a human problem, particularly with regard to HPC. Large Linux clusters are hard to use, and the idea of HPC users running on virtualised hardware is hard to imagine. HPC resources are very expensive and are often under-utilised, especially in academia, due to nothing more than a lack of user training and support. A university IT department would never make a multi-million pound investment in hardware for mail servers or networking support and let it run at a 45% utilisation rate or less. But it happens all the time with HPC resources; these systems become "white elephants", or "giant space heaters" if you prefer.
So cloud computing with virtual hardware is going to solve the problem how? Without extensive training and user support (i.e. people to help), clouds are not going to be any easier to use; you'll just have a service that has been outsourced and is no easier to use, with the added "benefit" of your application running slower, as Mark 65 points out. So you'll spend "millions" on cloud computing to have your applications run much slower in the cloud rather than on a dedicated resource. In simple terms, you're spending 1,000,000 to get 900,000 or less in value back?
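To make that arithmetic explicit, assume the value delivered scales inversely with how long jobs take (a simplifying assumption, not a claim from any vendor):

```python
# Back-of-envelope for the figures above: if the same budget buys compute
# that runs your jobs more slowly, the effective value delivered shrinks
# in proportion. A 1.1x slowdown is the vendors' optimistic claim; the
# near-2x figure is what Mark 65 reports observing.
def effective_value(spend, slowdown_factor):
    """Value delivered when jobs take `slowdown_factor` times as long."""
    return spend / slowdown_factor

print(effective_value(1_000_000, 1.1))   # ~909,091 at a 10% slowdown
print(effective_value(1_000_000, 2.0))   # 500,000 if jobs take twice as long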
Maybe - but then again - maybe not.
All predictions in HPC hardware development point towards massive concurrency in 2-5 years.
Whether this comes in the shape of GPGPU, 20 million embedded processor cores in an n-dimensional torus, FPGA or some other development remains to be seen.
In any case it will be a development which will predictably kill off 'HPC in the COTS cloud'.
Massive concurrency - possibly with an OS which runs a single thread per node - is exactly what the Amazon SW/HW stack is not good at.
This holds true even for an ordinary run-of-the-mill ISV code. Running the same code on 1000 embedded processors instead of 100 x86 cores will save you a factor of ~10 in terms of power.
In 5 years' time this will add up to a factor of 5 or more in total cost savings.
So the expensive bits in 5 years' time will be power, power, software development to address the power issue (massive concurrency), and - power. Not many incentives for HPC/Cloud.
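The ~10x power claim above can be sanity-checked with a back-of-envelope; the per-core wattages below are assumed round numbers picked for illustration, not measured figures:

```python
# Illustrative sanity check of the ~10x power claim. The per-core wattages
# are assumptions chosen as round ballpark figures, not measurements.
X86_CORES = 100
X86_WATTS_PER_CORE = 10.0          # assumed: ballpark for a server x86 core
EMBEDDED_CORES = 1000
EMBEDDED_WATTS_PER_CORE = 0.1      # assumed: ballpark for an embedded core

x86_power = X86_CORES * X86_WATTS_PER_CORE                  # 1000 W
embedded_power = EMBEDDED_CORES * EMBEDDED_WATTS_PER_CORE   # ~100 W
print(f"power saving factor: {x86_power / embedded_power:.0f}x")
```

Whether real embedded cores hit those wattages while still doing useful work per core is exactly the software-development problem (massive concurrency) named above.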
Grid is not HPC
Grid computing does not attempt to replace HPC - HPC is what it is, and has been established for a long time. Grid computing focuses not so much on high performance as on high throughput. The issue for scientific Grid computing is not the processing power needed to run, but the size and location of the data sets that must be operated on. Cloud services from the likes of Amazon do provide great alternatives for various situations, but will not help a scientific research group with petabytes of data spread across multiple collaborating sites, because getting that data within range of the processing units is the hard part. Using Grid computing techniques, each site can run its own Grid on local hardware that is close to the large dataset, and results can be centrally collated. The type of site involved in this sort of project - e.g. a large university - has plenty of machines with spare CPU cycles available to perform the operations; Grid computing is an attempt to harness those spare cycles and put them to use.
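The "compute next to the data, collate centrally" pattern described above can be sketched in a few lines; the site names and the word-count workload here are invented for illustration, and in a real Grid each site's job would run on hardware next to its own petabyte-scale dataset:

```python
# Sketch of per-site local computation with central collation of the
# (small) results. Site names and the counting workload are made up.
from collections import Counter

# Each site holds its own large dataset locally (toy stand-ins here).
site_datasets = {
    "site_a": ["alpha", "beta", "alpha"],
    "site_b": ["beta", "gamma"],
    "site_c": ["alpha", "gamma", "gamma"],
}

def run_local_job(records):
    """Runs at the site, next to the data; only a small summary leaves."""
    return Counter(records)

# Only the small per-site summaries travel over the wide-area network.
partial_results = {site: run_local_job(data)
                   for site, data in site_datasets.items()}

# Central collation: merge the small summaries into one result.
total = Counter()
for partial in partial_results.values():
    total += partial
print(dict(total))  # {'alpha': 3, 'beta': 2, 'gamma': 3}
```

The point of the design is in what crosses the network: a few counters per site, never the datasets themselves.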
Cloud? Grid? HPC?
Can you really compare the three? As I understand the terms, each of them is meant to address a very different problem:
1) Cloud: Lots of users with relatively small processing requirements per user and small amounts of data to transfer per user. Perfect examples are applications running on top of a database to which each individual user cannot add more than a little bit of data: search, email, etc.
2) Grid: A relatively small number of users whose computational needs can be broken up into separate tasks, each of which requires a very small amount of data to specify its input and output but a very large amount of processing time, and which are then spun off to other computers on the grid to exploit the left-over processing power of other individual users. The fact that there are very few users scheduling tasks is important: if you assume that half of the users are scheduling tasks, then all they get on average is one extra machine available to perform work. Folding @ Home is the perfect example.
3) HPC: A single task that is monstrously computationally intensive and most likely requiring a very large amount of data to be moved around. Think solving partial differential equations on huge grids.
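The task-farm pattern in (2) can be sketched with a local worker pool standing in for the volunteered spare cycles; the workload below is an invented compute-heavy loop, not Folding @ Home's actual computation:

```python
# Toy illustration of the Grid pattern in (2): tasks with tiny inputs and
# outputs but lots of compute in between, farmed out to idle workers.
# The Pool here stands in for spare machines on a grid; the LCG-churning
# task is a made-up example of "small data, big compute".
from multiprocessing import Pool

def expensive_task(seed):
    """Tiny input (one int), tiny output (one int), lots of CPU in between."""
    x = seed
    for _ in range(100_000):
        x = (x * 1103515245 + 12345) % (2 ** 31)
    return x % 1000

if __name__ == "__main__":
    tasks = range(8)  # a small batch of independent work units
    with Pool(4) as pool:
        results = pool.map(expensive_task, tasks)
    print(results)
```

Note how little crosses the wire per task: one integer in, one integer out. That is what makes the pattern tolerant of slow, far-flung volunteer machines.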
The cloud does not have the bandwidth necessary to move around huge grids into it or from it.
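A back-of-envelope makes the bandwidth point concrete; the grid size, link speed and snapshot count below are illustrative assumptions, not figures from any real deployment:

```python
# Why shipping HPC-scale grids into or out of a cloud hurts. All numbers
# here are assumed for illustration.
points = 1_000_000_000        # 1e9 grid points (modest by HPC standards)
bytes_per_point = 8           # one double-precision value per point
link_gbit_per_s = 1           # an optimistic wide-area link to the cloud
snapshots = 1000              # time steps whose fields you want back out

data_bytes = points * bytes_per_point                     # 8 GB per snapshot
seconds_per_snapshot = data_bytes * 8 / (link_gbit_per_s * 1e9)
total_hours = seconds_per_snapshot * snapshots / 3600

print(f"{data_bytes / 1e9:.0f} GB per snapshot, "
      f"{seconds_per_snapshot:.0f} s each, "
      f"{total_hours:.1f} hours for {snapshots} snapshots")
```

Inside a machine room that traffic rides an interconnect orders of magnitude faster; over the wide area it becomes the dominant cost of the whole job.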
The cloud is not left over computing capacity, but rather paid for capacity.
So exactly how is it that the cloud can do what Grid or HPC are meant to do?