" Why don't SGI manufacture a big bad SMP server with 16-32 cpus instead of clusters with 1000s of cores or cpus? "
Because the technology SGI had was simply overkill; as they say in the article, they were marketing to "cloud computing" providers (which of course means, once the marketing guys got out of the way, they were marketing to data centers in general...). But a multi-giga*byte*/second interconnect and so on is simply overkill compared to a (less costly) rack of systems with ethernet behind them, and SGI was unable to get costs low enough to be even vaguely competitive. Realize that with virtual machines, web server software, database software, most data analysis stuff, etc., each individual process easily fits in the RAM on a single blade, so the high-speed interconnect never even comes into play.
"Until SGI sells a big bad SMP server with as many as 32 cpus, I won't be impressed. Anyone can make a huge cluster with 1000s of cores and tons of RAM. "
This has been cutting *some* into SGI's sales -- throwing a pile of systems into a rack and sticking gigabit or 10-gigabit ethernet, Myrinet, or Infiniband behind them allows for a system that can run certain distributed jobs just fine and some other types "well enough". That is not what SGI has, though -- they have a VERY fast, VERY low latency interconnect that lets the whole lot act as a single system with 100s or 1000s of CPUs in it, with minimal loss of performance.
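To make the distinction concrete, a distributed job that runs "just fine" on such a rack is one where the nodes only talk at a few explicit points. Here's a minimal sketch of that style in C using MPI; the work loop is a made-up stand-in, not anyone's actual workload:

    /* Minimal sketch of the message-passing cluster style: each rank
     * crunches its own independent slice, inter-node traffic happens
     * only at the explicit MPI calls. Compile with mpicc. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Independent work on an independent slice of the problem. */
        double local = 0.0;
        for (int i = rank; i < 1000000; i += size)
            local += 1.0 / (1.0 + i);    /* stand-in for real work */

        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %f across %d ranks\n", total, size);

        MPI_Finalize();
        return 0;
    }

The only inter-node traffic is the single MPI_Reduce at the end, which is exactly why commodity ethernet is "well enough" for this class of job.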
"Until anyone offers a Linux server with 32 cpus for sale, I wont be impressed. "
Be impressed: 32-CPU systems shipped years ago; I've seen a few personally. IBM sells a system with eight 6-core CPUs that came out in 2010, and Sun had a 4 x 8-core system by 2010. You could buy 12-core CPUs from AMD by 2010 and run 48 cores on a 4-socket motherboard.
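Anyone who doubts the core counts can simply ask the kernel. A trivial sketch (standard POSIX calls, nothing SGI-specific; on the 4-socket, 12-core AMD box above it would report 48):

    /* Ask Linux how many CPUs it is actually scheduling on. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long online = sysconf(_SC_NPROCESSORS_ONLN);     /* CPUs online now */
        long configured = sysconf(_SC_NPROCESSORS_CONF); /* CPUs the kernel knows about */
        printf("online: %ld, configured: %ld\n", online, configured);
        return 0;
    }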
"The problem is that Linux can not scale beyond 8 cpus today."
It sure can.
" No one has ever offered a 16 or 32 cpu Linux server for sale."
Don't know about CPUs, but cores? They sure have.
" SGI could be the first to make Linux scale beyond 8 sockets? And then SGI could target the Enterprise market, instead of chasing HPC scientific number crunching specialized companies, running large Linux clusters."
Ignoring the first part (SGI is far from the first to scale beyond 8 sockets...), SGI tried targeting the enterprise market. As the article says, it has low margins compared to HPC. SGI's technology was overkill for this market, so they could not get the cost low enough to be competitive.
The thing you are missing (other than insisting high-core systems don't exist, and that Linux can't use them...) is the sheer speed of this interconnect. ScaleMP lets you lash together a bunch of nodes and have them appear as a Single System Image; SGI's specialized chips likewise lash together a bunch of nodes into a Single System Image. BUT, ScaleMP is limited by the gigabit or 10-gigabit ethernet, Myrinet, Infiniband, or whatever is backing it up, which at its best has an order of magnitude higher latency than NUMALink.
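The point of the Single System Image is that memory on a far node is still just memory; the interconnect latency is the only difference, which is why NUMALink's speed matters so much. A hedged sketch of what that looks like from a program, using libnuma (link with -lnuma; the 64 MB size and node choice are purely illustrative):

    /* Sketch: on a NUMA single-system-image machine, memory on a far
     * node is reached with ordinary loads/stores -- interconnect
     * latency is the only difference. Uses libnuma; link with -lnuma. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "not a NUMA system (or no libnuma support)\n");
            return 1;
        }
        int last = numa_max_node();
        printf("nodes 0..%d visible to this one kernel image\n", last);

        /* Allocate 64 MB pinned to the highest-numbered node... */
        size_t len = 64UL << 20;
        char *far = numa_alloc_onnode(len, last);
        if (!far) return 1;

        /* ...and touch it with a plain memset: no message passing,
         * just ordinary stores that happen to cross the interconnect. */
        memset(far, 0, len);

        numa_free(far, len);
        return 0;
    }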
NUMALink is quite close to being fast enough to actually feed the CPUs at full speed, so you can have a job spread all over the machine and it'll still run at close to 90% efficiency. THIS is what is keeping SGI in business. Certain computation types break up just fine (nearly-to-fully independent threads, working on nearly-to-fully independent bits of data, will run fine on a cluster), and certain types do not break up well at all (fluid dynamics and weather models, to name two). With the latter, the worker threads each tend to do a somewhat "random walk" through the data set, and the threads are often heavily interdependent; both of these make this type of job completely unsuitable for "traditional" (message-passing) clusters, and very, very slow on a "traditional" cluster running single-system-image software on top, due to the slowness (slow relative to on-board memory...) of the message-passing hardware.
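To see why the "random walk" pattern punishes message-passing hardware, picture threads chasing data-dependent indices through one big shared array: on a shared-memory machine each hop is a single (possibly remote) load, while on a message-passing cluster each hop would be a full network round trip. A toy pthreads sketch of that access pattern (the sizes and the scrambled jump table are made up for illustration):

    /* Toy sketch of the "random walk through a shared data set"
     * pattern: each thread chases indices through one shared array.
     * Every hop depends on the previous load, so the communication
     * cannot be batched. Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N        (1 << 24)   /* 16M entries, 64 MB -- bigger than cache */
    #define NTHREADS 8
    #define HOPS     1000000

    static unsigned *next_idx;   /* shared: next_idx[i] = where to jump from i */

    static void *walker(void *arg)
    {
        unsigned pos = (unsigned)(long)arg;   /* per-thread start position */
        unsigned long sum = 0;
        for (int h = 0; h < HOPS; h++) {
            pos = next_idx[pos % N];          /* data-dependent jump */
            sum += pos;
        }
        return (void *)sum;                   /* keep the compiler honest */
    }

    int main(void)
    {
        next_idx = malloc(N * sizeof *next_idx);
        if (!next_idx) return 1;
        for (unsigned i = 0; i < N; i++)
            next_idx[i] = (unsigned)rand() % N;   /* scrambled jump table */

        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, walker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        puts("done");
        free(next_idx);
        return 0;
    }

Because each hop depends on the result of the previous one, there is nothing to overlap or batch -- which is exactly the property that makes NUMALink-class latency the deciding factor between "runs at 90% efficiency" and "crawls".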