
SGI?
It would have been nice if El Reg had enquired about what on earth* happened to SGI once HPE bought them and, arguably more importantly, whether Cray will disappear too...
*Pun intended
Though HPE's Spaceborne Computer is still fresh from its jaunt to the International Space Station, veep and CTO for HPC and AI Dr Eng Lim Goh is pondering a return visit and outfitting missions to Mars with the company's kit. The Register met Dr Goh after he'd been reassuring attendees at the Sibos 2019 event that AIs were …
I can understand the obvious importance of that, but for science it is imperative to be able to consult the actual data, not just the result of the processing. So the data will still have to be re-transmitted in some form or another.
It might, however, be feasible to write the data to a backup SSD and bring it back "manually", rotating in a fresh backup with the next flight.
Some places have data volumes that are too high for this: CERN, for example, only keeps a subset of its data after processing, and the Square Kilometre Array will produce even bigger volumes. In my field we no longer keep the image files from the sequencers and instead regard the sequences as the raw data.
If I read this correctly, nine out of twenty SSDs failed: that's almost half of them in a little over a year and a half in low orbit... That doesn't bode too well for any lengthy mission outside Earth's magnetic shield (think Voyager or Curiosity) if you lose approximately 50% of your SSDs every two years*. And you can't over-provision too much either, since space and weight are at a premium.
* And we're in the quieter part of the solar cycle right now. I wonder how their unshielded, unhardened computer would fare if caught in a major solar proton event.
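For a rough sense of what those numbers imply, here is a small sketch that takes the figures from the comment (9 of 20 drives failed in about 1.5 years) and assumes a constant failure rate, i.e. exponential survival S(t) = exp(-rate * t). That model is an assumption made for illustration, not something HPE reported; real failures in orbit may well cluster (e.g. around solar events) rather than arrive at a steady rate.

```python
import math

# Figures quoted in the comment: 9 of 20 SSDs failed in roughly 1.5 years.
DRIVES = 20
FAILED = 9
YEARS_IN_ORBIT = 1.5


def failure_rate(drives: int, failed: int, years: float) -> float:
    """Constant annual hazard rate implied by the surviving fraction.

    Assumption: survival follows exp(-rate * t), i.e. failures are
    independent and happen at a steady rate over time.
    """
    surviving_fraction = (drives - failed) / drives
    return -math.log(surviving_fraction) / years


def fraction_failed(rate: float, years: float) -> float:
    """Expected fraction of drives lost after `years` at the given rate."""
    return 1.0 - math.exp(-rate * years)


rate = failure_rate(DRIVES, FAILED, YEARS_IN_ORBIT)
half_life = math.log(2) / rate  # time until half the drives are expected dead

print(f"rate ~ {rate:.3f} per year")                             # rate ~ 0.399 per year
print(f"lost after 2 years ~ {fraction_failed(rate, 2.0):.0%}")   # lost after 2 years ~ 55%
print(f"half the drives gone after ~ {half_life:.2f} years")      # half the drives gone after ~ 1.74 years
```

Under that assumption, the fleet would lose about 55% of its drives within two years, so "approximately 50% every two years" is, if anything, slightly optimistic.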
> these are regular enterprise-grade storage devices
Yes, but that was the whole point of the experiment, wasn't it? To use standard, cheap, commercially available equipment and rely on error correction to fix the inevitable random bit flips.
Except no error correction can counter hardware dying. I think my concern stands: I don't see how this idea might work for long-duration missions, or for missions during or near solar maxima.
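The distinction in this exchange can be illustrated with a textbook single-error-correcting Hamming(7,4) code. This is a generic example chosen for illustration; nothing in the thread says which error-correction scheme HPE's software actually used. It shows a single flipped bit being repaired, while a drive that has died (modelled here, as an assumption for the demo, as reading back all zeros) leaves nothing to correct from.

```python
from typing import List


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(word: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    w = list(word)
    # Re-check each parity group; together the three results spell out
    # the 1-based position of a single flipped bit (0 means no error).
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        w[pos - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


data = [1, 0, 1, 1]
word = encode(data)

flipped = list(word)
flipped[5] ^= 1                  # one radiation-induced bit flip
print(decode(flipped) == data)   # True: the single flip is corrected

dead = [0] * 7                   # hypothetical dead drive reading back as zeros
print(decode(dead) == data)      # False: the original data is simply gone
```

Stronger codes push the limit further but not past it: error correction adds redundancy within data that can still be read. Surviving the loss of a whole drive takes redundancy across drives (mirroring or RAID-style parity), which costs exactly the extra space and weight mentioned earlier.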
Change the layout so the most vulnerable bits sit very close to the centre of mass. Odds are the particles will hit something else first...
It may also be possible to just periodically re-flash the firmware on the SSDs, if bit flips in those chips are the source of the failures.
NASA was trying out 100% commercial off-the-shelf hardware, but needing slight customization for just one component would likely still count as a rousing success, whether that's a weird custom case layout or custom-built SSDs with duplicate ROM chips on their controller boards.
Or perhaps they'll just give NVDIMMs and other enterprise storage tech a try, now that they know storage is going to be the weakest link.