Place your bets: How long will 1TFLOPS HPE box last in space without proper rad hardening

SpaceX and HPE will put a modest little supercomputer into space next week to test how computer systems operate in extreme conditions. On Monday, August 14, HPE’s Spaceborne Computer will blast off to the International Space Station aboard a SpaceX CRS-12 rocket. It’s part of an experiment to examine if commercial off-the- …

FAIL

The predecessor to HaL 9000?

I'm sorry Dave but no HaL 9000 has ever made a mistake. We have a perfect record...bzzt...feh....zap...

12
0
Anonymous Coward

Lucky it wasn't a Microsoft Surface...

According to ConsumerReports, it wouldn't survive the takeoff let alone a year in Space.

25
3
LDS
Silver badge
Joke

Re: Lucky it wasn't a Microsoft Surface...

That's why they are called Surface - Earth surface only use...

27
0
Silver badge

Not quite how I'd have gone about things, but I assume the experts know what they are doing.

I'd have sent a box packed with many different sorts of chips running software to detect and quantify errors, with the aim of figuring out exactly how much ECC and self-validating software it takes to make sure a computer can operate reliably even with the radiation. Perhaps eventually involving having two less-than-reliable conventional processors operating in lock-step, with a third rad-hardened, very minimal chip constantly comparing them and initiating a reset every time they do something different. It'd probably weigh less than packing everything behind a big sandwich of plastic, lead and boron.

4
2
TRT
Silver badge

I'm sure they've already done that. This is to see how a computer that is as close as practicable to off-the-shelf fares, knowing already how the chips themselves do.

11
0
Anonymous Coward

Lockstep RIP

This isn't rocket science, even if it's being used in a rocket.

The lockstep+comparator idea doesn't really work for modern processors where cache plays a significant part in performance. E.g. there will be correctable cache errors ('soft' errors) from time to time, but they can't be guaranteed to be the same place and same time on each member of the set.

If the comparators are simple, e.g. relying on instruction-level and/or memory-access level comparison, a soft error that occurs on only one member of the set will cause different behaviour from the ones without the soft error, and therefore (in your proposal) cause a reset.

In the real world, fault tolerant systems such as Tandem NonStop (now of course HPE NonStop) nowadays use cache-based commodity processors, same as everyone else. For error detection they use rather specialised comparators that do the comparisons on the eventual IO accesses caused by the instructions. If the IO accesses differ, something went wrong.

There's more to it than that, but more than I have time to describe right now.

9
0
Gold badge

It is also possible to irradiate things down here on Earth, and I'm sure they've done that too. This exercise sounds more like the "all up" test that proves something you are already pretty damn sure of.

19
1
Anonymous Coward

Lockstep on conventional processors doesn't work on Earth anymore, which is why it was dropped by the fault tolerant computing HPE NonStop team back during the transition between MIPS and Itanium. Modern processors just do too much internal error correction and have too much pulled into the SoC to give a boundary that can be checkpointed and subject to a majority vote. Itanium featured radiation hardened latches in its pipeline and a lockstep mode, but at the cost of running at a fraction of the speed of normal mode and you're paying for at least two if not three processors.

The NonStop team started with an Itanium design that moved the checkpoint voting to the memory subsystem with a hardware checker and memory replication between boxes over optical links, but eventually figured out how to do the whole thing in software, which in turn has allowed them to transition to standard x86 blades.

5
0

Re: Lockstep RIP

VMware does this too. In fact, running everything in a VM where it can be snapshotted, rolled back, etc probably makes a lot of sense for these missions...

0
3
Silver badge

Re: Lockstep RIP

What to do when radiation bit flips the hypervisor? Obvious: run three hypervisors in virtual machines and compare their output...

7
0
Silver badge

"Perhaps eventually involving having two less-than-reliable conventional processors operating in lock-step"

Can't vote with two systems, you'd never know which had an error because you wouldn't know the right answer. Use three and the two matching answers can be used. It's different to a normal cluster where you're only detecting failure since here you're also detecting subtle errors.
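To make the two-versus-three point concrete, here is a minimal sketch of a majority voter. The function name `majority_vote` and the sample values are made up for illustration; real voters sit in hardware or in a checkpointing layer, not in a ten-line script.

```python
from collections import Counter

def majority_vote(results):
    """Return (value, agreed) for a list of replica results.

    agreed is False when no single value is reported by a strict majority.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results):
        return value, True
    return None, False

# Two replicas: a mismatch can be detected but not resolved.
print(majority_vote([42, 43]))       # (None, False)
# Three replicas: the odd one out is outvoted.
print(majority_vote([42, 43, 42]))   # (42, True)
```

With two replicas all you can do on a mismatch is reset or retry; the third replica is what lets you keep going with an answer.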

6
1

"It'd probably weigh less than packing everything behind a big sandwich of plastic, lead and boron."

Am I the only one who initially read that last word as "bacon"?

2
0
TRT
Silver badge

For what the adverts tell us...

off-the-shelf iPhones and American wrestlers are already being space tested.

2
0
Silver badge

Re: For what the adverts tell us...

Where do I get an off-the-shelf American wrestler?

Asking for a friend...

4
0
Bronze badge

Sol help us

Solar activity is pretty low right now so things might be just fine. Not rad-hardening electronics is a road SpaceX has been down a couple of times. They ruined some returned experiments when the fridge on board a Dragon capsule got zapped and there was at least one other incident that I can't recall the details of off hand.

Been there, flown that, won the NASA prize.

3
2
Anonymous Coward

Re: Sol help us

Why does a fridge need a microprocessor?

8
1
Silver badge

well

I assume they weren't just using it to store yesterday's leftovers and a two-year old jar of Branston pickle that "still should be good - it's been kept in the fridge all the time".

9
0
Silver badge

Re: Sol help us

No, the fridge got ruined when seawater got into it on landing.

6
0
Silver badge

Re: Sol help us

Simple thermostat without a microprocessor: Temperature sensor, power MOSFET, comparator circuit, trim pot to set temperature, two resistors (inc. hysteresis)

Simple thermostat with a microprocessor: Temperature sensor, power MOSFET, PIC microcontroller. Plus it can do PWM with PID feedback, and soft start.

It's common in electronics to use microcontrollers for absolutely everything now because they are of near-negligible cost and can usually do the task of several more basic components.
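For the curious, the hysteresis that the comparator version gets from its two resistors is a few lines of logic in the microcontroller version. A rough sketch, written in Python rather than PIC firmware; the setpoint, the band and the function name `thermostat_step` are assumptions for illustration only.

```python
def thermostat_step(temp_c, cooling_on, setpoint_c=4.0, band_c=1.0):
    """One control step of a bang-bang fridge thermostat with hysteresis.

    setpoint_c and band_c are illustrative values, not from any real fridge.
    """
    if temp_c >= setpoint_c + band_c:
        return True            # too warm: switch the compressor on
    if temp_c <= setpoint_c - band_c:
        return False           # cold enough: switch it off
    return cooling_on          # inside the band: keep the previous state

state = False
trace = []
for temp in [4.5, 5.2, 4.8, 3.5, 2.9, 3.5, 4.9]:
    state = thermostat_step(temp, state)
    trace.append(state)
print(trace)   # [False, True, True, True, False, False, False]
```

The dead band stops the compressor chattering around the setpoint. PWM, PID and soft start are more code on the same chip, which is the whole appeal.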

10
0
Bronze badge

Re: Sol help us

"Why does a fridge need a microprocessor?"

Cost, it's usually cheaper to put a micro + code, than design/test an analogue circuit.

With the Micro usually working out cheaper component wise.

Plus they can potentially add features with a firmware update.

1
0
Silver badge

What about computers at rest?

Now, yes, you're going to need some well-built stuff to use while in transit between Earth and Mars, but is this also true for computers at rest, powered down, and packed up? Can ionizing radiation have deleterious effects for data or even hardware that isn't operating yet but will be? I would think this to be an interesting question as well as most of the computing power one would take to Mars wouldn't be in use during the trip, only once one arrives.

4
0
Silver badge

Re: What about computers at rest?

It takes a lot more energy to damage electronics that are powered off. If a powered off computer is permanently damaged on the trip to Mars, your astronauts are probably dead. I'd worry more about NAND, since it needs to preserve state, but error correction would presumably handle it. Probably you're going to mirror everything anyway, so that should account for the (perhaps unlikely?) case where a single energetic particle is traveling at just the right angle to upset more bits in the same word than ECC can correct.
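To illustrate the "more bits in the same word than ECC can correct" case, here is a toy Hamming(7,4) code: it fixes any single flipped bit in a 7-bit word, but two flips in the same word get silently miscorrected. This is only a sketch; as far as I know real ECC memory typically uses wider SECDED codes, and the function names are my own.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4      # syndrome points at the suspect bit
    if pos:
        c[pos - 1] ^= 1             # flip it back
    return [c[2], c[4], c[5], c[6]], pos

data = [1, 0, 1, 1]
word = hamming74_encode(data)

one_flip = list(word)
one_flip[4] ^= 1                    # single upset at position 5
print(hamming74_decode(one_flip))   # ([1, 0, 1, 1], 5)

two_flips = list(word)
two_flips[1] ^= 1                   # upsets at positions 2 and 5 in the same word
two_flips[4] ^= 1
print(hamming74_decode(two_flips)[0] == data)   # False
```

Mirroring helps with the second case only if you have some way to tell which copy is the bad one, for example a checksum per block.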

3
0
Silver badge

Re: What about computers at rest?

Surely the problem is that a high energy particle is going to create a shower of particles as it makes its way through matter so bits in the same block are very likely to be affected, hence bit striping (single bit wide memory in parallel).
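A rough way to see why striping helps: model one particle track as a burst of upsets in physically adjacent cells, then count how many land in each logical word. The layout model, the sizes and the name `hits_per_word` are simplifying assumptions, not a description of any real memory part.

```python
from collections import Counter

def hits_per_word(burst_cells, n_words, bits_per_word, interleaved):
    """Count how many upset bits land in each logical word."""
    hits = Counter()
    for cell in burst_cells:
        if interleaved:
            word = cell % n_words          # neighbouring cells belong to different words
        else:
            word = cell // bits_per_word   # neighbouring cells belong to the same word
        hits[word] += 1
    return dict(hits)

burst = range(8, 12)   # four physically adjacent cells hit by one particle track
print(hits_per_word(burst, n_words=8, bits_per_word=8, interleaved=False))
# {1: 4}
print(hits_per_word(burst, n_words=8, bits_per_word=8, interleaved=True))
# {0: 1, 1: 1, 2: 1, 3: 1}
```

Four errors in one word is more than single-error-correcting ECC can handle; one error in each of four words is four easy corrections.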

1
0

Test about exposure to radiation in space

Doesn't the Earth's magnetic field provide quite a lot of shielding against space-borne radiation in low earth orbit? Isn't the real problem working out the survivability of electronics in deep space, where there truly isn't any protection?

7
0
Gold badge

Re: Test about exposure to radiation in space

Yes, otherwise the ISS astronauts would be unlikely to live too long. But this is space research. So you do everything with baby steps.

2
0
Bronze badge
Coffee/keyboard

Not a good test

Radiation inside the Van Allen belts is very low; except for solar flares it's a non-issue. About 99% of the total solar radiation is deflected by Earth's magnetic field. For Mars, it's another story -- it weighs in at nearly .7 sievert per week. For comparison, the ISS receives about 150 mSv **per year**. It's not a valid test because the environment isn't anything like it would be out there. Regular PCs are already on the ISS, with no real ill effect other than a few extra reboots here and there.

4
0
Silver badge

Re: Not a good test

Yet another reason I don't think a manned mars mission is going to happen.

I'd like it to happen. I think it should happen. I wish it would happen. But in the end, it won't - because some group of national leaders is eventually going to have to look at the bill and realise that's a hell of a lot of money even by government standards. Especially as the public is going to insist upon bringing the astronauts back again afterwards.

3
3
Gold badge
Unhappy

"Except for solar flares it's a non-issue. "

3 little words.

South Atlantic Anomaly.*

*Thanks to Henry Spencer for that

2
0
Silver badge

Re: Not a good test

Yet another reason I don't think a manned mars mission is going to happen.

I think it'll happen, but not for several decades yet, and not at anything like the scale those suicidal would-be colonist idiots imagine.

Since any interplanetary vessel is going to have to have a storm shelter for its crew (which most spaceship designers envisage being inside the ship's water tank), it makes sense for that to also be the location for the core computing systems (the water will be a handy coolant too).

And who among us hasn't ever dived into the server room to escape unwelcome visitors?

4
0
Silver badge
Windows

Re: Not a good test

@ Rich 11

If your server room is at the bottom of the pool, I'm coming to work with you.

@ElReg, need a scuba icon!

1
0
Bronze badge

Re: Not a good test

I'm not sure where you are getting your figures from. On the surface of Mars you are protected by the atmosphere, and Mars itself, so the radiation is similar to that on the ISS. You would also build a shelter from regolith, or situate it in a lava tube, which would give excellent shielding when you didn't need to be working on the surface or during solar events.

The journey there and back is another matter, but hopefully the trip will be under 3 months and the ship itself will provide some shielding. Overall we're talking risk of death by cancer increased by a few percent. If that bothers you, don't go.

2
1
Gold badge
Unhappy

"I'm not sure where you are getting your figures from. "

Umm. Mars has no magnetic field and an atmospheric pressure about 1/160 that of Earth. The ISS "storm shelter" is about 0.5% of the Earth's atmosphere equivalent.

To get the equivalent protection of the Earth's atmosphere at ground level on Mars takes a layer of regolith about 3m thick.

As for where I got my information: this guy, who should be quite well informed on the subject.

0
0
Bronze badge

Re: "I'm not sure where you are getting your figures from. "

I'm not going to watch a video. Curiosity has a device to measure radiation. It gives 0.67 mSv per day, which is about double the ISS exposure rate. Your figure of .7 Sv per week, or 100 mSv per day, is out by several orders of magnitude. Maybe you have your units wrong, or are confusing exposure during the journey with exposure once arrived. (If you missed a "milli" and confused weekly with daily, that would do it.)
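For anyone who wants to check the unit conversion, here it is spelled out using only the figures quoted in this thread (0.7 Sv per week, 0.67 mSv per day, 150 mSv per year). The variable names are mine.

```python
MSV_PER_SV = 1000.0

claimed_sv_per_week = 0.7       # Mars figure from the earlier comment
curiosity_msv_per_day = 0.67    # Curiosity surface measurement quoted above
iss_msv_per_year = 150.0        # ISS figure from the earlier comment

claimed_msv_per_day = claimed_sv_per_week * MSV_PER_SV / 7
iss_msv_per_day = iss_msv_per_year / 365

print(round(claimed_msv_per_day, 1))                        # 100.0
print(round(claimed_msv_per_day / curiosity_msv_per_day))   # 149
print(round(curiosity_msv_per_day / iss_msv_per_day, 2))    # 1.63
```

So on these numbers the claimed figure is roughly 150 times the Curiosity reading, and the Curiosity reading is about 1.6 times the quoted ISS rate.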

http://www.sci-news.com/space/science-mars-radiation-measurements-surface-01629.html

0
0

It's really Cosmic, Ray.

On Earth we routinely simulate much of the space environment with one massively significant exception: Cosmic Rays, relativistic particles with extreme theory-breaking energies and unknown origin. We have some reasonable approximations that are a PITA to use at all, and impossible to use on whole systems, as they require de-lidding chips and exposing the naked silicon to heavy ion beams.

Cosmic Rays don't care about the van Allen Belts or Earth's magnetic field. But, thankfully, they are filtered quite nicely by Earth's atmosphere, converting into cascades of other relativistic particles that include muons and pions. These particles themselves have vanishingly short lifetimes when observed in the Lab, yet when coming from a Cosmic Ray cascade, they manage to live long enough to reach the Earth's surface, all due to their startlingly high relativistic speeds.
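A back-of-envelope sketch of the muon part, for those who like numbers. The muon mean lifetime at rest is about 2.2 microseconds; the speed of 0.998c is an assumed, typical-looking value chosen for illustration, and the function name is mine.

```python
import math

C = 299_792_458.0            # speed of light, m/s
MUON_LIFETIME_S = 2.197e-6   # muon mean lifetime at rest, seconds

def mean_range_m(beta):
    """Mean distance travelled before decay for a muon moving at beta * c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)   # Lorentz factor
    return gamma * beta * C * MUON_LIFETIME_S, gamma

naive_m = C * MUON_LIFETIME_S            # ignoring relativity entirely
dilated_m, gamma = mean_range_m(0.998)   # assumed speed, illustration only

print(f"without time dilation: {naive_m / 1000:.2f} km")
print(f"with time dilation (gamma = {gamma:.1f}): {dilated_m / 1000:.1f} km")
```

Without time dilation the mean range is well under a kilometre; with it, the figure comes out at around ten kilometres, which is the right scale for getting from the upper atmosphere to the ground.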

Cosmic Rays are the Hulk of radiation, and since we have no clue how to make them on Earth, if you want to expose your equipment to Cosmic Rays, you need to send that equipment above the Earth's atmosphere.

And not far above it either! LEO does just fine.

11
0
Silver badge

Re: It's really Cosmic, Ray.

And for anyone who is a fan of big numbers: https://en.wikipedia.org/wiki/Oh-My-God_particle.

0
0
Bronze badge

They did this in the 70's

The stuff would not even operate correctly up a mountain.

Main issue is soft errors and corruption due to alpha particles striking the silicon; then there was the bad batch of ceramics used to house chips, where the ceramic was giving off particles.

3
0

A single machine here on earth is the control group? So if that machine has a problem that's a zillion space dollars wasted? I hope they have a couple of machines at least, preferably isolated.

1
1
Bronze badge

@ Codysydney

But that applies to the flown hardware too - only a little extra bump on take-off and the whole machine fails...

0
0
Gold badge
IT Angle

A few notes on chips and radiation.

That's probably more computing power than the entire processing power of all the GNC systems of all LV's to date. The processors on Apollo were pocket calculator power IE 32KIPS, Shuttle GPC's started at 0.4MIPS and upgraded to 1MIPS each. The ISS runs (IIRC) 40MHz 386s. The bigger Mars rovers run Power PC's at around 200MIPS (and $100K a board, hence the interest in OTS processing).

As for radiation, RAM started using on-chip ECC because of radioisotopes in the packaging material decades ago. They don't report statistics because a) it would tie up valuable pins and b) who cares as long as the state read out is the same as the state read in.

Servers should have ECC for ram as standard, and logging processes as standard for SNMP (obviously the packet delay will be a bit of an issue).

Likewise "spinning rust" is AFAIK a lot more rad hard but it induces motion in the structure, unless you have pairs of contra-rotating disks to cancel those forces out. Sounds crazy but despite its size the ISS is not actually attached to anything.

Obviously HPE are hoping a good result out of this will make them the go-to supplier for HPC systems, but that means getting hardware NASA certified, and you can bet it will have to be NASA certified if any kind of software is running that's mission critical and the mission is NASA funded.

IOW upgrading to new processors is usually a massive PITA, which is why space runs with hardware generations behind the SoA in processing power. SX accepts the systems will reset and is OK with that, but getting that accepted by NASA for ISS docking must have been a nightmare.

3
0
Silver badge

Re: A few notes on chips and radiation.

What about using systems like these to process data on satellites or spaceships which could mean only the processed and probably much smaller data would need to be transmitted.

0
0
Silver badge

What they currently use on the ISS:

https://www.quora.com/What-are-computers-used-for-on-the-ISS

So there's three main US computers - of which one is considered Primary, one Backup and one Standby at any one time - and three main Russian computers which work simultaneously. These are accessed using laptops, seven US, seven Russian, running Linux. These systems govern the stuff you really don't want to go wrong.

Less critical stuff - inventory control, note taking, on board experiments, email etc - is handled by some Windows laptops, mainly Thinkpads as can be seen from photographs from onboard the ISS.

3
0

Re: What they currently use on the ISS:

Exactly, the ISS is full of laptops (something like 60) so what does this 'test' do that can't already be done on the ISS laptops (and probably has been time and time again). Smacks more of PR as experimentation.

4
1

Re: What they currently use on the ISS:

You're right, the ISS is full of laptops.. Lenovo Thinkpads mainly. However these are FAR from considered "off the shelf". In one way or another, specialised Thinkpads have been flying to space since 1993 aboard STS-61. They're quite significantly modified however to meet stringent NASA requirements.

Here's an interesting story from a few years ago, posted to nasaspaceflight.com by one of the IBM project managers responsible for initially putting the Thinkpads on the shuttle.

https://forum.nasaspaceflight.com/index.php?topic=27043.0

TL;DR - The laptops on the ISS aren't "off the shelf" at all.

1
0
Silver badge

Re: What they currently use on the ISS:

That's a very interesting link you've posted, thank you. However, I didn't spot any mention of the Thinkpads being modified, bar for a different power supply on one model. They were, especially the earlier ones, subjected to a little testing.

0
0
Silver badge

Re: What they currently use on the ISS:

The IBM project manager 'Jim' in the above link specifically states that the Thinkpads were off the shelf. The experiments that used them were designed to tolerate a reboot every so often.

0
0
Silver badge
Paris Hilton

Phew Phew!

it causes bits to randomly flip thus changing information and crashing programs.

AFAIK cosmic rays (i.e. light-hugging fat nuclei) can also cause the circuitry to trip or even blow a transistor here and there right up the epoxy.

1
0
Anonymous Coward

> “Future phases of this experiment will eventually involve sending other new technologies and advanced computing systems, like Memory-Driven Computing, to the ISS

Well at least they'll burn less fuel in future sending up marketing fluff rather than real servers

1
2

And then there are....

In counting up all the more or less off the shelf computers on the ISS, don't forget the two Raspberry Pis that went up late last year.

3
0
E 2

Won't the electrons get confused and float away?

2
0
Silver badge
Happy

Yeah ...

I am sure the astrobuffins in space just want to take part in the Overwatch Summer Games ... well done!

1
0
