Sir, we reconfigured the memory arrays
A space probe in orbit above Mars, crippled by a fault in its solid-state memory, has been brought back on line and is now once again handling scientific data. The Mars Express spacecraft, which has been orbiting the Red Planet since 2003, has been suffering problems since August in which it has repeatedly gone into "safe mode" …
Good job, chief.
"You canna change the laws of Physics."
Make it so, Number One.
I'm a doctor, not a BOFH
But fixing spacecraft rocks.
You go Interplanetary BOFHs.
How very convenient and perfectly timed for out of this world communications traffic in these oddest of future days.
* Pinging AIVD/MIVD re CyberIntelAIgent Security and Intelligent Information Delivery Systems fax/copy/missive from Schiphol Base 1st September 2011
If you would just humour this post, El Reg, and give it some breathing space on this thread, as it is beta testing some extremely sophisticated IP which has been gifted and is for gifting to any and all who are into SMART Future Virtual Reality Systems for Alternate Reality Gaming Systems of Operation, AI Command and Cyber Control.
You surely know you are right at the front of the queue for insider scoop information on rapidly unfolding and virtually classified developments, which put the likes of a Big Brother into the shade.
... but not as we know it.
Makes me feel a little less smug about being able to remote login to my home PC while at work, a whopping 6 miles away, I'll tell you that much.
So their solution is to swap it out to file storage? Pssh. Amateurs. Have they not heard of the BadRAM patches?
I confess to being somewhat surprised that there wasn't already provision to page out bad memory.
The SSMM is effectively the "hard disk" for the spacecraft. File storage is the problem, not bad RAM per se.
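BadRAM-style fixes and their disk equivalent (bad-block remapping) both come down to a lookup table that steers logical addresses away from known-bad physical ones. A minimal Python sketch, assuming a hypothetical `RemappedStore` with a fixed list of bad blocks. It illustrates the idea being debated here, not how the Mars Express SSMM or the actual workaround works:

```python
class RemappedStore:
    """Hypothetical block store that never uses physical blocks marked bad."""

    def __init__(self, num_blocks: int, bad_blocks):
        bad = set(bad_blocks)
        # Logical block i maps to the i-th good physical block.
        self._map = [p for p in range(num_blocks) if p not in bad]
        self._data = {}

    def capacity(self) -> int:
        """Number of usable (good) blocks."""
        return len(self._map)

    def physical(self, logical: int) -> int:
        """Physical block backing a logical block."""
        return self._map[logical]

    def write(self, logical: int, value) -> None:
        self._data[self._map[logical]] = value

    def read(self, logical: int):
        return self._data[self._map[logical]]


store = RemappedStore(8, bad_blocks={2, 5})
print(store.capacity(), store.physical(2))  # 6 usable blocks; logical 2 lands on physical 3
```

The cost is visible in the usage line: every bad block shrinks the usable capacity, but software above the table never has to know which blocks failed.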
Maybe they should just upgrade their firmware...
Is this thing running on Windows? That explains a lot.
Had to be said.
Your joke made me smile, but it's not quite Windows.
Windows would display a blue screen and stop talking to anyone or doing anything until someone went over there and "turned it off and on again".
No, this safe mode is more like "Oh fuck, what the hell just happened!? Turn off everything that isn't essential (like Sky TV transmissions), make sure the antenna for receiving commands is pointing in the right direction so we can receive commands to do stuff, and turn the solar panels to get the maximum power in case things go pear-shaped."
I always wondered what would happen if an unlucky energized particle happened to hit an atom inside one of their storage devices. I'm just happy to have our wonderful magnetic field shield in place.
Thank you again Earth's spinning iron core, for making our IT jobs a lot easier (and keeping us alive too).
"Studies by IBM in the 1990s suggest that computers typically experience about one cosmic-ray-induced error per 256 megabytes of RAM per month." - from Wikipedia's Cosmic Rays entry.
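Taking that figure at face value, the expected error count scales with RAM size and time. A quick back-of-envelope sketch, assuming the rate is linear in both (a simplification) and applies at ground level:

```python
# Rate quoted above: about one cosmic-ray-induced error per 256 MB of RAM per month.
ERRORS_PER_MB_PER_MONTH = 1 / 256


def expected_errors(ram_mb: float, months: float = 1.0) -> float:
    """Expected soft errors at ground level, assuming the rate scales linearly with RAM and time."""
    return ram_mb * months * ERRORS_PER_MB_PER_MONTH


print(expected_errors(8192))      # 8 GB for one month -> 32.0
print(expected_errors(4096, 12))  # 4 GB for a year -> 192.0
```

That is on the ground, under the atmosphere; on a probe well outside it the rate would be considerably higher.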
Hence ECC RAM. The Cell has ECC on its internal 256K-per-SPU memory, which surprised me at the time but I guess that's the way things have to go. No doubt other modern CPUs have the same thing on their caches.
On a probe, there would be many more cosmic ray events, so I suspect they've designed for it too. Probably using a combination of shielding and hardware ECC.
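For anyone wondering how ECC actually corrects a flipped bit: the textbook example is a Hamming(7,4) code, where 3 parity bits protect 4 data bits and the pattern of failed parity checks (the syndrome) points straight at the bad bit. Real memory ECC typically uses a wider SEC-DED code (commonly 8 check bits per 64 data bits on PC DIMMs), and nothing here describes the Cell's or the probe's actual scheme. A minimal Python sketch of the principle:

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # the syndrome is the position of the flipped bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a cosmic ray flipping position 5
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```

Two flipped bits in one codeword would be miscorrected, which is why real systems add an overall parity bit (the "DED", double-error-detect, part of SEC-DED).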
In my project, every bit is stored 3 times, and then the 3 stored values are compared. An energetic particle will only be able to change the state of one of the 3, so majority voting sorts it out.
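That scheme is triple modular redundancy. A minimal Python sketch, assuming words are stored as plain integers and voted bit by bit; the `TripleStore` class and the scrub-on-read step are illustrative assumptions, not the commenter's actual design:

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


class TripleStore:
    """Hypothetical register that keeps every bit three times."""

    def __init__(self, value: int = 0):
        self.copies = [value, value, value]

    def read(self) -> int:
        voted = majority(*self.copies)
        # Assumed extra step ("scrubbing"): rewrite all copies with the voted value
        # so a single upset is repaired before a second one can hit the same bit.
        self.copies = [voted, voted, voted]
        return voted


reg = TripleStore(0b1011)
reg.copies[1] ^= 0b0100  # simulate a particle strike flipping one bit in one copy
print(bin(reg.read()))   # the vote still returns the original value: 0b1011
```

Because the vote is per bit, upsets in different bits of different copies are still corrected; only two hits on the same bit position defeat it.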
In more critical systems, the 3 bits also have a delay after them that is different for each of the 3, so a transient in the power that affects all of them actually shows up at a different time for each one when the outputs are compared, and again it is cancelled out.
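The staggered-delay trick can be illustrated with a toy timing model: a common-mode glitch corrupts all three copies over the same short window, but because each copy reaches the voter through a different delay, the voter sees the glitch in at most one copy at any instant, as long as the glitch is shorter than the spacing between delays. The glitch window and delay values below are made-up illustrative numbers in arbitrary time units:

```python
def glitchy_bit(t: int) -> int:
    """Hypothetical stored bit (really 1) that a power transient forces to 0 for 10 <= t < 12."""
    return 0 if 10 <= t < 12 else 1


def voted_output(t: int, delays) -> int:
    """Voter output at time t when copy i reaches the voter delays[i] time units late."""
    samples = [glitchy_bit(t - d) for d in delays]
    return 1 if sum(samples) >= 2 else 0


# Equal delays: the transient hits all three copies at once and the vote is fooled.
print([voted_output(t, (0, 0, 0)) for t in range(8, 14)])   # [1, 1, 0, 0, 1, 1]
# Staggered delays (spacing 5 > glitch width 2): at most one copy is wrong at any instant.
print([voted_output(t, (0, 5, 10)) for t in range(8, 14)])  # [1, 1, 1, 1, 1, 1]
```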
And in one other system, the value of the bits is stored on a capacitance sooo big that even multiple strikes on any part of the circuit can't change the value stored (although that means re-programming values does not happen in ns!)
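The big-capacitor approach follows from basic circuit relations. Assume each strike deposits a charge Q_dep on a storage node of capacitance C, n strikes hit the same node before it is refreshed, V_margin is how far the node voltage can move before the stored value reads wrong, and R is the resistance the write path charges the node through (symbols introduced here for illustration; the original gives no figures):

```latex
\Delta V = \frac{n\,Q_{\mathrm{dep}}}{C},
\qquad
\Delta V < V_{\mathrm{margin}}
\;\Longrightarrow\;
C > \frac{n\,Q_{\mathrm{dep}}}{V_{\mathrm{margin}}},
\qquad
t_{\mathrm{write}} \sim R\,C
```

A bigger C makes the value proportionally harder to upset but also proportionally slower to rewrite, which is the trade-off described above.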
In terms of here on Earth, the atmosphere makes the biggest difference to the type of particle that may cause problems, and hence high-flying aircraft suffer more than RAM on the ground.
"Fred Jansen, the Mars Express mission manager, said the spacecraft has recovered from its last safe mode event and successfully completed initial testing of the workaround, which involves a new way of storing commands aboard the probe before they are executed.
Instead of using a special file in the solid-state mass memory unit, the commands would be housed in a hardware-based timeline store outside the memory system, bypassing the issue believed to be the cause of the safe modes.
Jansen said the Mars Express radar sounding instrument, named MARSIS, conducted test observations Monday with no problems."
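A time-tagged command store is conceptually a queue ordered by execution time: commands are uploaded in advance and released when on-board time reaches their tag. A minimal Python sketch using the standard `heapq` module; the class name and command strings are hypothetical, and this says nothing about how ESA's hardware timeline store is actually built:

```python
import heapq


class CommandTimeline:
    """Hypothetical store of time-tagged commands, released in execution-time order."""

    def __init__(self):
        self._heap = []  # entries are (exec_time, seq, command)
        self._seq = 0    # tie-breaker: equal time tags keep their upload order

    def schedule(self, exec_time: float, command: str) -> None:
        heapq.heappush(self._heap, (exec_time, self._seq, command))
        self._seq += 1

    def due(self, now: float) -> list:
        """Remove and return every command whose time tag is <= now, earliest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


timeline = CommandTimeline()
timeline.schedule(100, "MARSIS_ON")
timeline.schedule(50, "SLEW_TO_NADIR")
timeline.schedule(100, "START_RECORDING")
print(timeline.due(60))   # ['SLEW_TO_NADIR']
print(timeline.due(100))  # ['MARSIS_ON', 'START_RECORDING']
```

The point of the workaround quoted above was precisely where this queue lives: outside the faulty mass-memory file rather than inside it.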
So when it's in safe-mode, it needs to align the mirrors to keep the solar charging. And when it's fully up and running it powers itself... how?
"So when it's in safe-mode, it needs to align the mirrors to keep the solar charging. And when it's fully up and running it powers itself... how?"
With a charged battery? I guess normal operation orients the spacecraft so that its instruments point towards Mars, so the solar panels occasionally receive less light. That is fine; it runs on batteries during those periods. In safe mode they don't want to risk the spacecraft doing something silly that would eliminate the charging periods (like pointing the panels away from the Sun) and then dying for lack of power. They probably also want to ensure maximum power is available for recovery attempts.
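The battery argument can be made concrete with a toy energy balance: each hour the battery gains whatever the arrays produce above the spacecraft's load and loses the shortfall, clamped between empty and full. All the wattages and capacities below are hypothetical round numbers, not Mars Express figures:

```python
def simulate_battery(start_wh, capacity_wh, solar_w, load_w, step_h=1.0):
    """Return the battery charge (Wh) after each step of a solar-input profile.

    solar_w: array output (W) for each step; load_w: constant spacecraft load (W).
    Charge is clamped to [0, capacity_wh]; hitting 0 would mean running out of power.
    """
    charge = float(start_wh)
    history = []
    for p in solar_w:
        charge += (p - load_w) * step_h
        charge = max(0.0, min(float(capacity_wh), charge))
        history.append(charge)
    return history


# Hypothetical numbers: 100 W load, 400 Wh battery, 4 h orbit.
science_pointing = [250, 250, 0, 0]  # arrays lit for only half the orbit
print(simulate_battery(300, 400, science_pointing * 2, 100))
print(simulate_battery(300, 400, [0] * 4, 100))  # arrays pointed away from the Sun
```

With the arrays lit for half the orbit the battery dips every orbit but recovers; with no charging at all it runs flat within a few hours, which is the failure safe mode is trying to rule out.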
I would imagine in normal operations the craft is orientated with its sensors towards the planet, like the Moon or a comms sat, so the solar array would still receive sunlight for parts of the day but at acute angles for most of the sun-side orbit. Safe mode, I would guess, sets the satellite in a tumble* so that the panels are always facing the Sun while the instruments peer into space, planet, space, planet and so on.
*Technically, in normal operations the satellite rotates about its axes to keep the planet in view as it free-falls around it, while in safe mode it stops rotating, which to an observer would look like it's tumbling.
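The "acute angles" point is the cosine law for a flat panel: output scales with the panel's area as seen from the Sun, i.e. with the cosine of the angle between the panel normal and the Sun direction, so sunlight arriving at a glancing angle gives little power. A minimal Python sketch, ignoring temperature, degradation and the weaker sunlight at Mars; `p_max_w` is a hypothetical full-on output:

```python
import math


def panel_power(p_max_w: float, angle_from_normal_deg: float) -> float:
    """Output of a flat solar panel lit at the given angle from its normal.

    Cosine model: power scales with the panel's projected area toward the Sun.
    Beyond 90 degrees the Sun is behind the panel, so output is zero.
    """
    return p_max_w * max(0.0, math.cos(math.radians(angle_from_normal_deg)))


for angle in (0, 30, 60, 80, 120):
    print(angle, round(panel_power(100.0, angle), 1))
```

A panel tilted 60 degrees away from the Sun already delivers only half its rated output, which is why pointing the arrays straight at the Sun matters when power is tight.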
But at least SSDs are faster than hard disks, right?
Seriously though - nice work guys.
Does this remind anyone of the Mars Pathfinder issue, when they had some debug code on Pathfinder that sent back stack traces of the panic (or something like that) and a fix was uploaded?
How did you resist the urge to call them BOFH-ins?
Or whatever your favourite refreshment / stimulant is.
I love hearing about epic spacecraft-recovery wins!
Yay for boffins!
Still beats bronze-age beliefs in the hand....
I think I will use this as an example of the awesome things computer science brings us at our next open day for potential students this coming Friday (alongside a host of other, more mundane examples).
It is boring, but not easy.
More on this, please, El. Reg!
If you have a severe failure, a day or so of data that could be lost, and 150 workers that cannot work, well, this is NOT so boring.
I believe that being a BOFH is like being a passenger airline pilot. You get months of boring work and then some really terrifying minutes (or hours) now and then.
...would be able to do stuff like that in safe mode.
You know, things like detecting that something is borked and enabling itself to INPUT COMMANDS and such, rather than just sitting there BSOD'ing.
Without hitting Ctrl-Alt-Del first, or whatever.
... even if he doesn't have a PhD.
Maybe Howard's been allowing a "future Mrs Wolowitz" to drive the damn thing again!