[AR] Re: LEO radiation shielding

  • From: Henry Spencer <hspencer@xxxxxxxxxxxxx>
  • To: Arocket List <arocket@xxxxxxxxxxxxx>
  • Date: Sat, 21 Dec 2019 02:30:09 -0500 (EST)

(Catching up on some postings I set aside "for a bit later" when I got busy earlier...)

On Thu, 5 Dec 2019, Elliot Robert wrote:

Just to add more data points to the discussion: the first Mexican cubesat, aztec-sat, which launched today, is using a BeagleBone Black board for its primary computer.  That's a decidedly terrestrial design, about $65 USD from Digi-Key.

That's not the first BBB in orbit, either, although it may be the first that's mission-critical. It'll be interesting to see how well that works.

I wonder if those of you with the experience could elaborate about the memory error correction software being flown routinely.

Proper memory error correction, alas, *really* has to be in hardware. You can sort of fake it a little in software for things like control variables, but there isn't a practical way to do software error correction on, e.g., the program's stack -- at least not without serious compiler modifications and a lot of overhead -- and a one-bit error there can make an awful mess. Been there, done that, not doing it again.
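To make the "fake it a little in software" case concrete, here is a minimal sketch of one common approach: keep three copies of a control variable and take a bitwise majority vote on every read. The class name `VotedWord`, the 32-bit width, and the scrub-on-read policy are assumptions chosen for illustration, not anything from a particular flight program, and Python is used only to show the logic (real flight code would more likely be C). Note that it protects only the variable itself and does nothing for the stack or the code.

```python
class VotedWord:
    """Hypothetical helper: one 32-bit control variable kept in three copies."""

    MASK = 0xFFFFFFFF

    def __init__(self, value):
        self.copies = [value & self.MASK] * 3

    def write(self, value):
        # Every write refreshes all three copies.
        self.copies = [value & self.MASK] * 3

    def read(self):
        a, b, c = self.copies
        # Bitwise majority: each output bit is the value at least two copies agree on.
        voted = (a & b) | (a & c) | (b & c)
        # Scrub: rewrite all copies so single upsets do not accumulate over time.
        self.copies = [voted] * 3
        return voted


thrust_limit = VotedWord(1200)
thrust_limit.copies[1] ^= 1 << 7      # simulate a single-bit upset in one copy
assert thrust_limit.read() == 1200    # the vote masks the upset
```

The vote is per bit, so it also survives upsets in different bit positions of different copies; it fails only when the same bit is flipped in two copies before the next read or scrub.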

Hardware error correction is fairly easy to do if you're building your own board, but it is admittedly scarce in off-the-shelf boards. (With one caveat: some modern memory chips appear to have undocumented internal error correction to make less-than-perfect memory arrays usable. This may also help with errors due to proton hits etc., but it's hard to say how good it is when the manufacturer refuses to discuss it at all.)
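For readers who have not seen what the hardware is doing, the following is a toy sketch of single-error correction using the textbook Hamming(7,4) code. It is an illustration only: memory ECC is normally done in logic on much wider words (commonly 64 data bits plus 8 check bits, which also detects double errors), and the function names here are made up for the example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit codeword; list index 0 is unused."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits go in positions 3, 5, 6, 7; parity bits in positions 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code


def hamming74_decode(code):
    """Return (nibble, syndrome); a nonzero syndrome is the position that was fixed."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1          # flip the single bit the syndrome points at
    data = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(data)), syndrome


word = hamming74_encode(0b1011)
word[6] ^= 1                          # simulate a single-bit upset in storage
assert hamming74_decode(word) == (0b1011, 6)
```

The same check happens on every memory read in a hardware implementation, which is why it covers the stack and code as well as data, with no help from the compiler.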

The good news is, if you design the hardware so a software failure can't damage it -- e.g., minimal solar arrays on all sides so temporary loss of attitude control is not a mission killer -- then if you're willing to live with a somewhat higher rate of outages, you can cross your fingers and just ignore the issue. Whether you get a usable satellite that way is a bit of a gamble -- even proton-beam testing isn't really a good simulation of the space environment -- but sometimes you get lucky and the chips stay up fairly well. Witness all the off-the-shelf stuff, like BBBs, flown in space with some success. (I say "sometimes" and "some success" because you seldom see press releases about the times when it *didn't* work. A lot of cubesats either are DOA or work so poorly that their owners quickly give up on them; either way, such failures usually aren't publicized.)

...are we really just talking about multiple CPUs with their own short-term memory that are constantly running basic arithmetic like 1+2=3? If one CPU's result doesn't match the other two at a given instant...

Unfortunately, what you really want to know is whether the CPU's results in doing *real work* match the other two. Typically we're not talking about a CPU that gets a little tipsy and does everything not-quite-right, but about transient errors in memory or internal registers that mess up particular computations only. Catching that tends to require, again, comparisons done in hardware. That can get complicated and messy -- e.g., do the CPUs always take interrupts with exactly the same timing?

One bright spot: we're starting to see high-end microcontrollers for high-rel applications that have two lockstep CPU cores on the *same chip*, with interrupt handling etc. carefully synchronized by the hardware, and internal comparisons done on everything, and any disagreement causing the whole assembly to reset; those might be useful for cheap spacecraft. (This requires either (a) a spacecraft that can't be hurt by software outages, or (b) software like that in the Apollo LM, which *doesn't* just give up and do a cold start when a reset occurs, but rather makes an organized effort to pick up where it left off.)
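As a rough illustration of option (b), the sketch below shows one way software can try to pick up where it left off: periodically save a checkpoint protected by a CRC, and on reset resume from it only if the CRC still checks out. This is a simplified assumption about how such a scheme might look, not a description of the Apollo LM software; the function names, the JSON encoding, and the "safe" fallback state are all hypothetical.

```python
import json
import zlib


def save_checkpoint(state):
    """Serialize the state and append a CRC32 so later corruption is detectable."""
    payload = json.dumps(state, sort_keys=True).encode()
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def load_checkpoint(blob):
    """Return the saved state, or None if the blob is missing or corrupted."""
    if blob is None or len(blob) < 5:
        return None
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "big")
    if zlib.crc32(payload) != stored:
        return None
    return json.loads(payload)


def boot(blob):
    """On reset: resume from a valid checkpoint, otherwise fall back to a cold start."""
    state = load_checkpoint(blob)
    if state is None:
        return {"mode": "safe", "step": 0}, "cold"
    return state, "warm"


blob = save_checkpoint({"mode": "imaging", "step": 42})
assert boot(blob) == ({"mode": "imaging", "step": 42}, "warm")
```

The checkpoint itself lives in memory that can take hits, so the fallback path matters as much as the resume path: a corrupted checkpoint should lead to a clean cold start rather than a resume from garbage.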

Henry
