[AR] Re: LEO radiation shielding

  • From: Henry Spencer <hspencer@xxxxxxxxxxxxx>
  • To: Arocket List <arocket@xxxxxxxxxxxxx>
  • Date: Sat, 21 Dec 2019 03:15:45 -0500 (EST)

On Fri, 6 Dec 2019, Uwe Klein wrote:

However, FPGAs don't get you away from "more design work equals more
opportunities for mistakes".  There's been at least one spacecraft lost
(the WIRE infrared-astronomy mission) because of FPGA design screwups.

I've been lucky in this respect, 5 designs flown, no burps. :-)

Simulation tools are very useful, but simulations tend to break down at the interaction with outside things. One reason is that simulation of the periphery tends to be shoddy or not done at all :-)

Another trouble spot is power-up/down behavior. That's what happened to WIRE. The FPGA-based controller that ran some of the hardware showed occasional power-up misbehavior that couldn't be reproduced. They eventually wrote it off as some weird interaction with the test gear. Big mistake. When they powered it up in orbit, it fired all the pyros instantly, and the mission was over.

What nobody had noticed, until the postmortem after the failure, was that the misbehavior was always in the first test of the day, after the hardware had been powered down overnight. Internal nodes inside FPGAs can hold charge for a surprisingly long time after the power has been turned off, and this can make a difference in how the system powers up again, especially if the designer hasn't paid enough attention to forcing clean reset on power-up. When they did *long* power-downs, the problem was reproducible.
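As a rough illustration of why only the long power-downs exposed the problem, here is a toy Python model. It is a sketch under loud assumptions, not a description of the actual WIRE hardware: the single "fire latch", the 50% chance of waking up in the wrong state, and the names `power_up` and `count_misfires` are all hypothetical.

```python
import random


def power_up(residual_charge: bool, forced_reset: bool, rng: random.Random) -> bool:
    """Return True if the toy controller fires its pyros on power-up.

    residual_charge: True after a short power-down (assumed: internal nodes
        still hold their old, safe state); False after a long power-down
        (assumed: nodes come up in an arbitrary state).
    forced_reset: True if a power-on reset forces the latch to the safe
        state before the outputs are allowed to act.
    """
    # Hypothetical single "fire" latch; a real FPGA has many such nodes.
    fire_latch = False if residual_charge else rng.random() < 0.5
    if forced_reset:
        fire_latch = False  # reset overrides whatever state the latch woke up in
    return fire_latch


def count_misfires(trials: int, residual_charge: bool, forced_reset: bool,
                   seed: int = 0) -> int:
    """Count how many of `trials` simulated power-ups fire the pyros."""
    rng = random.Random(seed)
    return sum(power_up(residual_charge, forced_reset, rng) for _ in range(trials))


if __name__ == "__main__":
    print("short power-down, no reset:", count_misfires(1000, True, False))
    print("long power-down, no reset: ", count_misfires(1000, False, False))
    print("long power-down, with reset:", count_misfires(1000, False, True))
```

In this model the misfires only show up in the "long power-down, no reset" case, which is the pattern of a fault that appears in the first test of the day and disappears in every quick retest.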

A big thing is getting to grips with the design software. These tools have bugs and idiosyncrasies too. Once you have established a working system, know the bugs, and have worked out methods to work around them ... shoot anyone who tries to update the software (this just in, NEW!!!! <pavlov dripping>) on the devel system.

Apart from knowing the development-software bugs, when something's wrong and you're trying to troubleshoot, it can make a big difference to be able to reproduce your earlier builds *exactly*, bit for bit, so you can start from known behavior and make only deliberate changes. This can mean you're eventually running embarrassingly old compilers and such; if they work, don't mess with them.

(The tricky part comes when an old compiler needs an old operating system which needs old hardware, and you can't buy that any more. Fortunately, virtual-machine systems let you fake old hardware...)
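One simple way to check that a rebuild really is bit-for-bit identical is to record a cryptographic hash of each released build output and compare later rebuilds against it. A minimal Python sketch, assuming the build produces a single output file; the helper names `sha256_of` and `matches_reference` and the file name in the usage note are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(artifact: Path, reference_digest: str) -> bool:
    """True if the rebuilt artifact has the same digest as the recorded build."""
    return sha256_of(artifact) == reference_digest.strip().lower()
```

For example, `matches_reference(Path("controller.bit"), recorded)` would return False as soon as a tool update changes even one bit of the output. Note that some tools embed timestamps or similar data in their output, in which case identical inputs will not hash identically without extra care.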

Henry
