I would go one further than that and say that activation of a fault recovery
system presumes the failure of the “smart” primary system. If the smart primary
system could not predict or mitigate the error you are presented with, it’s
likely a second smart system would fail at it as well.

The ideal fault recovery system is one that can bring you immediately to a
survivable, but mission sub-optimal, state with as little reference to the
systems that failed as possible. As dumb as possible is good here, since the goals
of a fault recovery system are radically simpler than those of a primary system.
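
To make that concrete, here is a minimal sketch of the idea, not a real flight
design. Every name in it (Command, SAFE_COMMAND, smart_primary, dumb_recovery,
control_step, the "altitude_m" telemetry key) is hypothetical and chosen only
for illustration. The point is that the recovery path takes no inputs at all,
so it cannot be confused by whatever confused the primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    mode: str
    thrust_enabled: bool


# The one fixed recovery action (hypothetical values): survivable, not optimal.
SAFE_COMMAND = Command(mode="safe", thrust_enabled=False)


def smart_primary(telemetry: dict) -> Command:
    """Stand-in for a complex primary controller; raises on input it did not model."""
    if telemetry["altitude_m"] < 0:
        raise ValueError("altitude outside modelled range")
    return Command(mode="nominal", thrust_enabled=True)


def dumb_recovery() -> Command:
    """Always does one thing. Reads no telemetry and makes no choices."""
    return SAFE_COMMAND


def control_step(telemetry: dict) -> Command:
    """Run the primary; on any failure, fall back to the fixed safe command."""
    try:
        return smart_primary(telemetry)
    except Exception:
        # Deliberately ignore *why* the primary failed: no case analysis here.
        return dumb_recovery()
```

Note that control_step never inspects the exception. Anything the primary did
not anticipate, including missing telemetry, ends up in the same safe state.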

On Feb 8, 2018, at 10:28 AM, Henry Spencer <hspencer@xxxxxxxxxxxxx> wrote:

Whether it's possible and whether it's desirable are two separate issues. :-)

There is usually a strong preference for making emergency equipment, like
escape systems, *simple*. That's partly to make them reliable, partly to
minimize the possibility of subtle bugs (since they don't get tested much in
normal operation), and partly to make them less dependent on case analyses
that never manage to cover 100% of the cases.

Smart systems are a two-edged sword. Rockets and spacecraft have died
because smart fault recovery was faced with a weird unexpected situation, and
picked the wrong response. Stupid fault recovery that always does one thing,
rather than trying to make clever choices, is usually preferable.