Saturday, April 14, 2007

Digital Woes for Mars Global Surveyor

It appears that a single command sent to the Mars Global Surveyor was the root cause of its eventual failure: "Mars Global Surveyor Dies From Single Bad Command". If this is truly the case, it is a pretty scary event, considering that a software engineer somewhere is responsible for the loss of a remote spacecraft.

I read a book a few years ago called "Digital Woes", written by a Reagan-era 'Star Wars' computer scientist, Lauren Ruth Wiener, that cast serious doubt on the probability of success of such a complex system. The probability of failure, based on the statistics of digital failures in the modern computer age, was calculated to be too high. From what I recall, a lot of statistical and scientific models were created, and they showed that the success of the 'Star Wars' system was unachievable.

At any rate, today's news about the Mars Global Surveyor reminded me of the book and of how much more critical software engineering is today. A single bad command caused a cascading set of failures that eventually shut down the spacecraft. This adds fuel to the argument that software testing is critical and that sound software engineering practices may not be enough.

Based on what I know about what happened to the Mars Global Surveyor, it looks like the NASA team followed a disciplined process, but the failure still occurred. It appears that the complex behavior of the system was not well tested, which led to the sequence of events that ended in the failure of the system. The LA Times article mentions budget cuts, political pressure and short time frames as contributing causes of the human error. The Scientific American article also mentions human error.

I wonder if they did any modeling and simulation, or at least simulated this scenario? Being an optimist, I think that test-driven software development is not enough given the complexity of modern systems, and that simulation is required in order to really observe the behavior of the system. Given the declining NASA budget and fewer resources, maybe some new ways of approaching software engineering at NASA are an option. The 'do more with less' mentality requires new ways of thinking and a lot of engineering management innovation.
