The important point is this:
"when the engineers noticed something was wrong"
As the stack from software to hardware gets bigger and bigger, one starts to wonder what kinds of flaws may be hiding in there somewhere that could make calculations come out wrong.
The only real way of telling is to take the same input numbers and do the math manually. But as the calculations get bigger and the timeframes get smaller, we may well risk a flaw going unnoticed until it kills someone...
Can we really trust our new electronic overlords?
Edited 2005-11-10 19:40
I believe many of these big errors, and in some cases tragedies, could have been avoided with good unit testing. Especially things like "The error is in the code that converts a 64-bit floating-point number to a 16-bit signed integer. The faster engines cause the 64-bit numbers to be larger in the Ariane 5 than in the Ariane 4, triggering an overflow condition that results in the flight computer crashing."
"What engineers didn't know was that both the 20 and the 25 were built upon an operating system that had been kludged together by a programmer with no formal training. Because of a subtle bug called a "race condition," a quick-fingered typist could accidentally configure the Therac-25 so the electron beam would fire in high-power mode but with the metal X-ray target out of position. At least five patients die; others are seriously injured." As for this one, I'm not so sure, since it deals with interactivity; even good tests could have missed it. Perhaps this is a good case for auditing. Someone should have been auditing and maintaining the OS code.
A lot of these bugs happened long ago, and I don't know the history of software testing principles, but there's really no excuse at this point. Every class, down to simple calculators, in mission-critical systems like radiation control should have an accompanying unit test. Furthermore, they should be tested by trained, dedicated software testers, not just the developers/programmers who wrote the software.
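To make the unit testing point concrete, here is a rough sketch of the kind of test I mean for the float-to-16-bit conversion quoted above. It is Python, not the actual flight code, and the names `to_int16_checked` and `test_to_int16_checked` are made up for illustration. The assumption is that the conversion should refuse out-of-range values loudly instead of overflowing, and that the expected values in the test were worked out by hand.

```python
import math

INT16_MIN = -2**15      # -32768
INT16_MAX = 2**15 - 1   #  32767


def to_int16_checked(value: float) -> int:
    """Truncate a float toward zero and return it only if it fits in 16 signed bits.

    Hypothetical helper for illustration; raises OverflowError otherwise.
    """
    if not math.isfinite(value):
        raise OverflowError(f"{value!r} is not a finite number")
    truncated = int(value)  # int() truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def test_to_int16_checked():
    # Values worked out by hand, including both edges of the range.
    assert to_int16_checked(1234.9) == 1234
    assert to_int16_checked(-1234.9) == -1234
    assert to_int16_checked(32767.0) == 32767
    assert to_int16_checked(-32768.0) == -32768
    # One step past each edge must fail loudly rather than wrap around.
    for too_big in (32768.0, -32769.0, 1e9, float("inf"), float("nan")):
        try:
            to_int16_checked(too_big)
        except OverflowError:
            continue
        raise AssertionError(f"{too_big!r} was accepted")
```

A test like this only helps if the out-of-range inputs reflect what the new hardware can actually produce, which is the part a dedicated tester is more likely to question than the original programmer.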






