Surely you’re joking, NASA
A remark in Thomas Kida’s splendid book Don’t Believe Everything You Think (Prometheus) snagged my attention yesterday. Page 193:
However, overconfidence can also cause catastrophic results. Before the space shuttle Challenger exploded, NASA estimated the probability of a catastrophe to be one in one hundred thousand launches.
What?! thought I. They did!?! They can’t have! Can they? I was staggered at the idea, for many reasons. One, NASA is run by science types, it’s packed to the rafters with engineers, it couldn’t be so off. Two, I remember a lot of talk – after the explosion, to be sure – about the fact that everyone at NASA, emphatically including all astronauts, knows and has always known that the space shuttle is extremely risky. Three, the reasons the shuttle is extremely and obviously risky were also widely canvassed: a launch is a controlled explosion and the shuttle is sitting on top of tons of highly volatile fuel. Four, a mere drive in a car is a hell of a lot riskier than a one in one hundred thousand chance, so how could the shuttle possibly be less risky?
There was no footnote for that particular item, so I found Kida’s email address and asked him if he could remember where he found it. He couldn’t, but he very very kindly looked through his sources and found it: it’s in a book which in turn cites an article by Richard Feynman in Physics Today. I knew Feynman had written about the Challenger and NASA, but no details. The article is not online, but there is interesting stuff at Wikipedia – interesting, useful, and absolutely mind-boggling. They can have, they did. Just for one thing, my ‘One’ was wrong – NASA is apparently not run by science types, it’s run by run things types. Well silly me, thinking they’d want experts running it.
Feynman was requested to serve on the Presidential Rogers Commission which investigated the Challenger disaster of 1986. Feynman devoted the latter half of his book What Do You Care What Other People Think? to his experience on the Rogers Commission…Feynman’s account reveals a disconnect between NASA’s engineers and executives that was far more striking than he expected. His interviews of NASA’s high-ranking managers revealed startling misunderstandings of elementary concepts. In one example, early stress tests resulted in some of the booster rocket’s O-rings cracking a third of the way through. NASA managers recorded that this result demonstrated that the O-rings had a “safety factor” of 3, based on the 1/3 penetration of the crack. Feynman incredulously explains the gravity of this error: a “safety factor” refers to the practice of building an object to be capable of withstanding more force than it will ever conceivably be subjected to. To paraphrase Feynman’s example, if engineers built a bridge that could bear 3,000 pounds without any damage, even though it was never expected to bear more than 1,000 pounds in practice, the safety factor would be 3. If, however, a truck drove across the bridge and it cracked at all, the safety factor is now zero: the bridge is defective. Feynman was clearly disturbed by the fact that NASA management not only misunderstood this concept, but in fact inverted it by using a term denoting an extra level of safety to describe a part that was actually defective and unsafe.
Christ almighty.
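Just to make the inversion concrete, here is a back-of-the-envelope sketch of the safety-factor arithmetic, using Feynman’s bridge numbers from the quote above; the little Python function is only my own illustration, not anything from the report or the Wikipedia entry.

```python
# Safety factor: how much more load a part can take, undamaged, than the worst
# load it should ever actually see. (Function and names are my own sketch.)

def safety_factor(load_survived_undamaged, worst_expected_load):
    return load_survived_undamaged / worst_expected_load

# Feynman's bridge: built to bear 3,000 lb without damage, never expected
# to carry more than 1,000 lb in practice.
print(safety_factor(3000, 1000))  # 3.0 -- a genuine margin of safety

# The O-rings: damaged (cracked a third of the way through) under ordinary
# launch conditions, i.e. no margin at all. Reading "cracked 1/3 of the way"
# as "safety factor of 3" is the inversion Feynman objected to.
```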
Feynman continued to investigate the lack of communication between NASA’s management and its engineers and was struck by the management’s claim that the risk of catastrophic malfunction on the shuttle was 1 in 10^5; i.e., 1 in 100,000…Feynman was bothered not just by this sloppy science but by the fact that NASA claimed that the risk of catastrophic failure was “necessarily” 1 in 10^5. As the figure itself was beyond belief, Feynman questioned exactly what “necessarily” meant in this context – did it mean that the figure followed logically from other calculations, or did it reflect NASA management’s desire to make the numbers fit? Feynman…decided to poll the engineers themselves, asking them to write down an anonymous estimate of the odds of shuttle explosion. Feynman found that the bulk of the engineers’ estimates fell between 1 in 50 and 1 in 100. Not only did this confirm that NASA management had clearly failed to communicate with their own engineers, but the disparity engaged Feynman’s emotions…he was clearly upset that NASA presented its clearly fantastical figures as fact to convince a member of the laity, schoolteacher Christa McAuliffe, to join the crew.
That’s one of the most off the charts examples of wishful thinking in action I’ve ever seen.
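For a sense of the scale of that disparity, here is my own back-of-the-envelope restatement of the figures in the quote (along the lines of the point Feynman himself made in his appendix to the commission report); the daily-launch framing is just a way of making the gap vivid.

```python
# The two per-launch risk estimates quoted above, and what they imply for a
# hypothetical schedule of one launch per day.

management_estimate = 1 / 100_000  # NASA management's claimed odds of losing an orbiter per launch
engineer_estimate = 1 / 100        # roughly where the engineers' anonymous estimates clustered

print(engineer_estimate / management_estimate)  # 1000.0 -- the estimates differ by a factor of a thousand

launches_per_year_if_daily = 365
print(1 / management_estimate / launches_per_year_if_daily)  # ~274 years of daily launches per expected loss
print(1 / engineer_estimate / launches_per_year_if_daily)    # ~0.27 years (about three months) per expected loss
```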
According to Edward Tufte, the real culprit was Microsoft PowerPoint:
Did PowerPoint make the space shuttle crash? Could it doom another mission? Preposterous as this may sound, the ubiquitous Microsoft “presentation software” has twice been singled out for special criticism by task forces reviewing the space shuttle disaster.
See here: PowerPoint: Killer App.
Aha – that one is about Columbia. Challenger is different (the causes of failure are different). There was a very good, long article about NASA obtuseness about safety issues and Columbia in the Atlantic a year or two ago.
Oops — still, interesting approach to PowerPoint
giggle
I’ve never used it myself, but have heard a lot about people entangling selves in silly presentation bells and whistles to no good purpose.
You know that this Challenger stuff is often cited as an exemplary illustration of group think?
It was ever thus that non-expert administrators lorded over those who actually knew what they were doing. I see it as the result of the idea that leadership or management is a transferable skill, separate from the thing you actually manage. NASA is an example of how this is not the case.
“You know that this Challenger stuff is often cited as an exemplary illustration of group think?”
A point also made (I think) in Tufte’s essay PowerPoint Does Rocket Science. Includes a great cartoon which I would like to hotlink but don’t want to screw things up at N&W so I’ll resist the temptation.
bells and whistles ..
or, as Tufte puts it:
“Sentences Are Smarter Than The Grunts of Bullet Lists.”
It also appears in “What Do You Care What Other People Think?” by Feynman. The second half of the book is about the inquiry into the Challenger disaster and the effort put into making the cover the right colour.
OB: “I’ve never used it myself…”
Possibly the most astonishing statement I’ve yet read on B&W!
“You know that this Challenger stuff is often cited as an exemplary illustration of group think?”
Yes, partly because we’ve had a couple of conversations about it – that’s how I knew Feynman had discussed it. I was fascinated by it at the time (in the wake of the Challenger explosion) but in my case as an example of management ignoring the expertise of the people they were managing – of ignoring it and thereby wasting it and screwing up. It was right after my stint as a zookeeper, at the end of which a new elephant exhibit was being planned; management and the architects pretended to consult us (the keepers) but really didn’t – the plans were already made, and they simply ignored what we said. They should have asked us what the elephants would need and then designed the exhibit – but no. We were all just flabbergasted at the total disconnect. So NASA’s ignoring and over-ruling of the Morton-Thiokol engineers’ worries about the O-rings sounded horribly familiar to me.
Challenger is also clearly an exemplary illustration of over-optimism and (Kida’s point) overconfidence. Groupthink and overconfidence getting together and creating synergy.
“The second half of the book is about the inquiry into the Challenger disaster”
Yes – that’s what the Wikipedia entry is summarizing. I should have said that. I want to read that book, right now. Time to hit the library.
Another interesting thing that Feynman has written about concerns the way in which he had to present his findings to the Challenger commission and to the public – i.e. don’t publicly embarrass any of the engineers or managers. From what I remember reading, Feynman spent a lot of time with Air Force General Donald Kutyna, a fellow commission member, trying to determine how and when to present his work – for example, the O-ring demonstration. I’ve had the impression that Feynman was often frustrated by the commission – too much public relations and not enough rigorous analysis. The tendency for groupthink can easily extend to the investigating commission.
In this context I can’t help thinking of the Manchester United air disaster in 1958 at Munich Airport. There was a TV documentary on it a year or so ago. Twice the aircraft tried to take off in icy conditions and worsening weather, and twice the pilot had to abort. Even though it was forty years later and the end result all too deeply embedded in the consciousness of some of us (what might Duncan Edwards have gone on to achieve if he hadn’t died from his injuries?), I for one found myself willing the pilot and the engineers with whom he was deep in discussion to say, okay, it’s just not on tonight, tell the players to stay in the airport lounge and not board the plane for the third time. But no, they thought they knew why they’d had problems – for Pete’s sake, you felt like yelling at them, the runway is too icy, that’s the problem, it isn’t worth the risk, what’s the matter with you that you’re even *considering* a third attempt? But off they go, players boarding the aircraft again, trying not to look nervous as the plane revved its way down the runway, on, on to the inevitable horror.
Why did the pilots agree to risk a third attempt in such dreadful conditions? Was it (at least for the first pilot) a determination to show he was a *really* good pilot and could get a plane off the ground safely even in such atrocious conditions? Why did the engineers he consulted not say, let’s call the whole thing off? Did they want to show they *really* understood the problems with the two aborted take-offs, and had worked out how to overcome them? What was it in *this* situation, so different from the Challenger disaster, that led to the carnage?
Interesting. It is like the Challenger disaster in one key way – identical, in fact: two previous launch attempts had been called off because of low temperatures. There was a lot of media sighing and harrumphing, and thus a lot of pressure on NASA to just go already – and NASA passed that pressure on to the engineers. Afterwards, of course, everyone realized – oh, oops, what was the big hurry, what was all that sighing and harrumphing for, let’s nobody ever do that again.
Maybe when there have been two aborts that’s the time to be really, really careful; that’s the time to err on the side of caution, because the pull will be the other way.
“The tendency for groupthink can easily extend to the investigating commission.”
Gotta read that book! I have a hold on it at the library; it could be on its way even now.
“There was a lot of media sighing and harrumphing, and thus a lot of pressure on NASA to just go already.” That (depressingly) is only too clear in the case of Challenger. But (of course) there was no such pressure in the case of the Munich air disaster. As far as one can see, external pressures were minimal, other than that it would have been good to get the players back home rather than have them stay overnight in Munich. But they still went ahead. Hubris on the part of the chief pilot and the engineers?
Just want to concur with Andy’s point. It is a conceit of modern business management schools that management professionals are capable of managing technical experts without an understanding of the actual subject they are dealing with.
After 35 years as a working scientist with government and the private sector, I can say this simply does not work. The non-expert managers typically generate little respect from the professionals they manage, and retreat into management wankery of the most inane sort. The Challenger disaster and the incidents cited by Feynman are tragic examples of a much larger issue.
Yes. That was another reason the Challenger disaster interested me strongly at the time – I’d recently worked at an institution where a director with relevant knowledge was replaced by a director with none – a generic manager. We all considered it a highly unfortunate development.