Absent oracles, real-world systems will fail, deal with it
I will just cite copiously from the link to the review given above.
Upside-down rocket exhaust as icon because "If this part starts pointing towards space, you are having a bad day and you will not go to space today".
"Risk and the Work Group Culture"
After [Diane Vaughan] systematically rejects the hypothesis that any amoral calculator was at play in managerial decision making [in the Challenger Launch Decision], she turns her attention to recreating the work group culture and the environment in which NASA engineers and managers worked, negotiated risk and took decisions under uncertainty. She attempts to create a “native view” of the work group culture in NASA. There was always a “residual risk” present in all the flights, due to the unique design of the shuttle and the large number of uncertainties associated with such a large, complex technical system, for which there was no prior experience; therefore “work groups were calculating risk...where it was fundamentally incalculable”. The concept of “acceptable risk”, which was a formal status conferred upon a component by following a prescribed procedure based on a documented engineering analysis and technical rationale, is key to estimating the flight risk. Whereas other enquiry commissions expressed their surprise at the use of “acceptable risk”, it was a norm in NASA culture to fly with a known residual risk. The decision to assess risk and to categorize it as “acceptable risk” was based on scientific method and engineering judgment based on tests and data, and was often negotiated in the work groups.
"Normalization of Deviance"
Normalization of the deviance in O-ring performance incrementally widened the “acceptable risk” criteria. Also, the (strong) belief in redundancy (there were two O-rings in the shuttle design, one primary and one backup, as opposed to the Air Force’s Titan III solid rocket, which had only one O-ring) led to the construction of risk, which was normalized when test performance deviated from design predictions. The early decision to accept the risk became a precedent and part of the work group culture, which led to repeated normalization of the deviance. Diane Vaughan explores the normalization of deviance in chapter five and also revisits and revises the post-accident accounts of controversial NASA actions: continuing to fly after observing extensive erosion on STS-2, declaring the space shuttle operational, and failing to report the joint performance to the upper-level NASA administrators during the Flight Readiness Review. After the fourth flight, the shuttle was declared operational, which reduced both the testing of the vehicle and its components and the requirements for reporting problems. This decision had serious structural impacts that affected the work group’s decision-making process.