From /u/darge89 on reddit.
A steel strut which failed under much less load than it was rated to carry was the cause of the recent mid-air explosion of a SpaceX Falcon 9 rocket bound for the International Space Station, according to SpaceX chief Elon Musk. In a teleconference with journalists, Musk said that the 2ft (60.96cm) long and one inch (2.54cm) …
I always enjoy fireworks, but generally when they are pre-planned. :-)
Anyhow - Mr Musk, Thank you for your openness. Like Mr Branson, there are lots of challenges and bumps in the road to the eventual success that we will all benefit from. Good luck.
It's a bumpy ride, but worth it for the final solution. It's much like the early plane designs: there were some good ones and some not so good ones, but people learned from mistakes and gaps in knowledge to get to what we have today.
"you cannot test every part and overprovisioning by a factor of 5 from a presumed trusted supplier would have been deemed a managed risk. You also cannot say test the important parts as all parts are important in a rocket...."
Actually, you can test struts in several useful and meaningful ways, especially when there's a factor of 5 safety.
Depending on the type of steel (I'm guessing some breed of austenitic stainless if it was used in a cryogenic aerospace application) you would've been able to load the parts, or a sample of them, right up to their yield strength in an Instron machine and they would've snapped back to their original dimensions and strength. It's only when you exceed the yield strength that irreversible deformation and changes to properties set in. That would be a particularly useful test when the safety factor is 5 because the yield strength would be 3 to 4.5x the required strength of the part, depending on the alloy and heat treating. You would've broken these bad struts long before they reached their designed strength, but correctly made struts would be undamaged.
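To put rough numbers on that proof-load idea, here is a minimal Python sketch. It is only an illustration, not anyone's actual test procedure: the function name, the 0.9 margin below yield and the 2,000 lb required load are assumptions for the example, while the 10,000 lb and 2,000 lb breaking loads echo figures quoted elsewhere in this thread.

```python
def proof_load_screen(breaking_loads_lb, required_load_lb,
                      yield_multiple=3.0, margin=0.9):
    """Simulate a proof-load test on a batch of struts.

    breaking_loads_lb: the (normally unknown) load at which each strut fails.
    required_load_lb:  the service load the strut must carry (assumed value).
    yield_multiple:    yield strength as a multiple of the required load;
                       the post above suggests 3 to 4.5 for a safety factor of 5.
    margin:            how close to yield the proof load is taken (assumption).
    """
    # Stay below yield so that sound struts spring back undamaged.
    proof_load = required_load_lb * yield_multiple * margin
    passed = [i for i, b in enumerate(breaking_loads_lb) if b > proof_load]
    rejected = [i for i, b in enumerate(breaking_loads_lb) if b <= proof_load]
    return proof_load, passed, rejected


if __name__ == "__main__":
    # Hypothetical batch: three sound struts and one that fails at 2,000 lb.
    batch = [10_000, 9_800, 2_000, 10_100]
    proof, ok, bad = proof_load_screen(batch, required_load_lb=2_000)
    print(f"proof load: {proof:.0f} lb, passed: {ok}, rejected: {bad}")
```

With these made-up numbers the weak strut breaks on the test rig well before the proof load is reached, while the sound ones never see their yield point.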
Of course, not all struts are shaped right to be tensile test specimens, so you'd have to do partial sampling out of each material lot and carve up the sacrificial parts for properly-shaped test specimens. Or supply witness coupons (witness me!) to travel through the manufacturing process with the struts. With such sampling and good steelmaking and good heat treating practices (like furnace surveys for hot spots), it's unlikely that you'll have one strut in a lot that is much different than any other.
However, if the billet of steel that is eventually carved into struts isn't carefully solution annealed, squished, and otherwise worked to smooth out impurities, then you COULD end up with some nasty bit in some struts in a lot but not others.
In such cases, there are a number of fairly non-destructive tests that can be used: ultrasound, x-rays, dye penetrant, and hardness testing. (Hardness testing might need to be done before final machining or on a test pad on the strut, because the hardness indenters can leave, well, sharp dents that turn into cracks.)
If you're working with a fairly forgiving, high-toughness steel, a flaw of critical size is pretty easy to spot with x-ray and ultrasound inspection (unless the strut is pretty complicated or thick). Dye penetrant inspection finds surface cracks that might be a problem.
Hardness testing, meanwhile, relies on the specific alloy having fairly specific hardness values for a given strength, so it's a quick and easy check of strength. A strut that is much harder than it should be is probably overly strong, and steel toughness tends to drop with higher strength - i.e., steels become more brittle with high strength. Meanwhile, a soft strut would be indicative of an understrength strut.
It's a question of how much you want to pay for the steel strut. You can inspect parts dozens of times from the moment they're raw ore to the time they're being bolted into a Falcon 9 rocket, but therein lies the secret behind $50,000 toilet seats and $800 hammers. Or you can say, "Geez, we're buying this strut from a reputable machine shop who uses certified heat treaters and licensed machinists and the alloy is pretty durable at cryogenic temperatures and has this ludicrous safety factor of 5 and we did first article inspections and qualification testing. We can throttle back from whole-lot inspections." I'd sign off on that decision if the drawing change request got to my desk.
@Boris the Cockroach,
"At least they found the cause, just need to make sure it stays fixed."
Well, they think that's what's happened, it's their best estimate.
With any engineering project you can either over engineer it to achieve a required strength, or you can pare everything to the bone and be fastidious about materials testing, manufacturing process and quality control. The former doesn't work for a rocket, it would be too heavy. The latter takes total commitment everywhere in the organisation and supply chain, right down to the smelters. SpaceX have found themselves somewhere in between.
So it's not just a failed part that's wrong; their working processes that are supposed to ensure quality are also wrong. Those need redesigning too, and then rolling out from scratch across the whole organisation. That's a lot more difficult and expensive than redesigning a simple strut.
I'm also puzzled as to why they chose steel. Steel does some weird things when it gets cold, and this one was immersed in liquid oxygen. It gets brittle, it can fall apart all on its own, you have to test it in its working conditions not at a cosy lab temperature, it reacts with oxygen and rusts (and there's nothing quite as reactive as liquid oxygen). It just doesn't sound like the right material. Even stainless isn't that good; as stresses open up micro cracks in the material the passivation layer is breached. I wonder if they'll change material choice too.
I'd be slightly surprised if they can return to flight within a few months. Process changes are not quick.
Not really, you have no idea how the products were tested. Steel does do weird things but it also works. Perhaps under real load/condition testing it was found that the strength dropped by a factor of two; this is still twice the requirement, so an acceptable risk. The fact that they have tested the part now and found some aren't rated to the requirements shows that it was a manufacturing error and most likely not a design error.
It was also a support strut not a containment vessel. The strut failed under load, the helium tank shifted and overpressurised the tank causing an explosion. There is no mention of the containment device failing due to design error or corroding through.
He got downvotes because his post is wrong, albeit reasoned. The part was designed to withstand 10klbs, it apparently failed at 2klbs. The part was certified by the supplier to 10k. They tested a lot of struts, and a small percentage failed. The same design/manufacture strut has been used on all previous F9 flights.
So although you could argue they should test each part before use, they were using CERTIFIED parts (and you cannot test every part of the rocket, it's just not feasible)
I'm a bit surprised they didn't already have this in place. The idea that the capsule can survive a rocket break-up is not new: Challenger's crew cabin was in one piece (and probably still pressurised) all the way down.
Anyway, the main thing is they are going to implement this safety measure from now on, before manned flights start.
"Challenger's crew cabin was in one piece (and probably still pressurised) all the way down"
It wasn't pressurized. Emergency oxygen supplies in the seats were activated by some astronauts for the duration of the fall to the ocean (a few minutes, IIRC). Other Challenger astronauts did not, leading NASA to conclude that the cabin depressurized explosively and some astronauts passed out almost immediately. Because of this, future shuttle flights used space suits during launch.
Interesting indeed. From other sources, it appears that the location of the failure was triangulated using audio data. I imagine there are numerous transducers (microphones) located on the boosters. The location of the failure can be found by correlating the data stream from each and calculating the speed of sound through various parts of the structure.
Something similar to the way that the City of Chicago locates gunfire.
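For the curious, the time-difference-of-arrival idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not SpaceX's method: the sensor layout, the 5,000 m/s wave speed, a single uniform propagation speed and the brute-force grid search are all made up for illustration.

```python
import math


def locate_source(sensors, arrival_times, wave_speed,
                  grid_step=0.5, extent=10.0):
    """Estimate a 2D source position from arrival times at several sensors.

    Searches a square grid [0, extent] x [0, extent] for the point whose
    predicted arrival-time differences (relative to the first sensor) best
    match the measured ones. The emission time itself is never needed,
    because it cancels out of every difference.
    """
    ref_x, ref_y = sensors[0]
    ref_t = arrival_times[0]
    best_point, best_err = None, float("inf")
    steps = int(round(extent / grid_step))
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = i * grid_step, j * grid_step
            d_ref = math.hypot(x - ref_x, y - ref_y)
            err = 0.0
            for (sx, sy), t in zip(sensors[1:], arrival_times[1:]):
                d = math.hypot(x - sx, y - sy)
                predicted = (d - d_ref) / wave_speed  # expected time difference
                measured = t - ref_t                  # observed time difference
                err += (measured - predicted) ** 2
            if err < best_err:
                best_point, best_err = (x, y), err
    return best_point


if __name__ == "__main__":
    # Hypothetical layout: four sensors at the corners of a 10 m square.
    sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    true_source = (3.0, 7.0)
    speed = 5000.0   # assumed uniform wave speed, m/s
    t_emit = 1.0     # arbitrary emission time, unknown to the solver
    times = [t_emit + math.hypot(true_source[0] - sx, true_source[1] - sy) / speed
             for sx, sy in sensors]
    print(locate_source(sensors, times, speed))
```

A real structure would have different wave speeds along different paths, which is presumably where the "calculating the speed of sound through various parts of the structure" part comes in.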
The tensile strength of steel is 80000 to 330000 lbs/inch². So the strut was probably only 1/8 to 1/32 inch thick. In tension it should have been strong enough, in compression it would buckle. The weight of the strut or struts was only a fraction of the (AFAIK) 1000000+ lbs Falcon where every gram counts ....
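The thickness arithmetic above can be checked with a short Python sketch. The 10,000 lb load is the rating quoted elsewhere in this thread; the 1 inch width and the function name are assumptions, and the sketch covers pure tension only, not buckling.

```python
def required_thickness_in(load_lb, tensile_strength_psi, width_in=1.0):
    """Thickness of a flat strip that just carries load_lb in pure tension.

    area = load / strength, and thickness = area / width.
    width_in defaults to an assumed 1 inch wide strip.
    """
    area_sq_in = load_lb / tensile_strength_psi
    return area_sq_in / width_in


if __name__ == "__main__":
    load = 10_000  # lb, rating quoted elsewhere in the thread
    for strength in (80_000, 330_000):  # psi, the range given above
        t = required_thickness_in(load, strength)
        print(f"{strength} psi -> {t:.4f} in thick")
```

At 80,000 psi this comes to 1/8 inch, and at 330,000 psi to roughly 1/33 inch, which is in line with the 1/8 to 1/32 inch estimate.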
BTW The timber in many old buildings is some four times heavier than necessary, because architects could not calculate the minimum requirements.
"BTW The timber in many old buildings is some four times heavier than necessary, because architects could not calculate the minimum requirements."
You do the engineers of old a disservice.
While they may not have been able to calculate the loads involved (in itself an uncertain belief) they certainly had empirical evidence to show just how much timber was required.
The overengineering is usually due to the need to allow for weakening due to woodworm and rot - the more massive the beam, the longer it will last
In the 1980s a structural engineer ran his eye over my 1974 house.
He said that breeze-blocks were no longer used for internal walls on wooden joists - they were too heavy. He also found that the builders had saved materials by spacing the floor joists wider than the design plans. The joists' cross-sections, in the days before stress grading, were generously oversized - but too many of them were "wane" timber***
He decided that the stairwell's design had allowed for a 200% overload - which had all been eaten up by the above problems. His parting words were "don't have any parties until it is all fixed".
The saga later on included the repair builders fitting joists that had been sitting in puddles at the timber yard for years - and sprouted spectacular fungi as soon as they started to dry out.
*** "wane" timber is the piece that has lost a corner because it was cut from the curved outside of the tree trunk.
His parting words were "don't have any parties until it is all fixed".
Which is of course another reason for over-engineering buildings. The architects can't be sure that someone won't overload a floor way past design specifications, be it with 200 people, or a medium-sized library of mashed-tree books, or a large sculpture, or several tons of stolen silver bullion.
(That last item pushed the floor loading past all tolerances. The bullion spontaneously descended from the third floor to the basement, fortunately while the residents of the lower floors were away. The police were waiting when the crims came back to collect their loot. Note for crims: stash your loot around the *edges* of the room, or better still rent a basement flat.)
In passing I was once told about the "jump test" by an old-school surveyor. You can get a pretty good idea of the overall soundness of a suspended timber floor under carpet etc., by jumping up and down on it. (Engineering explanation: field-expedient impulse response assessment). Just be sure that it's not *completely* rotten before you employ this test!
"Don't forget that the tank was a cryogenic tank, which makes many alloys brittle."
I doubt SpaceX did. There's a good selection of stainless steels used by the aerospace industry that will retain their strength and toughness at cryogenic temperatures. The suitable alloys are also some of the "go to" alloys for the aerospace industry, so on the improbable chance that SpaceX engineers didn't consider the operating temperatures of the struts they would've gotten good low temperature performance anyway.
Congratulations to SpaceX and many thanks for all the hard work (and, I'm sure, many sleepless nights) to find the failure point.
On a side note, given SpaceX's penchant for keeping things in-house, what are the odds that strut-making will be either brought in-house or Elon will quietly buy a small metal-company in order to control the quality of the part, which was, after all, the reason for SpaceX bringing so many things in-house in the first place.
"If you want something done properly, do it yourself."
An old colleague started his IT career as an apprentice with one of the forerunners of the British computer industry. He said that as part of his apprenticeship he had to study metallurgy as a subject. In those days just about everything was made in house. The correct properties had to be understood and specified for the metals used for various components of mechanical devices like card punches.
My knowledge of metals goes no further than the two years doing metalwork at a secondary technical school. Not sure if modern GCSEs give hands-on practice to teach tempering by the colours of heating and then controlled quenching.
"Making steel in house? Probably not. You'd still need to test the resultant components to avoid the same issues so if you're thorough in that aspect I can't see much gain in safety from bringing it all in house."
Well...the short answer is, yes, you can gain some safety by at least consolidating a number of steelmaking steps in house.
For example, consider this process control situation caused by having a long supply chain: I've currently got a stainless steel (17-4PH) widget for an aerospace application and I'm trying to figure out why my "stainless" steel is stained. Specifically, I've got a batch of these little bastards with rust all around their perimeter. The chain of vendors I had to consider as culprits includes:
1) Mining companies who dug up iron, nickel, chromium, and other alloying elements
2) Various companies that processed the ores into assorted precursor metals; these companies may or may not be the same as the mining companies
3) Some specialty steel firm who bought those ingredients and made billets of 17-4, who may or may not overlap the companies in #2
4) A firm who made various stock shapes out of the 17-4 billets (bars, plates, etc.), who may or may not be the same company as #3
5) A specialty machine shop who bought some bars or plates (my blueprint allows 17-4 purchased to bar specs and plate specs) and carved it into my widget using both mechanical and wire electrodischarge machining (EDM)
6) Not applicable for my widget, but in many machined parts it's common for the machine shop to hire a specialty heat treating firm to perform solution anneals to get the raw bar or plate to known condition; then perform stress relief after rough machining; and a final aging treatment to get full strength in the final part. Heat treating is an art, and not always found in a company that has mastered the separate art of machining metals.
7) A specialty chemical company who handles various metal finishing techniques like anodizing or, in 17-4's case, chemical passivation. (Passivation, if done correctly, helps stainless steel live up to its name.) These guys also supplied spec-controlled cleaning prior to passivating (glass bead blasting to remove wire EDM residue).
8) My company, which bolted the widget into an aerospace doohickey with very sensitive optics, the sort that don't like rust particles landing on them.
That's the chain of vendors I have to wade through to figure out who made my stainless part rusty. In fact, my very location (company 8) could handle the work of vendors 5, 6, and 7, but our shops and labs are a lot more expensive than companies specialized in a task. So my manufacturing and procurement departments conspire to outsource the work and thereby make my life difficult. (We hire vendor 5, who handles purchasing from vendor 4 and subcontracting vendors 6 and 7, as needed.)
Why does outsourcing make my life difficult and endanger safety?
Basically, when the work is done in-house it is far easier and faster for me to collect the data to work the problem than when I have to deal with outside vendors. The databases that are used to control blueprints and purchase materials and track production would be the ones that have the answers to my questions about a flawed product. More than that, the same engineers who made the blueprints will also be the engineers responsible for answering factory floor questions. If I have a question about passivating stainless steel, I could walk downstairs to the metal finishing lab and 1) talk to our metal finishing guru about what could go wrong, and 2) get all the logs from the passivating process (lot information on the beads used to blast the widget, chemical bath charts, post-passivation humidity test results, etc.) in one stop. Just by walking downstairs.
On the other hand, if the widget is built by an outside machine shop who subcontracts passivation, the process gets a lot more complicated. I have to file the request via my company's supplier quality group, who have the formal duty of talking to vendors. The machine shop's engineers get involved and have to start their own investigation to answer my questions, which means involving their management and supplier quality and machinists. They'll then pass along my request to the passivation house, which activates a third group of managers and managers. Emails will fly, weeks will pass, answers will be incomplete, new emails will be sent, butts will be covered, and denials made.
It's the "grape vine" game, but with lawyers.
This process could be faster if I could visit the vendor(s), but that means travel approvals and expenses - a site visit to an out of state vendor can add thousands of dollars to a problem. Sometimes it's completely worth it because getting my own eyes on the problem can reveal answers that would never get into an email. But performing a vendor site visit is so much slower than walking down the stairs to part of my own facility.
Once the likely culprit is identified, THEN there's a difference in fixing it between in-house and at a vendor. If there's a passivation problem in my finishing shop or paint shop or machine shop, *they're in my organization.* I get to make the changes, see the changes implemented, realize I'm screwing up on the test run, apologize directly to the shop workers, take their suggestions (the ones that don't involve incestuous activities with my mother), and get the revision working. I've debugged painting and sealing problems on a level of difficulty similar to this rusty widget in one shop visit.
If there's a passivation problem at a subcontractor, I have to pass my polite requests to the machining vendor, who passes along the request to the guys they hired to passivate the widget. Three groups of management, engineers, and workers are involved and there are formal chains of communication involved that bottleneck my ability to explain my requested change to the passivation house and answer their questions. It's a grape vine game with lawyers, and explains why it's taken 2 months for me to fix this rusty widget.
(The glass bead blasting was inadequate so we changed to cleaning with aluminum oxide sand paper, and the passivation house will check every widget with humidity testing before shipping. I pretty much knew that 2 months ago when I did some in-house testing, but it took 2 months for the vendors to agree, then accept the increased manufacturing expenses without increasing widget costs. I wouldn't have those issues in-house.)
Safety and quality may go up by bringing some steps in house because you're not losing all that information on each step of the supply grape vine, and it's so much faster and easier to adjust the process.
I'll give you a further option you didn't think about...chemical attack due to incorrect storage.
I've seen this myself at a contract storage / distribution company. I'd hired them to store around 300 tonnes of Chinese Thionyl chloride for me, in steel drums. Thionyl chloride is one of those products that's an absolute bastard to pack - it gets out of whatever you put it in. In this case it was in mild steel - which, as long as it's kept anhydrous, is OK - but acid fumes will still leak from the closures.
Anyway, prior to a delivery I went to inspect the stock, checking and securing bungs. I was horrified to find that the storage company had removed my drums from their dedicated chemical storage bund, and had instead rented some space in a former bus and truck production plant (which had since gone bust). Also stored in the same building was a lot of mechanical handling conveyor equipment - much of it made from stainless steel - which was showing signs of chemical attack. In the same room were thousands of kiddies' teddy bears - and a production line building truck axles and transmissions - who were having corrosion problems they couldn't understand.
I had a fairly sharp discussion with the storage company, explaining in words of one syllable that the reason they were storing the chemicals was because of the hazard, and they'd better get them stored correctly pretty damned quick. All 300 tonnes were moved that day. I never heard any more about the corrosion problems - I suspect they kept their heads down and denied all knowledge.
To second the previous post, the important point isn't that they fix that strut for the next flight. The point is to figure out how their processes allowed a defective critical strut to be assembled into the spacecraft in the first place. Whenever you analyse a system failure you look for where your processes let you down. The failure itself is interesting, but you can play whack-a-mole with individual part failures until you go out of business without getting to a reliable product. I would bet that when they re-examine their processes in light of this failure they identify at least a dozen other latent defect parts that were similarly under-spec but ended up close enough to work. These parts will all be corrected along with the "guilty" strut.
@Grumpy Fellow - The point is to figure out how their processes allowed a defective critical strut to be assembled into the spacecraft in the first place.
You say that as if there are only a few *critical* struts.
They're all critical - and they have now started an additional testing regime on all the struts they are using (hopefully not just in this location), which they have also redesigned somewhat.
No redesign is necessary if the components were not fit for purpose. The design works, the manufacturer has just lost a lot of work, and I imagine word of mouth that the manufacturer cannot create components to specific tolerances and specifications will lose even more work for them.
Reading about the Apollo 13 mission, process failure happened then as well. By all accounts, the O2 tank failure was due to a combination of the tank being dropped slightly during handling about two years before the mission flew, plus the tank thermostat manufacturer not getting the memo about the shift from using 28 volts to 65 volts for ground power.
Space.com has a writeup about the incident.
Universe Today has an excellent series of articles about the incident. By curious twist of fate, the damaged vent pipe probably saved the crew from death as the tank stirring procedures had been accelerated to try to deal with the issue - it failed on the fifth stir, but whilst still in space with the lunar module docked, instead of on day 5 as scheduled, with the mission already on the lunar surface.
It seems the tank passed all tests as an item, but the combination of parts made a small bomb. The workarounds were accepted instead of triggering a concern, but then the timescale and complexity of the endeavour pushed the issue down the scale.
SpaceX is a lot smaller and more responsive than the Apollo programme, and is data-rich. Some suggest bringing it all in house is a way forward, but they haven't the capacity for this. Keeping everyone communicating is more important - this rocket science thing is no place for folk to hide substandard work, for example. That they can pin down the cause is testament to SpaceX setting up systems to let them learn from every launch, not just the failures.
Something else; the 2nd stage centre-engine shut down that Apollo 13 suffered during launch was actually due to a pogo oscillation that reached 63g in amplitude and was within a second or two of causing a major structural failure when some consequence of the vibration caused the control system to close the propellant valves.
I think that this occurred after the Escape Tower had been jettisoned, so it would have been unsurvivable for the crew.
I once asked around for advice on selecting a consultant metallurgist for a potential corrosion problem in our pressurised SS processing vessels.
The main gist of the advice (from a Mech Eng Professor) was "try to find one with only one arm."
That way we might just avoid over use of their favourite phrase -
'but on the other hand ......'
Whilst it's easy to have 20-20 vision with hindsight, time and again we see a lack of testing causing massive cost due to failure. Whether it's car components or software defects, it's all too slipshod and cost cutting.
It's false economy. Unless you use your customer base as free beta testers, there has to be an investment in testing. The problem is that if a product makes it to market and 'just works' it's not very interesting, so the beancounters say 'look at how much more money we could have made without all this extra cost of testing' - failing to realise that's exactly why you have good reliable products in the first place.
At some point you have to trust the manufacturer's guarantee especially if it requires destructive testing of the component. It is all well and good saying a company should test a representative sample all the time but that is what the original manufacturer should have been doing. The question is: if a significant number failed to meet spec why didn’t the manufacturer pick that up in their own testing? I can see lawsuits on the horizon……
Every day, millions of people get into a car that is certified safe by various NCAP tests. The car you get into was manufactured using certified parts. The builder DID NOT test every single part before it was assembled. They relied on the subcontractors making the part to the specification required. You put your children's lives in the hands of those subcontractors doing their job properly. And it works.
It's no different here. At some point you need to rely on your supplier doing what is requested of them, because testing every part is financially impossible. No rocket company does it, no car company does it.
Where did I say perform destructive testing on every part?
What I said was it was a failure of testing process to detect this, and that more testing is better.
In this case, if several of the parts were substandard (i.e. it failed at 2000lb instead of 10000lb) then increased testing of some of these components, even in a non-destructive manner (e.g. testing up to 90% of load max) would have shown them to break way before then.
My point is this - "don't trust, verify". If millions of dollars are at stake, then for each critical component, you have to test it prior to use.
Or you take a punt, & hope the vendor is not selling dodgy goods, and sort it out with lawyers.
Monday morning quarterback (American football): A person who criticizes or passes judgment with the benefit of hindsight. Monday morning refers to the games played or broadcast on Sunday, with criticisms leveled by commentators the following week. See also hindsight bias and quarterback, below. OED cites football usage to 1932.
Alternatively a group of El Reg Posters, commenting on a difficult engineering issue, usually by pointing out that they wouldn't have made the same mistake, were they rocket engineers, because they alone are brilliant enough to have thought of what went wrong, and would have focussed their full attention on widget 987465B, See 'Why I would never get hacked' and 'How I back up all my IT systems after each and every keystroke'.
"Before every flight I always email the entire company saying that if anyone can think of any reason to hold off then to call or email me immediately, whether their manager agrees or not," Musk said.
NASA let management structure be more important than engineering. That's how they lost Challenger - the managers overrode the engineers. Then they didn't fix it properly and let it happen again - that's how they lost Columbia.
And I recently spotted via a serverfault link to another site, a worried intern asking what he should do because he thought he'd spotted a showstopper bug in a project very close to release date. He feared that telling management about it would be a career-limiting move.
Managers never learn, do they? (Except, how to justify ever-more-obscene salaries for themselves).