Reply to post: MCAS from a Systems Perspective

Boeing big cheese repeats pledge of 737 Max software updates following fatal crashes

BobC

MCAS from a Systems Perspective

I was an engineer at an aircraft instrument maker while it was developing and certifying its TAWS (Terrain Awareness and Warning System) product. I worked in essentially all aspects of the product except for the design of the hardware itself. In particular, I was involved in requirements analysis, software design and development, hardware production (processes and tools), and FAA certification.

Though TAWS provided only audible and visual warnings, we were greatly concerned that we not urge the pilot to take excessive or inappropriate action, such as by announcing a warning too late, or by announcing an incorrect warning. The official TAWS specification described very well what the system must do, but did much less to define what the system must **not** do.

One of my prime responsibilities was to conceive of ways to "break" TAWS, then update our requirements to ensure those scenarios were properly handled, and then update our certification procedures to ensure those scenarios were thoroughly tested. Many of my findings revealed "holes" in the official FAA TAWS specification, some of which were significant. (Being a competitive company, we fixed them in our product, then reported them to the FAA as real or potential flaws in competing products. Essentially, we kicked the hornet's nest every chance we could. Fun for us, harmful to our competitors, and safer for everyone.)

The "single-sensor problem" is well known and understood within the avionics community. However, as our TAWS was initially an add-on product for existing aircraft, we often couldn't mandate many aircraft changes, which could greatly increase the cost and down-time needed to deploy our product. Fortunately, all aircraft are required to have "primary" and "secondary" instruments, such as a GPS heading indicator backed by a magnetic compass. Furthermore, sometimes the display for a low-priority function can be made to serve as a display for a secondary sensor when the primary fails, in which case it is called a "reversionary" display.

The sweet spot for us was that the vast majority of our initial TAWS customers would already have some of our instruments in their cockpit, instruments that already had all the inputs needed to serve as reversionary displays. Inputs that can be shared with our TAWS product.

When we looked at the whole TAWS specification from that perspective, we realized there were circumstances when "primary" and "secondary" instruments may not suffice, particularly if both relied on the same underlying technology (such as digital and analog magnetic compasses, which sense the same external magnetic field - meaning a non-magnetic way to determine magnetic heading was needed, which GPS could help provide).

I had prior experience in "sensor fusion", where you take a bunch of diverse/redundant/fragile sensors and combine them to make better "synthetic" sensors. Back in the day this was done with various kinds of Kalman filters, but today a neural net would likely be more practical (primarily because it separates the training and deployment functions, making the deployed system simpler, faster and more accurate).

So, for example, let's say all your physical AoA (Angle of Attack) sensors died. Could you synthesize a suitable AoA substitute using other instruments? Given a list of other functional sensors, the answer is almost always "yes". But only if there is a physical system component where all those other sensors meet (via a network or direct connection). We had that meeting spot, and the compute power needed for a ton of math.

But even synthetic instruments need primary and secondary instances, which meant not only developing two independent algorithms to do the same thing in different ways, but also, to the greatest extend possible, running both of them on redundant hardware. Which, again, our initial customers already owned!

This extended to the display itself: What if the display was showing an incorrect sensor value? The secondary or synthetic sensor was compared with what was actually being shown on the display. If we detected a significant mismatch, this system would simply disable the display, completely preventing the pilot ever seeing (and responding to) any bad information.

I'm concerned Boeing didn't do enough analysis of the requirements, design and implementation for MCAS: My guess would be that MCAS was developed by a completely separate team working in a "silo", largely isolated from the other avionics teams. For example, this is an all-too-common result of "Agile" software development processes being applied too early in the process, which can be death for safety-critical projects. And, perhaps, for those relying on them, including passengers, not just pilots. Yes, a company's organization and processes can have direct safety implications.

Another example: When an automated system affects aircraft actuators (throttles, flaps, rudder, etc.) the pilot must be continuously informed which system is doing it, and with a press of a button (or a vocal command) be able to disable that automated subsystem. It seems the Boeing MCAS lacked both a subsystem activity indicator and a disable button. Though it won't happen, I wish this could be prosecuted as a criminal offense at the C-suite, BoD and senior management levels.

I believe the largest problem here was all levels of aircraft certification testing: It would appear the tests were also developed in "silos", independent of their relationships to other parts of the system, including the pilot. The TAWS product I worked on was also largely self-certified, but we did so using, again, two separate certification paths: Formal analysis and abundant flight testing (both real and simulated).

The key element for FAA self-certification to be worth believing in relies on the FAA requirement for aviation companies to work with FAA-credentialed DERs (Designated Engineering Representatives). DERs are paid by the company, so there is great need to ensure no conflicts of interest arise. On our TAWS project, the first DER we worked with was a total idiot, so we not only dismissed him, but requested the FAA strip his credential and investigate all prior certifications he influenced. After that incident, we worked with a pair of DERs: One hands-on DER who was on-site nearly every day, and another God-level (expensive) DER who did independent monthly audits. We also made sure the two DERs had never previously worked together (though the DER community is small, and they all know each other from meetings and conferences).

Did Boeing work with truly independent DERs? I suspect not: There are relatively few DERs with the experience and qualifications needed to support flight automation certification. Which means "group think" can easily set in, perhaps the single greatest threat to comprehensive system testing. I predict several FAA DERs will "retire" very soon.

Even from reading only the news reports, I see several "red flag" issues the NTSB and FAA should pursue as part of the MCAS investigations.

Bottom line, the Boeing Systems Engineers and the FAA DERs have well and truly dropped the ball, not to mention multiple management-level failures. I'm talking "Challenger-level" here. Expect overhauls of both Boeing and the FAA to result. Expect all certs for all Boeing products still in the air to be thoroughly investigated and reviewed by the NTSB, NASA, the EU and China/Russia. Do not expect any new certifications for Boeing for perhaps years.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

SUBSCRIBE TO OUR WEEKLY TECH NEWSLETTER

Biting the hand that feeds IT © 1998–2020