back to article 'It will go wrong. There's no question of time... on safety or security side'

"Software comes with two unique properties: it's basically impossible to inspect and test, and we don't know the sequencing of instructions at the basic level," Statoil's lead analyst for corporate IT digitalisation, Einar Landre, told today's IoT Tech Expo in London. Giving an engineer's perspective on mission-critical …

  1. Anonymous Coward
    Anonymous Coward

    "Software comes with two unique properties: it's basically impossible to inspect and test, and we don't know the sequencing of instructions at the basic level," Statoil's lead analyst for corporate IT digitalisation, Einar Landre, told today's IoT Tech Expo in London.

    Wrong, wrong and wrong.

    It's actually very easy to inspect software. You just open the source code in your favourite editor.

    It's also very easy to test software, you run it for a range of carefully chosen inputs and compare the result to the expected outputs.
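In that spirit, a minimal sketch of the input/expected-output approach (hypothetical conversion function, not from the article):

```python
def fahrenheit(celsius: float) -> float:
    """Hypothetical function under test: Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# A range of carefully chosen inputs with known expected outputs.
cases = [(0, 32.0), (100, 212.0), (-40, -40.0)]
for given, expected in cases:
    assert fahrenheit(given) == expected
```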

    If you really want to know the instruction sequencing you can disassemble the binary, but in practice this is rarely necessary.

    It may be impossible for him because he clearly doesn't have a clue about computers, but actual engineers have been doing all of these things every single day since computers were invented.

    1. The Mole

Disassembling the binary only gives you the instruction sequencing if it is a single-threaded application (and not even then if there are many loops and branches dependent upon external inputs). In a multi-threaded or distributed application there is generally no way to enumerate all possible interleaved sequencings of events, particularly when those events may include hardware operation and error scenarios (such as patterns of packet loss).

Carefully choosing a range of inputs is hard, very hard. How do you determine in a complex system that you have selected enough variation, particularly taking into account all possible fault conditions and event sequencings? In a complex system, just because input A gives output B when you ran it in test on your perfect network at 12:32pm doesn't prove that it will give output B on a real laggy network at 12:00am. You can't test all possible scenarios, so you have to narrow down the set based on your understanding of the code and platform and where the risk is perceived to be. Of course, with any risk-based approach, sometimes you get unlucky.
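The interleaving point can be shown even in a toy sketch: two unsynchronised writers to a shared counter may or may not lose updates depending on scheduling, so one green test run proves very little.

```python
import threading

counter = 0

def worker(iterations: int) -> None:
    # Unsynchronised read-modify-write: the scheduler decides how the
    # load/add/store steps of each thread interleave.
    global counter
    for _ in range(iterations):
        counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# counter may equal 400000 on this run and something smaller on the next;
# no finite set of runs enumerates every possible interleaving.
```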

If code inspection were easy, then code that has been reviewed would never introduce bugs and would work perfectly in the system. Experience shows that even with thorough code review, test definition and multiple layers of validation, bugs still get through to the customer.

The simple fact is that most computer software is too complex for a single person to maintain a full mental model of how it works, all the possible interactions and the knock-on effects of changes. Modularisation and clear specification mitigate some of this, though they can sometimes be exactly where the security or resilience holes appear.

      1. Robert Helpmann?? Silver badge
        Childcatcher

        Software testing? We've heard of it.

        ...how do you determine in a complex system that you have selected enough variation, particularly taking into account all possible fault conditions and event sequencing[?]

Run through common scenarios and tests where you know the results you should get, then use fuzzing to find out whether your error handling works? This is a good area in which to employ test automation. I agree with the rest of what you had to say, The Mole, and the current state of affairs is that there is not even an attempt at any of this in most software houses, and especially not among IoT devs.
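A minimal sketch of that fuzzing idea (hypothetical parser; random junk checks that the rejection path, and nothing worse, fires):

```python
import random
import string

def parse_age(text: str) -> int:
    """Hypothetical input handler under test: must either return a valid
    age or raise ValueError - never die with anything else."""
    value = int(text)          # raises ValueError on non-numeric junk
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

random.seed(1)
for _ in range(1000):
    junk = "".join(random.choices(string.printable, k=random.randint(0, 8)))
    try:
        parse_age(junk)
    except ValueError:
        pass                   # the expected rejection path
```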

    2. Tim Hughes

      Not so simple

Note also such things as branch prediction in modern CPUs.

      How can you then predict the exact sequence of instructions from the software alone? The exact sequence can surely only be determined for a defined combination of hardware, software and input data.

That seems to make certainty exponentially more difficult to obtain.

    3. Cuddles Silver badge

      "It's also very easy to test software, you run it for a range of carefully chosen inputs and compare the result to the expected outputs."

And now you know how the software responds to a small range of carefully chosen inputs. Every single software bug or security vulnerability is the result of someone assuming that because the software responds correctly in a limited test, it must therefore be safe and secure (or alternatively, knowing that it isn't perfectly secure but releasing it anyway). The big problem is that no matter how much you expand your list of chosen inputs, there are an infinite number of possible inputs, so you can never test them all. All you can do is try to think up the sort of things that are likely to cause problems; every bug that makes it into the wild is the result of someone thinking up an input that the developer didn't consider (or again, did consider but didn't bother fixing). Hence - software is impossible to test; no matter how thorough the testing you do, you're trying to prove a negative by searching every part of an infinite space.
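The point is easy to illustrate: a function can pass every chosen test and still fail on an input nobody chose (illustrative example, not from the thread):

```python
def average(values):
    """Passes every 'carefully chosen' case the developer thought of."""
    return sum(values) / len(values)

# The chosen inputs all behave...
assert average([1, 2, 3]) == 2
assert average([10]) == 10

# ...but the input nobody chose still lurks:
try:
    average([])                 # ZeroDivisionError in production
except ZeroDivisionError:
    pass
```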

    4. tiggity Silver badge

      Not that easy

You also need to check what happens with inputs outside of your carefully chosen list.

Reality is not neat: corrupted data, interference, malicious actors, etc, etc.

For non-trivial code, sequencing is not easy - e.g. multi-threading.

Even if code is simple and crystal clear when looked at in isolation, things get a lot more complex when multiple instances of it are running and accessing a shared resource - race conditions, contention, timeouts, deadlocks, etc.

Once you go beyond trivial code, running Petri net analysis soon shows how complex even "simple" code can be.

Then there's all the "code" you don't really control - if you are using anything higher-level than assembler, then a lot of the produced code is outside your control (e.g. I may define the baud rate, com port etc. in my serial port read/write code, but when I do a serial port write, the nitty gritty is done by code written by someone else).

      There's a good reason I'm currently running hundreds of instances of 3 different applications (all being fed with auto generated random test data) that all read from / write to a common database ... and when that's all complete comparing database contents / error logs with expected results based on the auto generated inputs.

Meanwhile, I have several instances of a test-aid application that frequently write-locks different parts of the database.

I'm doing that because I want to check that error handling, 2-phase commit etc. all work when there's contention and database access issues.

      Everything was hunky dory on nice simple unit tests, but I need to ensure it works when there's nasty real world complexity.

    5. pdh

      > It's actually very easy to inspect software. You just open the source code in your favourite editor.

      Read "Reflections on Trusting Trust," by Ken Thompson.

      1. Charles 9 Silver badge

        But also consider this rebuttal: "Countering Trusting Trust through Diverse Double-Compiling" by David Wheeler.

    6. Oh Homer
      Headmaster

Software is "impossible to inspect and test"?

      What utter, utter bollocks.

      If you can write it then you can test it.

      1. Charles 9 Silver badge
        Devil

Re: Software is "impossible to inspect and test"?

        "If you can write it then you can test it."

        Test it, yes. Test it completely, no, because you can only think of so many ways. There's no way to account for every possibility because you won't be able to even envision all the possibilities. And as they say, they only have to be lucky once...

        1. Ian 55

Re: Software is "impossible to inspect and test"?

          .. and because it's your baby, your testing probably won't include the stuff you're not fairly certain it can cope with.

        2. Oh Homer
          Headmaster

          Re: "no way to account for every possibility"

          Yes there is, with clearly delineated types, values and branches.

          As the software engineer, it's your job to control the data and create the pathway it takes. If you can't control that process then you haven't created it properly in the first place. You don't need to blacklist an infinite pool of invalid possibilities, you only need to whitelist the valid ones.
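As a sketch of that whitelist idea (hypothetical command handler): enumerate the valid values and reject everything else, rather than trying to blacklist the infinite invalid pool.

```python
from enum import Enum

class Command(Enum):
    START = "start"
    STOP = "stop"
    STATUS = "status"

def handle(raw: str) -> Command:
    # Whitelist: only explicitly enumerated values get through.
    try:
        return Command(raw)
    except ValueError:
        raise ValueError(f"rejected input: {raw!r}") from None
```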

The problem is not that software is "impossible to test", it's that software is increasingly being written by people who are not applying basic engineering principles to the process, using tools that are designed primarily for monkeys to make money, not for engineers to solve problems.

          The statement should be amended to "badly written software is difficult to test", but as it stands it's pure bullshit, just an excuse for sloppy programming.

          1. Anonymous Coward
            Anonymous Coward

            Re: "no way to account for every possibility"

"software is increasingly being written by people who are not applying basic engineering principles to the process, using tools that are designed primarily for monkeys to make money, not for engineers to solve problems."

            Hallelujah. Someone else has seen the light. Not that it'll be welcome in many places.

            It actually goes a little further than that, in that whole *systems* are being designed by "people who ...".

            Often it doesn't really matter much. But spreadsheets are not the same as web apps are not the same as legacy transactional apps (classical banking) which are not the same as realtime transactional apps (ticket booking) which are not the same as Industry 4.0 (arrggghhh - oh for the 20th century days when it was just called Computer Integrated Manufacturing and the like).

Sometimes the differences matter a lot. Some people in charge don't seem to understand that different situations may need different approaches.

    7. JimC Silver badge

      > It's also very easy to test software, you run it for a range of carefully chosen inputs

      I fear that's the sort of hopelessly naive thinking that helps this industry deliver such unreliable bug ridden security vulnerable crap.

  2. ForthIsNotDead

    Waffle

    Not at all impressed with this "lecture". A load of intangible waffle.

    "We need standards...<waffle garb piffle>..."

He's clearly never heard of IEC 61508, the international standard for building, documenting, testing, and certifying safe hardware and software systems.

    That's very worrying considering he works for Statoil. Fortunately the Statoil engineers that I consult with daily in Aberdeen *have* heard of it.

    Not at all impressed.

  3. Sealand
    Facepalm

    As a colleague once said: "you write in C and debug in assembly".

That held true once, when a simple C case structure failed for no apparent reason, but only on hardware. By inspecting the assembler code, it turned out the compiler had made an error, but the simulator had the exact same flaw, which cancelled the error out in the simulation.

    Just wait until the AIoT ... the artificial intell ... oh, you know what I mean.

    1. Anonymous Coward
      Anonymous Coward

      @sealand: common mode errors

      "the [toolchain] had made an error, but the simulator had the exact same flaw that cancelled the error out in the simulation."

      For the same kind of reason it can help to have very independent people writing the code and writing the test data, too, at least for part of the exercise. Doesn't stop them both making complementary errors - been there seen that got the scars.

      1. Doctor Syntax Silver badge

        Re: @sealand: common mode errors

        "it can help to have very independent people writing the code"

        Management really hate very independent people.

    2. Anonymous Coward
      Anonymous Coward

      "[...] but the simulator had the exact same flaw that cancelled the error out in the simulation. [...]"

      A case of two wrongs do make a right.

      Several times I have seen people fix an obvious error in the code - that they just happened to notice in passing. Only when things then started failing did they discover the other bit of code that was complementary. Possibly the origin of the wise advice - "if it ain't broke - don't fix it".

  4. toxicdragon
    Boffin

    Standards

    "We need standards,"

    https://xkcd.com/927/

Technically beaten to this, but I am still posting the XKCD link.

  5. Anonymous Coward
    Anonymous Coward

    It is a truism that "If anything can go wrong - it will go wrong - at the worst possible moment".

No matter how well designed and tested, any system has an assumed set of constraints on the factors that determine its behaviour. If any of those factors fall outside expectations, then all bets are off.

    I spent a career investigating "impossible" problems. Invariably they were due to a factor, possibly outside their control, that people had ignored, discounted, or were totally ignorant about. When the specialists were given a detailed diagnosis of the failure they would then come back and say "in that unexpected case it would fail".

A systems programmer was once very insistent that a mainframe data corruption fault could not be caused by a dropped bit "because the manual says that the memory has a parity check". Only if you understood the logic design of the CPU did you know that the internal data bus itself had no parity checks.

    1. Anonymous Coward
      Anonymous Coward

      "It is a truism that "If anything can go wrong - it will go wrong - at the worst possible moment"."

      Would this be true even of formally proven code? Just saying...

      1. Michael H.F. Wilkinson Silver badge

Formal proofs have their limits too. I have used formal methods to prove algorithms correct, but the correctness proof very often (if not always) has a set of preconditions. If the actual input violates the preconditions, all bets are off. Besides, even if my algorithm is correct, I must then show that my implementation is correct, and that my compiler is correct, and that the CPU is correct (remember the old Pentium bug?). I found (ages ago) that in MS Pascal the statement

        current := current^.next^.next;

        and the code snippet

        current := current^.next;

        current := current^.next;

        had a very different outcome, even when used (correctly) in a linked list with an even number of nodes. The first version caused a crash of the program, the latter worked flawlessly. Both are formally correct, but the compiler apparently didn't handle the double indirection correctly.

This is not to slag off formal proofs, just to say they are not the full answer.

      2. Anonymous Coward
        Anonymous Coward

        "Would this be true even of formally proven code?"

        Yes - because it would probably be running on hardware whose design had not been formally proven.

        Even if the hardware was also formally proven - there would be electrical and environmental constraints outside which the hardware could behave unpredictably.

        I found two common hardware things that many people were ignorant about.

One was the effect of random particle emissions on memory chips - particularly DRAM. These particles could be cosmic in origin - or inherent in the chip's packaging material. It was counter-intuitive that the higher-spec ceramic packages were potentially a richer source of such particles.

        The second thing was logic gate metastability. This is where an asynchronous level change violates the set up timing of a logic input. This can cause the gate to take much longer to switch levels - by a very indeterminate amount.

        1. Charles 9 Silver badge

Fair enough. I had been thinking about seL4 at the time and knew its formal proof rested on the precondition that there was no direct memory access (a common and useful efficiency booster). I had wondered whether a formal proof could cover all cases, but the above shows there's always a way in for Murphy.

          1. allthecoolshortnamesweretaken Silver badge

            "... the above shows there's always a way in for Murphy."

            Especially if he has help from Gödel.

            1. Solmyr ibn Wali Barad

              Yup. Add Heisenberg and Schrödinger to the team and we'll start to get an inkling of what's really going on.

        2. John Smith 19 Gold badge

          "The second thing was logic gate metastability."

Noted by one Ivor Catt in 1968.

          1. Anonymous Coward
            Anonymous Coward

            Re: "The second thing was logic gate metastability."

            [Another ex-reader of Wireless World, by any chance?]

I think Catt (and maybe Walton and Davidson too) saw more than just gate metastability. They saw that digital design engineers need to have a grounding in RF electronics.

            http://www.ivorcatt.org/digital-hardware-design.htm (The book in question?)

That's even more relevant today (in the days of magic miracle HDMI cables) than it was back in the days of 2708 etc. EPROMs. Back in the 2708 era, one exercise I saw cost the company a fortune because the electronics people designing an EPROM programmer for boards with soldered-in chips didn't realise that if they drove the "high speed" PROM programming voltage down a wire-wrap backplane, it had to be treated as a transmission line and terminated accordingly.

If you just wired chips together, as a young digital engineer would back then, unfortunate reflections would occasionally occur on the programming voltage, driving it way over the design limits and damaging not only the 2708 being programmed but much of the system around it. Which got expensive, especially in mil-spec kit.

      3. a_yank_lurker Silver badge

"Would this be true even of formally proven code?" In functional languages, functions are typically split into two groups: pure and impure. The pure functions, in principle, can be formally proven to work correctly every time. The impure functions, such as I/O or database queries, cannot be given such a guarantee, because one cannot guarantee that inputs will be what you expected or accounted for. So even using functional techniques at best only partially mitigates the problem. In imperative languages, where the program state is much more mutable, the problem is even worse (did one unintentionally change a more global variable, for example?).
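A sketch of the distinction (illustrative, not a formal proof): the pure function below is fully pinned down by its arguments, while the impure one depends on hidden state no test can fix in place.

```python
import time

def pure_tax(amount: float, rate: float) -> float:
    # Pure: same arguments, same answer, every time - so its behaviour
    # can be checked (or proven) once and for all.
    return amount * rate

def impure_tax(amount: float) -> float:
    # Impure: reads the clock, so its result depends on when it runs;
    # no finite test suite pins down its behaviour for all runs.
    rate = 0.2 if time.localtime().tm_year >= 2020 else 0.175
    return amount * rate
```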

The point being made is that testing will not cover all the potential cases - they are often effectively infinite - but only those that experience and judgement deem the most important.

  6. EastFinchleyite

    " it's basically impossible to inspect and test,"

    I agree. Parts can be tested under test conditions but that does not mean all instances can be tested completely under all real life conditions.

BUT, even if software could be tested completely, there is the fuzziness of time. After all the testing has been done and passed, some annoying engineer (me) comes along and makes a hardware upgrade so trivial that it does not require comprehensive system retesting - and then it crashes the whole system. That would be annoying with ATMs but more concerning with airliners.

What makes humans still necessary when it comes to safety-critical systems? We are more likely to make small mistakes, but we tend to have a "don't be so bloody stupid" (DBSBS) checking loop that is constantly going through our minds. Perhaps what we need is not to fruitlessly pursue absolute software testing but rather to implement completely separate DBSBS processes that are not integrated with the checked system. Maybe two levels of DBSBS like a co-pilot cross checking the main pilot's setting of the autopilot.

It's probably already being done, but it is far too expensive and complex for IoT. You don't expect a £1 watch from Poundland to keep good time; why expect a cheap IoT webcam not to launch a DDoS attack on the Pentagon?

    1. Anonymous Coward
      Anonymous Coward

      "Perhaps what we need is not to fruitlessly pursue absolute software testing but rather to implement completely separate DBSBS processes that are not integrated with the checked system. Maybe two levels of DBSBS like a co-pilot cross checking the main pilot's setting of the autopilot."

      Not even that's going to help because then you'll run into things like common mode failure (or to use your analogy, the pilot and copilot make the same mistake--or worse, are in cahoots) that hit all the redundancies at once.

    2. Anonymous Coward
      Anonymous Coward

      "You don't expect a £1 watch from Poundland to keep good time, [...]"

      If the tolerances just happen to be in harmony then it could keep very good time. I have a cheap plastic Accutime bedside clock that runs on one AA battery for several years. It never needs adjustment between battery changes.

      In the 1970/80s you could buy a high end camera maker's lens that guaranteed a good spec - or buy a much cheaper lens from people like Vivitar. If you were lucky the latter could be as good as the high end one.

      In appearance they were identical - and in fact they were from the same manufacturing line. The high-end camera makers merely selected the ones with the best test performance. In the same way you can get a very good car engine by carefully matching the components for ordinary car engines.

  7. Anonymous Coward
    Anonymous Coward

    @various

    So many posts, and some interesting insights emerging. I know not of the IEC standards (which is a failure on my part) but then nor did most of my colleagues in the safety critical world of DO254 and DO178 (aviation), so I'll stick to aerospace.

    "The big problem is that no matter how much you expand your list of chosen inputs, there are an infinite number possible so you can never test them all."

    Not infinite, but quite possibly inconveniently large (especially from the point of view of a bean counter). What can we do to address that?

Well, choosing your test input values carefully may help. What does that mean?

    Random (Monte Carlo) style choice of inputs is an option already mentioned, but the space of test inputs gets big quite rapidly, especially if time dependencies come into the picture. And you still have no idea whether the important cases have been covered.

    Time dependencies (including things like data stored from a previous iteration) are quite inconvenient from a testing point of view.

One can look at the code's decision points and choose values appropriate for testing the various possible outcomes of each decision, manually or with tools, and in suitable combinations. It's usually a big number of combinations and a lot of testing, but it's a lot less than infinite.
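A toy sketch of choosing values at decision points (hypothetical function; one input per outcome of each decision, plus the boundary itself):

```python
def shipping_cost(weight_kg: float) -> float:
    # Two decision points -> three outcomes to cover.
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 2.0:
        return 3.50                              # flat-rate branch
    return 3.50 + (weight_kg - 2.0) * 1.25       # per-kg branch

# Values chosen from the decision points, not at random:
assert shipping_cost(2.0) == 3.50                # on the boundary
assert shipping_cost(4.0) == 6.00                # over the boundary
try:
    shipping_cost(0.0)                           # invalid branch
except ValueError:
    pass
```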

    When looking at decision points, does one look at the unmodified uninstrumented source code, or the binary end result that's actually been generated by the tool chain? Knowing that they match is another question entirely - sadly some test tools deliberately make their own (unnecessary and irrelevant) modifications to the program being tested, therefore making it impossible to know that what's been tested matches what's been shipped, either in source or binary forms.

    That said, looking at source is relatively simple and relatively compatible with bean counters (and also frequently suits tool vendors), but when push comes to shove, processors don't execute source [1], they execute binaries.

    It is in principle possible to do static analysis of some classes of binary, spotting the decision points (ignoring most of the rest of the code) and generating test inputs accordingly. You don't need a simulator of the whole processor for this, and it may be substantially quicker than some other options.

    Something similar is possible by running the code in a more comprehensive simulator/emulator and tracing the decision points, but the requirement for a more comprehensive simulator is sometimes inconvenient.

    These two approaches both rely on the fidelity of the analysis/emulation/trace tools. You can eliminate the dependence on tools by running in real systems, but in the embedded market, the real systems tend to be sufficiently slow that testing becomes a proper chore, even if it's offshored to a place where time is very little money.

A wise mix of the various options would often make sense from an engineering point of view, but the additional time and cost has often been unacceptable to Management I have known.

    Did anybody mention that time-dependent effects at run-time are a challenge?

    None of which is a replacement for having clued up design and verification teams in the first instance, but such teams tend to be inconveniently expensive, and can risk delaying the project (and expanding the budget) beyond what Management have promised.

    [1] A username round here mentions Forth. Maybe someone should look at something FORTH-like as a language for high criticality software. Simple, compact, maybe even safe (albeit quite possibly impenetrable to the average contractor and bean counter). Arguably doesn't even need a trustworthy compiler, certainly doesn't need a complex untestable unprovable compiler.

    "software is impossible to test; no matter how thorough the testing you do, you're trying to prove a negative by searching every part of an infinite space."

How about "testing cannot demonstrate the absence of errors, but it can show when they are present"? NB not just software testing, but software does make life particularly tricky, especially as the failure modes of a software-based system are quite hard to predict.

    1. Doctor Syntax Silver badge

      Re: @various

      "Arguably doesn't even need a trustworthy compiler, certainly doesn't need a complex untestable unprovable compiler."

Unless the H/W executes Forth directly (in which case you simply push the problem down a level), it needs an interpreter. What's the interpreter written in, and how's it compiled if it's in a higher-level language?

      However, if the code inspection reveals a hard-coded root password you can stop right there and throw the whole lot out.

      1. John Smith 19 Gold badge
        Go

        "Unless the H/W executes Forth directly ("

Actually, Rockwell Collins did (and for all I know still does) run its avionics software on a proprietary stack machine.

I've seen sample jet engine control software written in what looks like a stack-based language. It might have been Forth, but the Forth philosophy is that you extend Forth into a task-specific language and program in that.

        1. Anonymous Coward
          Anonymous Coward

          Re: "jet engine control software ... stack based language."

          "jet engine control software written in what looks like a stack based language."

          Maybe you're thinking of the LUCOL language, which wasn't stack based but could easily look like it was?

          There is finally an online version of a ten-page early 1980s paper written by the original designers of the language (at Lucas Aerospace) and published by the American Society of Mechanical Engineers:

          http://journals.asmedigitalcollection.asme.org/data/Conferences/ASMEP/83943/V005T14A006-82-GT-251.pdf

          Other than that there's remarkably little written about it even though it was in so many Rolls Royce engines that RR eventually bought the relevant bits of Lucas Aerospace.

Still flying, still being updated, on some older RR engines and maybe elsewhere. New-from-scratch stuff tends to be hand-crafted Ada, or even autogenerated from model-based systems engineering models.

          Those who are familiar with some of the better PLC programming languages (beyond ladder logic) may recognise some of the concepts even if the terminology and process is different.

          Paper abstract:

          "An Approach to Software for High Integrity Applications

          W.C.Dolman J.P.Parkes

          This paper outlines one approach taken in designing a software system for the production of high quality software for use in gas turbine control applications.

          Central to the approach is a special control language with its inherent features of visibility, reliability and testability, leading to a software system which can be applied to applications in which the integrity of the units is of prime importance.

          The structure of the language is described together with the method of application in the field of aircraft gas turbine control. The provision of documentation automatically is an integral part of the system together with the testing procedures and test documentation. A description of how these features are combined into the total software system is also given."

      2. Anonymous Coward
        Anonymous Coward

        @Doctor Syntax

        Careful with your wording. Pretty sure "throw the whole lot out" gets translated into "ship it" at some business layer.

      3. Ian 55

        Re: @various

If you're not running on Forth hardware (and that tends to be simple), then the interpreter is written in the assembly language of whatever you are running it on.

        The compiler can be as simple as:

        Do I know this word?

        If yes, either execute it (if it's 'immediate') or make a link to it for execution by the interpreter. Done.

        Is it a number?

        If yes, compile code to push it onto the stack. Done.

        Stop and complain. Done.
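That outer loop is small enough to sketch (a Python stand-in; it skips the compile-mode/'immediate' distinction in the outline above and just executes known words, pushes numbers, and complains otherwise):

```python
def interpret(source: str, stack: list, words: dict) -> list:
    for token in source.split():
        if token in words:                 # Do I know this word?
            words[token](stack)            # yes: execute it
        else:
            try:
                stack.append(int(token))   # Is it a number? push it
            except ValueError:             # neither: stop and complain
                raise ValueError(f"unknown word: {token}") from None
    return stack

# A tiny dictionary of primitive words.
words = {
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "dup": lambda s: s.append(s[-1]),
}
```

For example, interpret("2 3 + dup +", [], words) leaves [10] on the stack.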

    2. Anonymous Coward
      Anonymous Coward

      Re: @various

      "[...] but when push comes to shove, processors don't execute source [1], they execute binaries."

I have a C program which has evolved over nearly 30 years through various 8/16/32-bit compilers. The current version is stuck on MS Visual Studio 6. It compiles with later MS C compilers, but even with all optimisations switched off it just won't run properly.

The last gotcha from a compiler update was when it assumed that identical code inside differently named functions meant they could all be optimised down to one instance, with a single undifferentiated entry point. They had different names precisely because their calls were differentiated elsewhere.

  8. tekHedd

    His "current day" argument is stuck in the 80s.

    Lost me at "you can't test it."

  9. Anonymous Coward
    Anonymous Coward

When the company started building 3rd-generation mainframes, it had a department dedicated to writing the hardware test programs. The programmers looked at the hardware designs and wrote the tests according to the perceived limit conditions.

However, their test suites were behind schedule when the first prototypes were ready to be commissioned. So the commissioning team quickly knocked out their own tests, mostly using random data rather than determining the limiting conditions. They also wrote new specific tests whenever a fault was diagnosed by sheer hard graft.

    Eventually it was found that these skunk works tests were far better at finding problems in machines that had not yet been fully commissioned.

  10. John Smith 19 Gold badge
    WTF?

    You'd never think the Shuttle flew for 30 years without a flight bug.

Which, given that there was no manual control system (the computers and/or the APU fail: you bail out or you die) and the design was too unstable for a human pilot to input control movements quickly and accurately enough, was just as well.

    How you do it.

    1) Design the code. Break it into segments. Design the detailed equations it implements.

    2) An SCC system tracked the history of everything - code, scripts, test data sets - on a line-by-line basis. Cutting edge in 1974, SOP today.

    3) Structured walk-throughs and documented fixing. Must be done in an impersonal way. It's a bug hunt, not a witch hunt. :-(

    4) When a bug gets through, understand why, search for that code pattern, then add it to your list of standard patterns to avoid.
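Step 4 can be partly mechanised: once a bug class is understood, scan the codebase for the pattern so each found bug leverages finding the rest. A crude sketch in Python - the pattern and the toy in-memory "codebase" are invented:

```python
import re

# Hypothetical bug class: C-style unbounded string copy.
BUG_PATTERN = re.compile(r"\bstrcpy\s*\(")

def scan(sources):
    """Return (filename, line_no, line) for every match of the
    known-bad pattern in a {name: text} mapping of source files."""
    hits = []
    for name, text in sources.items():
        for i, line in enumerate(text.splitlines(), start=1):
            if BUG_PATTERN.search(line):
                hits.append((name, i, line.strip()))
    return hits

# Toy 'codebase' for illustration.
sources = {
    "a.c": "strcpy(dst, src);\nstrncpy(dst, src, n);\n",
    "b.c": "memcpy(dst, src, n);\n  strcpy (buf, input);\n",
}
# Two hits: the strcpy calls in a.c line 1 and b.c line 2;
# strncpy and memcpy are (correctly) left alone.
print(scan(sources))
```

Modern static analysers do a far more thorough job, but the principle - turn one diagnosed bug into a query over the whole codebase - is the same.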

    BTW the Shuttle software was written in HAL/S, a high-level language. A lot of the bugs in the early days turned out to come from people skipping this process and patching the code directly in assembler.

    Test data generation for code coverage is not in fact a black art. The books by Glenford Myers (who worked for IBM in the 70s and 80s) explain the process quite well.
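One of Myers' core techniques is boundary-value analysis: for an input specified as valid on a range, test at each boundary, just inside it, and just outside it. A sketch in Python - the range and the unit under test are made up:

```python
def boundary_values(lo, hi):
    """Myers-style boundary-value analysis for an integer input
    valid on the closed range [lo, hi]: test each boundary,
    one step inside it, and one step outside it."""
    return sorted({lo - 1, lo, lo + 1, hi - 1, hi, hi + 1})

def in_range(x, lo=1, hi=100):
    """Hypothetical unit under test."""
    return lo <= x <= hi

# Generate the test points with their expected verdicts.
cases = [(x, 1 <= x <= 100) for x in boundary_values(1, 100)]
for x, expected in cases:
    assert in_range(x) == expected, x
# cases: [(0, False), (1, True), (2, True),
#         (99, True), (100, True), (101, False)]
print(cases)
```

Six points instead of a hundred, concentrated exactly where off-by-one errors live.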

    If you want to do this today in the UK, call Altran Praxis, who will do it in SPARK, a safety-critical subset of Ada. True multi-tasking remains problematic, though; for that they will do the same but also using theorem provers and the Z notation.

    What I'll note is this is more expensive than regular code, but not a lot more expensive.

    And fixing IoT failures promises to be much more expensive due to the large deployments.

    Incidentally the F35 ALIS logistics system is not written in Ada. It's written in C/C++. LM said getting Ada programmers was too expensive.

    I wonder how many of the ALIS issues writing in Ada would have prevented? Obviously not a problem for LM: it's the US taxpayers who pick up the bill, and will continue to do so as the F35 is now, to coin a phrase, "too big to cancel."

    I think your lecturer is a bit behind the times. But the question is: will companies pay to use these techniques?

    1. Anonymous Coward
      Anonymous Coward

      Re: You'd never think the Shuttle flew for 30 years without a flight bug.

      "1) Design the code. Break it into segments. Design the detailed equations it implements."

      But then how do you deal with gestalt problems like race conditions that never appear in the individual components but only in the whole (thus gestalt: worse than the sum of the components) and even then only under certain edge case conditions?

      1. John Smith 19 Gold badge

        "But then how do you deal with gestalt problems like race conditions "

        Well I'd start by calling them race conditions.

        The Shuttle computers used a set of sync codes which each computer expected to receive from the others at minimum intervals. Failure to do so suggested something had gone wrong. Watchdog timers help internally.

        You'll also need to check, for each unit, what resources it needs a lock on, and find when two or more modules want the same resource and under what circumstances one won't release it. The deadly embrace has been known since the 1960s, as have ways to identify and prevent it.
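That resource check amounts to building a wait-for graph and looking for a cycle - the deadly embrace is precisely a cycle of modules each blocked on a resource another holds. A sketch in Python, with the module names invented:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph via depth-first search.
    Nodes are modules; wait_for[m] lists the modules holding
    resources that m is blocked on. A cycle is the classic
    'deadly embrace'."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on stack / done
    colour = {m: WHITE for m in wait_for}

    def visit(m):
        colour[m] = GREY
        for n in wait_for.get(m, []):
            if colour.get(n, WHITE) == GREY:
                return True  # back edge: cycle found
            if colour.get(n, WHITE) == WHITE and visit(n):
                return True
        colour[m] = BLACK
        return False

    return any(colour[m] == WHITE and visit(m) for m in wait_for)

# mod1 holds R1 and waits on mod2; mod2 holds R2 and waits on
# mod1: the deadly embrace.
print(has_deadlock({"mod1": ["mod2"], "mod2": ["mod1"]}))  # True
print(has_deadlock({"mod1": ["mod2"], "mod2": []}))        # False
```

The standard preventive discipline is simpler still: impose a single global ordering on lock acquisition, so the circular wait can never form in the first place.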

      2. Anonymous Coward
        Anonymous Coward

        Re: race conditions

        "how do you deal with problems like race conditions that never appear in the individual components but only in the whole"

        Well, in hardware you'd perhaps start by looking for race conditions **in the design**.

        For the sake of simplicity, let's assume binary logic inputs and outputs. A race condition may exist on the path between a particular input I1 and a particular output O1 (which depends directly or indirectly on I1) if there are two or more differing paths between I1 and O1, such that any given change of state in I1 causes the two (or more) paths to produce *different* (conflicting) changes in the ultimate output, O1. If no such conflict is possible, as shown by analysis or testing, there are no race conditions.

        In the absence of such conflicting paths, there is no race condition, surely?

        In the absence of appropriate analysis and testing of the software (as designed and as implemented), there may or may not be a race condition, surely?

        1. Anonymous Coward
          Anonymous Coward

          Re: race conditions

          "In the absence of appropriate analysis and testing of the software (as designed and as implemented), there may or may not be a race condition, surely?"

          Nope, because no one really sees the whole thing. Not even the design. That's why I call them gestalts. Each individual component seems all hunky-dory, but no one sees the entire thing nor how each part interacts as part of the whole. That's why you end up with things like perfectly-tested code snippets behaving badly as a whole, because no one can really see the whole. Plus what if conditions alter just so, like a processor that does certain operations faster in some, slower in others, creating a "falling through the cracks" problem?
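That "correct parts, broken whole" effect is easy to reproduce. Each simulated thread below does a correct read-then-write in isolation, but enumerating the interleavings of the whole shows some schedules lose an update. A sketch in Python:

```python
import itertools

def run(schedule):
    """Simulate two 'threads' each doing read-increment-write on
    a shared counter, under a given interleaving of their steps."""
    counter = 0
    local = {}
    for thread, step in schedule:
        if step == "read":
            local[thread] = counter
        else:  # "write"
            counter = local[thread] + 1
    return counter

# Each thread in isolation is trivially correct: read, then write.
t1 = [("t1", "read"), ("t1", "write")]
t2 = [("t2", "read"), ("t2", "write")]

# Enumerate every interleaving that preserves per-thread order:
# choose which 2 of the 4 slots belong to t1.
results = set()
for mask in itertools.combinations(range(4), 2):
    it1, it2 = iter(t1), iter(t2)
    sched = [next(it1) if i in mask else next(it2) for i in range(4)]
    results.add(run(sched))

print(sorted(results))  # [1, 2]: a lost update exists only in the whole
```

No component-level test would ever see the result 1; it exists only in the composition - which is the gestalt point exactly.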

          1. Anonymous Coward
            Anonymous Coward

            Re: race conditions

            "what if conditions alter just so, like a processor that does certain operations faster in some, slower in others, creating a "falling through the cracks" problem?"

            That might be a very fair (and interesting) question.

            If you consider the time dependence to be a risk, either you design it out, or you mitigate it some other way. Keep It Simple, Surely? If the system designers can't see a time dependence, in a real time system, then we're all f*cked either way.

            Case in point: a DIY ground-up (VHDL) implementation of a bi-directional interface between a device and memory system, where the device can run faster than memory can (think of it as small fast cache<->big slow main memory if you will).

            That approach clearly has the potential to induce timing related issues (amongst others). This was in a system where in principle the timing (and the analysability of timing) were critical to product safety.

            Engineer: "Where's the analysis that says there's no safety issue with your proposed design, e.g. with predictability of timing?"

            PHB: "Why would there be? Everyone else does it this way, with cache." (paraphrased)

            Engineer: "In the COTS market they do, yes. Does everyone else in the real time safety critical business do it this way?"

            PHB: [pained grimace; new topic]

            Same design also had issues with its memory error correction system, which as originally proposed would, under certain relatively infrequent but almost inevitable circumstances, have led to stale data being undetectably written to main memory after a correctable memory error had been detected.

            I don't know if either issue was properly resolved.

          2. John Smith 19 Gold badge
            FAIL

            "That's why you end up with things like perfectly-tested code snippets"

            In this context if that's all you're testing the fail is baked in.

            For starters, inside the computer, if you're looking at hard real time you get into what else is running on the PC, what's got priority, what can generate interrupts, etc. Note that straightforward, well-factored code is likely easier for a compiler to optimize - and remember that code can be made as fast as you like, provided it doesn't have to give the correct answer.

            Actual control systems engineers either do detailed simulations which include detailed plant dynamics - i.e. actual valve opening/closing times, compression effects on pumping aerated fluids, etc. - or they build hardware-in-the-loop versions of the actual hardware, referred to in the aircraft industry as an "iron bird."

            You can see a minimal example of this in the film "Zero Days" where some of the people who found Stuxnet built a little hardware model to explain what it does.

            What you've described is like learning to drive by playing Audiosurf. :-(

  11. Will Godfrey Silver badge
    Unhappy

    Even when everything else is proven

    There's no accounting for a 3 phase supply for an entire factory complex hidden behind a thin partition where you are instructed to mount the main control panel.

    I know of this by direct personal experience. We only got a clue as to the source of the problem when one of our engineers complained his laptop kept locking up when he was trying to fault-find there.

  12. ecofeco Silver badge

    Heavy industry runs on PLC, not the IofHype

    Heavy industry automation and telemetry run on PLC and dedicated networks, often physically separate networks that are often just straight telemetry and a PLC control interface. Any actual cloud connections are VPN.

    IoT and almost everything else Internet related are seen as toys or for admin purposes only.

    On top of that there are, IIRC, 5 main PLC languages.

    As for the oil industry, they are very heavily digitized. I know, I've supported it, they just don't support any extraneous crap that isn't necessary or sucks bandwidth. Satellite time costs a LOT of damn money. Remote to rig in the middle of the ocean through a satellite shows you real quick that you do not waste that bandwidth.

    1. ecofeco Silver badge

      Re: Heavy industry runs on PLC, not the IofHype

      I forgot to add the security side. Rigs and oil businesses in general take digital security very, VERY seriously. IoT is fucking bullshit as far as they are concerned.

      1. Bill B

        Re: Heavy industry runs on PLC, not the IofHype

        On the other hand, sending an expert to an oil rig when things go wrong also costs a shed load of money, particularly if said oil rig is the other side of the world and the plant is shut down because they can't diagnose what is wrong.

        Being able to remotely diagnose what has gone wrong can be useful.

  13. Stevie Silver badge

    Bah!

    "Cheerful chap writes off all mission-critical IoT software without realising it"

    And your point is?

    "It's almost as if real engineers look at industrial IoT offerings and say to themselves, “Nah, we've been working perfectly well without all that guff, why bother?" "

    No, it's exactly as if that were the case. Because, well, Duh!

    And given the sorts of stories recently seen on El Reg reporting from the hellish, shell-pocked IoT landscape, who can blame them?

    SEE: Light bulbs, thermostats, baby monitors, etc, more etc, tediously more etc.

    I am looking for a small home weather station and I can tell you the models that want to join my LAN are moved to the bottom of the list pending some sort of certification that the manufacturer understands how not to have their network-privileged crap used as a conduit for shenanigans as the default configuration.

    1. Anonymous Coward
      Anonymous Coward

      Re: Bah!

      "I am looking for a small home weather station"

      Replace the weather base station with something you can program yourself. The separate sensors are generally 433MHz and the relevant receiver modules are very cheap.

      Alternatively buy a USB connected Software Defined Radio (SDR) module that will decode the sensor signals for you.

      Some weather base stations allow you to extract their data via USB.

      You can then interface the USB to your LAN with a low-power device which you trust.

  14. bobajob12
    Facepalm

    IoT is not industrial automation/control

    Software to control an oil rig drilling assembly, or managing the control surfaces on a jet plane - that's industrial automation. You don't fanny about, because mistakes get your CEO dragged in front of Congress, people die and your share price tanks. Hence Ada, mil-spec and all the usual safety-first standards.

    This works when you can sell your software for millions of dollars.

    IoT is nothing like that. Software is written to be cheap, on sensors that are even cheaper. No one is writing code to defend against cosmic-ray induced memory corruption, or fuzzy inputs, or indeed anything that is not strictly what the designer hoped would be a normal day in Peoria.

    There will never be adequate security in this sort of consumer grade fluff, because the dollars to make it so aren't there.

    Ergo, avoid IoT if you want to have a secure and safe home environment. And when you fly, hope that your plane's programmers learnt the professional standards and not the IoT ones...

    1. Anonymous Coward
      Anonymous Coward

      Re: IoT is not industrial automation/control

      "hope that your plane's programmers learnt the professional standards and not the IoT ones."

      And hope that your plane/train/bus/car manufacturer's components (including software) do not come from the same kind of lowest-cost suppliers (especially offshorers) as are used in the volume IT/IoT world. Except they probably will, eventually, if they haven't already done so, because the low-cost suppliers and low-cost methods drive the better quality, more expensive ones out of business. This will continue until something goes seriously and visibly wrong and leads to the process getting rebalanced back towards quality somewhat. This hasn't visibly happened yet, despite e.g. Toyota and their uncommanded acceleration business [1], Boeing and their ultimate outsourcing business on the Dreamliner (batteries are just the most visible bit), and increasing numbers of others elsewhere.

      [1]

      http://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-%E2%80%9Cspaghetti%E2%80%9D-code

      https://betterembsw.blogspot.co.uk/2014/09/a-case-study-of-toyota-unintended.html

      "Abstract:

      Investigations into potential causes of Unintended Acceleration (UA) for Toyota vehicles have made news several times in the past few years. Some blame has been placed on floor mats and sticky throttle pedals. But, a jury trial verdict was based on expert opinions that defects in Toyota's Electronic Throttle Control System (ETCS) software and safety architecture caused a fatal mishap. This talk will outline key events in the still-ongoing Toyota UA litigation process, and pull together the technical issues that were discovered by NASA and other experts. The results paint a picture that should inform future designers of safety critical software in automobiles and other systems. "

      NB nothing specific against Toyota here. They were just the first to get caught out in relatively public view. They won't be the last, as VAG have already shown.

      1. Charles 9 Silver badge

        Re: IoT is not industrial automation/control

        "Except they probably will, eventually if they haven't already done so, because the low cost suppliers and low cost methods drive the better quality more expensive ones out of business,"

        In heavy industries (the kind that involves huge things, billions of dollars, and plenty of lives), quality usually trumps price because they have the price of failure to consider (not just monetary but legal--these are the kinds of industries that can draw the attention of legislatures when crises emerge). Sure, things slip now and then, but once things like the Toyota and Volkswagen scandals appear, they usually tend to get back in line for fear of being next.

        1. Anonymous Coward
          Anonymous Coward

          Re: IoT is not industrial automation/control

          "In heavy industries (the kind that involves huge things, billions of dollars, and plenty of lives), quality usually trumps price "

          One word proves you wrong: Dreamliner.

          Batteries in particular, but other stuff too. It's not just Boeing (and their change in role from designer and integrator, to brand management and final assembly of outsourced components); others are at it too. The impact on e.g. Boeing of outsourcing (especially of outsourcing the overall profits) while being unable to outsource the corporate risk was predicted long ago, but of course it took a little while to actually show up - by which time the PHBs involved had generally made their fortunes and downsized the engineers.

          "they usually tend to get back in line for fear of being next."

          Never seen any evidence of that. Have you?

          1. Charles 9 Silver badge

            Re: IoT is not industrial automation/control

            You didn't read the ENTIRE reply. I said, "Things slip now and then." But note, AFAIK, no one DIED as a result of the Dreamliner scandal, so Boeing gets some egg on their face, but they move on.

            "Never seen any evidence of that. Have you?"

            The Jungle sure scared the meat packing industry straight. After a couple airplanes broke up in mid-flight, airplane engineers cottoned onto the concept of flutter, and now you don't see flutter-based breakups anymore. Just look up your favorite engineering disasters, and you'll usually see fallout that forces industries to pay attention.

  15. STZ

    When talking industry, forget IoT

    Back in 2011 at the Hannover Industrial Fair, the term "Industrie 4.0" was coined for a common effort of German government, industry and research institutions to further advance industrial production and automation - and was widely adopted internationally, now slightly modified to "Industry 4.0". Subsequently, some people suffering from the "Not invented here" syndrome invented the "Industrial Internet" which means essentially the same, but scares some production people as they mostly prefer to keep their plants apart from the Internet.

    The previous industrial revolution was to introduce electronic controls into the production process, which in the '70s brought PLCs (programmable logic controllers) and similar automation gear. That stuff is definitely not cheap but typically works very reliably; a few commenters have already referred to that technology, now often called OT (Operations Technology) to distinguish it from typically less reliable IT. The Germans retrospectively coined the term "Industry 3.0" for this development, which started some decades ago but still has quite some potential for growth.

    That same kind of automation gear is also widely used in retail (eg. automated warehouses), transportation and logistics (eg. container ports) and other places that need sturdy and reliable technology. On the other hand, there is lots of mostly consumer-oriented cheap stuff like Amazon's Dash Buttons and other so-called "Smart Home" gear where people might think twice about whether it is indeed a smart idea to have such items in their homes ... (;-))

  16. John Smith 19 Gold badge
    Unhappy

    For what it's worth.

    The root causes of most of these vulns are not mysterious. The trouble is people seem to fix the fault they found and don't go back and fix the source.

    Finding a bug does not leverage finding other bugs, or stopping that class of bugs from being written again.

    I think this could be baked into a software house that was cost competitive with others in the market but produced less vulnerable software.

    But I agree that this cheap'n'nasty approach will persist till something goes seriously wrong and several people get hurt or killed. That's pretty much how safety improvements have been made in the transport industry.

  17. John Smith 19 Gold badge
    Unhappy

    For a primer on how things can get FUBAR, here's:

    https://www.youtube.com/watch?v=2S0k12uZR14

    About resilience in "high consequence" environments, from server rooms to operating theatres and aircraft cockpits.
