
Anonymous Coward

"Software comes with two unique properties: it's basically impossible to inspect and test, and we don't know the sequencing of instructions at the basic level," Statoil's lead analyst for corporate IT digitalisation, Einar Landre, told today's IoT Tech Expo in London.

Wrong, wrong and wrong.

It's actually very easy to inspect software. You just open the source code in your favourite editor.

It's also very easy to test software: you run it for a range of carefully chosen inputs and compare the results to the expected outputs.
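For illustration, that style of test can be sketched in a few lines of Python (add_vat here is just an invented function under test):

import unittest

def add_vat(net_pence):
    # Invented function under test: add 20% VAT to a price in pence.
    if net_pence < 0:
        raise ValueError("price cannot be negative")
    return round(net_pence * 1.2)

class TestAddVat(unittest.TestCase):
    def test_chosen_inputs(self):
        # Carefully chosen inputs compared against expected outputs.
        for given, expected in [(0, 0), (100, 120), (99, 119)]:
            self.assertEqual(add_vat(given), expected)

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            add_vat(-1)

if __name__ == "__main__":
    unittest.main()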

If you really want to know the instruction sequencing you can disassemble the binary, but in practice this is rarely necessary.

It may be impossible for him because he clearly doesn't have a clue about computers, but actual engineers have been doing all of these things every single day since computers were invented.

5
25

Disassembling the binary only gives you the instruction sequencing if it is a single-threaded application (and not even then if there are many loops and branches dependent upon external inputs). In a multi-threaded or distributed application there is generally no way to enumerate all possible interleaved sequencing of events, particularly when those events may include hardware operations and error scenarios (such as patterns of packet loss).
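The interleaving problem is easy to demonstrate; here is a minimal Python sketch (names invented) in which two threads share a counter:

import threading

counter = 0

def worker():
    global counter
    for _ in range(1_000_000):
        tmp = counter   # read...
        tmp += 1        # ...modify...
        counter = tmp   # ...write: another thread may have written in
                        # between, and its update is silently lost

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Frequently prints less than 2000000, and a different value each run:
# no static reading of either thread tells you which interleaving you get.
print(counter)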

Carefully choosing a range of inputs is hard, very hard. How do you determine in a complex system that you have selected enough variation, particularly taking into account all possible fault conditions and event sequencing? In a complex system, just because input A gave output B when you ran it in test on your perfect network at 12:32pm doesn't prove that it will give output B on a real laggy network at 12:00am. You can't test all possible scenarios, so you have to narrow down the set based on your understanding of the code and platform and where the risk is perceived to be; of course, with any risk-based approach, sometimes you get unlucky.

If code inspection were easy, then code that has been reviewed would never introduce bugs and would work perfectly in the system. Experience shows that even with thorough code review, test definition and multiple layers of validation, bugs still get through to the customer.

The simple fact is that most computer software is too complex for a single person to maintain a full mental model of how it works, with all the possible interactions and knock-on effects of changes. Modularization and clear specification help mitigate some of this, though that can actually sometimes be exactly where the security or resilience holes appear.

31
0

Not so simple

Also note such things as branch prediction in modern CPUs, for example.

How can you then predict the exact sequence of instructions from the software alone? The exact sequence can surely only be determined for a defined combination of hardware, software and input data.

That seems to make certainty exponentially more difficult to obtain.

9
0
Silver badge
Childcatcher

Software testing? We've heard of it.

...how do you determine in a complex system that you have selected enough variation, particularly taking into account all possible fault conditions and event sequencing[?]

Run through common scenarios and tests where you know the results you should get, and then use fuzzing to find out if your error handling works? This is a good area in which to employ automation in testing. I agree with the rest of what you had to say, The Mole; the current state of affairs is that there is not even an attempt at any of this in most software houses, and especially not among IoT devs.
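A minimal sketch of that fuzzing idea in Python (parse_config is an invented function standing in for whatever is under test): throw random bytes at it and check only that the error handling stays controlled.

import random

def parse_config(data: bytes) -> dict:
    # Invented parser under test: expects "key=value" lines.
    result = {}
    for line in data.decode("utf-8").splitlines():
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result

random.seed(1234)  # a fixed seed makes any failure reproducible
for _ in range(10_000):
    blob = bytes(random.randrange(256) for _ in range(random.randrange(64)))
    try:
        parse_config(blob)
    except (UnicodeDecodeError, ValueError):
        pass    # controlled rejection is acceptable behaviour
    except Exception as exc:
        # Anything else is a bug the carefully chosen inputs never found.
        raise AssertionError(f"fuzz input {blob!r} crashed the parser") from exc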

4
0
Silver badge

"It's also very easy to test software, you run it for a range of carefully chosen inputs and compare the result to the expected outputs."

And now you know how the software responds to a small range of carefully chosen inputs. Every single software bug or security vulnerability is the result of someone assuming that because the software responds correctly in a limited test, it must therefore be safe and secure (or alternatively, knowing that it isn't perfectly secure but releasing it anyway).

The big problem is that no matter how much you expand your list of chosen inputs, there is an infinite number of possible ones, so you can never test them all. All you can do is try to think up the sorts of things that are likely to cause problems; every bug that makes it into the wild is the result of someone thinking up an input that the developer didn't consider (or again, did consider but didn't bother fixing). Hence: software is impossible to test; no matter how thorough the testing you do, you're trying to prove a negative by searching every part of an infinite space.

11
0
Silver badge

Not that easy

You also need to check what happens with inputs outside of your carefully chosen list.

Reality is not neat: corrupted data, interference, malicious actors, etc., etc.

For non-trivial code, sequencing is not easy - e.g. multi-threading.

Even if code is simple and crystal clear when looked at in isolation, things can get a lot more complex when multiple instances of this code are running and accessing a shared resource - race conditions, contention, timeouts, deadlocks, etc.

Once you go beyond trivial code, running Petri net analysis soon shows how complex even "simple code" can be.

Then there's all the "code" you do not really control - if you are using anything higher-level than assembler then a lot of the produced code is outside of your control (e.g. I may define baud rate, com port etc. in my serial port read/write code, but when I do a serial port write the nitty-gritty is done by code written by someone else).

There's a good reason I'm currently running hundreds of instances of 3 different applications (all being fed with auto-generated random test data) that all read from / write to a common database ... and, when that's all complete, comparing database contents / error logs with expected results based on the auto-generated inputs.

Meanwhile I have several instances of a test-aid application that frequently write-locks different parts of the database.

I'm doing that because I want to check that error handling, two-phase commit etc. all work when there's contention and database access issues.

Everything was hunky-dory on nice simple unit tests, but I need to ensure it works when there's nasty real-world complexity.
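A toy version of that kind of harness, sketched in Python with SQLite and threads standing in for hundreds of separate processes (table and names invented): generate random inputs, let the writers contend, then compare the database contents with what the generated inputs say should be there.

import random
import sqlite3
import threading

DB = "contention_test.db"

con = sqlite3.connect(DB)
con.execute("DROP TABLE IF EXISTS readings")
con.execute("CREATE TABLE readings (worker INTEGER, value INTEGER)")
con.commit()
con.close()

def worker(worker_id, values):
    # Each writer gets its own connection; commits contend for the write lock.
    con = sqlite3.connect(DB, timeout=30)
    for v in values:
        con.execute("INSERT INTO readings VALUES (?, ?)", (worker_id, v))
        con.commit()
    con.close()

random.seed(42)  # the auto-generated inputs double as the expected results
expected = {w: [random.randrange(1000) for _ in range(100)] for w in range(8)}

threads = [threading.Thread(target=worker, args=(w, vals))
           for w, vals in expected.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Compare database contents with results derived from the generated inputs.
con = sqlite3.connect(DB)
for w, vals in expected.items():
    got = [r[0] for r in con.execute(
        "SELECT value FROM readings WHERE worker = ? ORDER BY rowid", (w,))]
    assert got == vals, f"worker {w}: lost or reordered writes"
con.close()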

8
0
pdh

> It's actually very easy to inspect software. You just open the source code in your favourite editor.

Read "Reflections on Trusting Trust," by Ken Thompson.

7
0
Silver badge

But also consider this rebuttal: "Countering Trusting Trust through Diverse Double-Compiling" by David Wheeler.

4
2
Silver badge
Headmaster

Software is "impossible to inspect and test"?

What utter, utter bollocks.

If you can write it then you can test it.

1
10
Silver badge

> It's also very easy to test software: you run it for a range of carefully chosen inputs

I fear that's the sort of hopelessly naive thinking that helps this industry deliver such unreliable, bug-ridden, security-vulnerable crap.

6
0
Silver badge
Devil

Re: Software is "impossible to inspect and test"?

"If you can write it then you can test it."

Test it, yes. Test it completely, no, because you can only think of so many ways. There's no way to account for every possibility because you won't be able to even envision all the possibilities. And as they say, they only have to be lucky once...

1
0

Re: Software is "impossible to inspect and test"?

.. and because it's your baby, your testing probably won't include the stuff you're not fairly certain it can cope with.

1
0
Silver badge
Headmaster

Re: "no way to account for every possibility"

Yes there is, with clearly delineated types, values and branches.

As the software engineer, it's your job to control the data and create the pathway it takes. If you can't control that process then you haven't created it properly in the first place. You don't need to blacklist an infinite pool of invalid possibilities, you only need to whitelist the valid ones.
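A minimal sketch of that whitelist idea in Python (the field names and rules are invented): define the valid shapes and reject everything else, so the infinite pool of invalid possibilities never needs enumerating.

import re

# The whitelist: the only input shapes we accept.
USERNAME = re.compile(r"[a-z][a-z0-9_]{2,15}")
VALID_ROLES = {"viewer", "editor", "admin"}

def validate_request(form: dict) -> dict:
    username = str(form.get("username", ""))
    role = str(form.get("role", ""))
    if not USERNAME.fullmatch(username):
        raise ValueError("invalid username")
    if role not in VALID_ROLES:
        raise ValueError("invalid role")
    # Only data matching the whitelisted pathway gets any further.
    return {"username": username, "role": role}

print(validate_request({"username": "alice_1", "role": "editor"}))  # accepted
# validate_request({"username": "x; DROP TABLE users", "role": "editor"})
# -> ValueError: no need to have anticipated this input specifically.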

The problem is not that software is "impossible to test"; it's that software is increasingly being written by people who are not applying basic engineering principles to the process, using tools that are designed primarily for monkeys to make money, not for engineers to solve problems.

The statement should be amended to "badly written software is difficult to test", but as it stands it's pure bullshit, just an excuse for sloppy programming.

2
0
Anonymous Coward

Re: "no way to account for every possibility"

"software is increasingly being written by people who are not applying basic engineering principles to the process, using tools that designed primarily for monkeys to make money, not engineers to solve problems."

Hallelujah. Someone else has seen the light. Not that it'll be welcome in many places.

It actually goes a little further than that, in that whole *systems* are being designed by "people who ...".

Often it doesn't really matter much. But spreadsheets are not the same as web apps are not the same as legacy transactional apps (classical banking) which are not the same as realtime transactional apps (ticket booking) which are not the same as Industry 4.0 (arrggghhh - oh for the 20th century days when it was just called Computer Integrated Manufacturing and the like).

Sometimes the differences matter a lot. Some people in charge don't seem to understand that different situations may need different approaches.

2
0
Silver badge

Waffle

Not at all impressed with this "lecture". A load of intangible waffle.

"We need standards...<waffle garb piffle>..."

He's clearly never heard of IEC 61508, the international standard for building, documenting, testing, and proving certifiably safe hardware and software systems.

That's very worrying considering he works for Statoil. Fortunately the Statoil engineers that I consult with daily in Aberdeen *have* heard of it.

Not at all impressed.

5
3
Facepalm

As a colleague once said: "you write in C and debug in assembly".

That held true once, when a simple C case structure failed for no apparent reason - but only on hardware. Inspecting the assembler code showed that the compiler had made an error, but the simulator had the exact same flaw, which cancelled the error out in the simulation.

Just wait until the AIoT ... the artificial intell ... oh, you know what I mean.

7
0
Anonymous Coward

@sealand: common mode errors

"the [toolchain] had made an error, but the simulator had the exact same flaw that cancelled the error out in the simulation."

For the same kind of reason it can help to have very independent people writing the code and writing the test data, too, at least for part of the exercise. It doesn't stop them both making complementary errors - been there, seen that, got the scars.

3
0
Silver badge

Re: @sealand: common mode errors

"it can help to have very independent people writing the code"

Management really hate very independent people.

6
0
Anonymous Coward

"[...] but the simulator had the exact same flaw that cancelled the error out in the simulation. [...]"

A case of two wrongs do make a right.

Several times I have seen people fix an obvious error in the code - that they just happened to notice in passing. Only when things then started failing did they discover the other bit of code that was complementary. Possibly the origin of the wise advice - "if it ain't broke - don't fix it".

2
0
Boffin

Standards

"We need standards,"

https://xkcd.com/927/

Technically beaten to this, but I am still posting the XKCD link.

7
0
Anonymous Coward

It is a truism that "If anything can go wrong - it will go wrong - at the worst possible moment".

No matter how well designed and tested, any system has an assumed set of constraints on the factors that determine its behaviour. If any of those factors fall outside the expectations then all bets are off.

I spent a career investigating "impossible" problems. Invariably they were due to a factor, possibly outside their control, that people had ignored, discounted, or were totally ignorant about. When the specialists were given a detailed diagnosis of the failure they would then come back and say "in that unexpected case it would fail".

A systems programmer was once very insistent that a mainframe data corruption fault could not be caused by a dropped bit "because the manual says that the memory has a parity check". Only if you understood the logic design of the CPU did you know that the internal data bus itself did not have parity checks.

7
0
Anonymous Coward

"It is a truism that "If anything can go wrong - it will go wrong - at the worst possible moment"."

Would this be true even of formally proven code? Just saying...

1
1
Silver badge

Formal proofs have their limits too. I have used formal methods to prove algorithms correct, but the correctness proof very often (if not always) has a set of preconditions. If the actual input violates the preconditions, all bets are off. Besides, even if my algorithm is correct, I must then show that my implementation is correct, and that my compiler is correct, and that the CPU is correct (remember the old Pentium bug?). I found (ages ago) that in MS Pascal the statement

current := current^.next^.next;

and the code snippet

current := current^.next;

current := current^.next;

had a very different outcome, even when used (correctly) in a linked list with an even number of nodes. The first version caused the program to crash; the latter worked flawlessly. Both are formally correct, but the compiler apparently didn't handle the double indirection correctly.

This is not to slag off formal proofs, just to say they are not the full answer.

9
0
Anonymous Coward

"Would this be true even of formally proven code?"

Yes - because it would probably be running on hardware whose design had not been formally proven.

Even if the hardware was also formally proven - there would be electrical and environmental constraints outside which the hardware could behave unpredictably.

I found two common hardware things that many people were ignorant about.

One was the effect of random particle emissions on memory chips - particularly DRAM. These particles could be cosmic (external) in origin - or inherent in the chips' packaging material. It was counter-intuitive that the higher-spec ceramic packages were potentially a richer source of such particles.

The second thing was logic gate metastability. This is where an asynchronous level change violates the set-up timing of a logic input. This can cause the gate to take much longer to switch levels - by a very indeterminate amount.

5
0
Silver badge

Fair enough. I had been thinking about seL4 at the time and knew its formal proof rested on the precondition that there was no direct memory access (a common and useful efficiency booster). I had to wonder whether it was possible for a formal proof to cover all cases, but the above shows there's always a way in for Murphy.

1
0
Gold badge

"The second thing was logic gate metastability."

Noted by one Ivor Catt in 1968.

2
0
Silver badge

"Would this be true even of formally proven code?" In functional languages functions are typically split into groups: pure and impure. The pure functions, in principle, can be formally proven to work correctly every time. The impure functions such as I/O or database queries can not be given such a guarantee because one can not guarantee inputs will be what you expect or accounted for. So even using a functional techniques at best only partially mitigates the problem. In imperative languages where the program state is much more mutable the problem is even worse (did one unintentionally change a more global variable for example).

The point being made is that testing will not cover all the potential cases - they are often infinite - but only those that experience and judgement deem the most important.

2
0
Anonymous Coward

Re: "The second thing was logic gate metastability."

[Another ex-reader of Wireless World, by any chance?]

I think Catt (and maybe Walton and Davidson too) saw more than just gate metastability. They saw that digital design engineers need to have a grounding in RF electronics.

http://www.ivorcatt.org/digital-hardware-design.htm (The book in question?)

That's even more relevant today (in the days of magic miracle HDMI cables) than it was back in the days of 2708 etc. EPROMs. Back in the 2708 era, one exercise I saw cost the company a fortune because the electronics people designing an EPROM programmer for boards with soldered-in chips didn't realise that if they drove the "high speed" PROM programming voltage down a wire-wrap backplane, it had to be treated as a transmission line and terminated accordingly.

If you just wired chips together, as a young digital engineer would back then, unfortunate reflections would occasionally occur on the promming voltage, driving it way over the design limits and damaging not only the 2708 being programmed but much of the system around it. Which got expensive, especially in mil-spec kit.

4
0
Silver badge

"... the above shows there's always a way in for Murphy."

Especially if he has help from Gödel.

1
0
Silver badge

Yup. Add Heisenberg and Schrödinger to the team and we'll start to get an inkling of what's really going on.

0
0

" it's basically impossible to inspect and test,"

I agree. Parts can be tested under test conditions, but that does not mean all instances can be tested completely under all real-life conditions.

BUT, even if software could be tested completely, there is the fuzziness of time. After all the testing has been done and passed, some annoying engineer (me) comes along and makes a hardware upgrade so trivial that it does not require comprehensive system retesting before it is allowed to crash the whole system. That would be annoying with ATMs but more concerning with airliners.

What makes humans necessary when it comes to safety-critical systems? We are more likely to make small mistakes, but we tend to have a "don't be so bloody stupid" (DBSBS) checking loop that is constantly going through our minds. Perhaps what we need is not to fruitlessly pursue absolute software testing but rather to implement completely separate DBSBS processes that are not integrated with the checked system. Maybe two levels of DBSBS, like a co-pilot cross-checking the main pilot's setting of the autopilot.

It's probably already being done, but it is far too expensive and complex for IoT. You don't expect a £1 watch from Poundland to keep good time, so why expect a cheap IoT webcam not to launch a DDoS attack on the Pentagon?

2
0
Anonymous Coward

"Perhaps what we need is not to fruitlessly pursue absolute software testing but rather to implement completely separate DBSBS processes that are not integrated with the checked system. Maybe two levels of DBSBS like a co-pilot cross checking the main pilot's setting of the autopilot."

Not even that's going to help, because then you'll run into things like common-mode failure (or, to use your analogy, the pilot and copilot make the same mistake - or worse, are in cahoots) that hit all the redundancies at once.

2
0
Anonymous Coward

"You don't expect a £1 watch from Poundland to keep good time, [...]"

If the tolerances just happen to be in harmony then it could keep very good time. I have a cheap plastic Accutime bedside clock that runs on one AA battery for several years. It never needs adjustment between battery changes.

In the 1970s/80s you could buy a high-end camera maker's lens that guaranteed a good spec - or buy a much cheaper lens from people like Vivitar. If you were lucky, the latter could be as good as the high-end one.

In appearance they were identical - and in fact they were from the same manufacturing line. The high-end camera makers merely selected the ones with the best test performance. In the same way you can get a very good car engine by carefully matching the components for ordinary car engines.

4
0
Anonymous Coward

@various

So many posts, and some interesting insights emerging. I know not of the IEC standards (which is a failure on my part), but then neither did most of my colleagues in the safety-critical world of DO-254 and DO-178 (aviation), so I'll stick to aerospace.

"The big problem is that no matter how much you expand your list of chosen inputs, there are an infinite number possible so you can never test them all."

Not infinite, but quite possibly inconveniently large (especially from the point of view of a bean counter). What can we do to address that?

Well, choosing your test input values carefully may help. What does that mean?

Random (Monte Carlo) style choice of inputs is an option already mentioned, but the space of test inputs gets big quite rapidly, especially if time dependencies come into the picture. And you still have no idea whether the important cases have been covered.

Time dependencies (including things like data stored from a previous iteration) are quite inconvenient from a testing point of view.

One can look at the code's decision points and choose values appropriate for testing the various possible outcomes of the various decisions. This can be done manually or with tools. In suitable combinations, preferably. It's usually a big number of combinations and a lot of testing but it's a lot less than infinite.
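A toy Python illustration of choosing inputs from decision points (the function and thresholds are invented): each input is picked to force one outcome of one decision, which gives branch coverage with far fewer cases than random sampling would need.

def classify_reading(value):
    # Two decision points -> four branch outcomes to exercise.
    if value < 0:
        return "fault"
    if value > 100:
        return "overrange"
    return "normal"

# One input per branch outcome, chosen by reading the decisions,
# plus the boundary values where off-by-one errors tend to live.
cases = {
    -0.1: "fault",       # first decision taken
    0.0: "normal",       # first decision not taken (boundary)
    100.0: "normal",     # second decision not taken (boundary)
    100.5: "overrange",  # second decision taken
}
for given, expected in cases.items():
    assert classify_reading(given) == expected, (given, expected)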

When looking at decision points, does one look at the unmodified, uninstrumented source code, or at the binary end result that's actually been generated by the toolchain? Knowing that they match is another question entirely - sadly, some test tools deliberately make their own (unnecessary and irrelevant) modifications to the program being tested, thereby making it impossible to know that what's been tested matches what's been shipped, in either source or binary form.

That said, looking at source is relatively simple and relatively compatible with bean counters (and also frequently suits tool vendors), but when push comes to shove, processors don't execute source [1], they execute binaries.

It is in principle possible to do static analysis of some classes of binary, spotting the decision points (ignoring most of the rest of the code) and generating test inputs accordingly. You don't need a simulator of the whole processor for this, and it may be substantially quicker than some other options.

Something similar is possible by running the code in a more comprehensive simulator/emulator and tracing the decision points, but the requirement for a more comprehensive simulator is sometimes inconvenient.

These two approaches both rely on the fidelity of the analysis/emulation/trace tools. You can eliminate the dependence on tools by running in real systems, but in the embedded market, the real systems tend to be sufficiently slow that testing becomes a proper chore, even if it's offshored to a place where time is very little money.

A wise mix of the various options would often make sense from an engineering point of view, but the additional time and cost has often been unacceptable to Management I have known.

Did anybody mention that time-dependent effects at run-time are a challenge?

None of which is a replacement for having clued up design and verification teams in the first instance, but such teams tend to be inconveniently expensive, and can risk delaying the project (and expanding the budget) beyond what Management have promised.

[1] A username round here mentions Forth. Maybe someone should look at something Forth-like as a language for high-criticality software. Simple, compact, maybe even safe (albeit quite possibly impenetrable to the average contractor and bean counter). Arguably it doesn't even need a trustworthy compiler; it certainly doesn't need a complex, untestable, unprovable compiler.

"software is impossible to test; no matter how thorough the testing you do, you're trying to prove a negative by searching every part of an infinite space."

How about "testing cannot demonstrate the absence of errors, but it can show when they are present.". NB not just software testing, but software does make life particularly tricky, especially as the failure modes of a software-based system are quite hard to predict.

7
1
Silver badge

Re: @various

"Arguably doesn't even need a trustworthy compiler, certainly doesn't need a complex untestable unprovable compiler."

Unless the H/W executes Forth directly (in which case you simply push the problem down a level), it needs an interpreter. What's the interpreter written in, and how's it compiled if it's in a higher-level language?

However, if the code inspection reveals a hard-coded root password you can stop right there and throw the whole lot out.

2
0
Anonymous Coward

Re: @various

"[...] but when push comes to shove, processors don't execute source [1], they execute binaries."

I have a C program which has evolved over nearly 30 years through various 8/16/32-bit compilers. The current version is stuck on MS Visual Studio 6. It compiles with later MS C compilers - but even with all optimisations switched off it just won't run properly.

The last gotcha from a compiler update was when it assumed that identical code inside differently named functions meant they could all be optimised down to one instance - with a single undifferentiated entry point. They had different names precisely because their calls were differentiated elsewhere.

4
0
Gold badge
Go

"Unless the H/W executes Forth directly ("

Actually Rockwell Collins did (and for all I know still does) run its avionics software on a proprietary stack machine.

I've seen sample jet engine control software written in what looks like a stack-based language. Might have been Forth, but the Forth philosophy is that you extend Forth into a task-specific language and program in that.

3
0
Anonymous Coward

@Doctor Syntax

Careful with your wording. Pretty sure "throw the whole lot out" gets translated into "ship it" at some business layer.

3
0
Anonymous Coward

Re: "jet engine control software ... stack based language."

"jet engine control software written in what looks like a stack based language."

Maybe you're thinking of the LUCOL language, which wasn't stack based but could easily look like it was?

There is finally an online version of a ten-page early 1980s paper written by the original designers of the language (at Lucas Aerospace) and published by the American Society of Mechanical Engineers:

http://journals.asmedigitalcollection.asme.org/data/Conferences/ASMEP/83943/V005T14A006-82-GT-251.pdf

Other than that, there's remarkably little written about it, even though it was in so many Rolls-Royce engines that RR eventually bought the relevant bits of Lucas Aerospace.

Still flying, still being updated, on some older RR engines and maybe elsewhere. New-from-scratch stuff tends to be hand-crafted Ada, or even autogenerated from model-based systems engineering models.

Those who are familiar with some of the better PLC programming languages (beyond ladder logic) may recognise some of the concepts even if the terminology and process is different.

Paper abstract:

"An Approach to Software for High Integrity Applications

W. C. Dolman and J. P. Parkes

This paper outlines one approach taken in designing a software system for the production of high quality software for use in gas turbine control applications.

Central to the approach is a special control language with its inherent features of visibility, reliability and testability, leading to a software system which can be applied to applications in which the integrity of the units is of prime importance.

The structure of the language is described together with the method of application in the field of aircraft gas turbine control. The provision of documentation automatically is an integral part of the system together with the testing procedures and test documentation. A description of how these features are combined into the total software system is also given."

1
0

Re: @various

If you're not running on Forth hardware (and that tends to be simple), then the interpreter is a bit of assembly language for whatever you are running it on.

The compiler can be as simple as:

Do I know this word?

If yes, either execute it (if it's 'immediate') or make a link to it for execution by the interpreter. Done.

Is it a number?

If yes, compile code to push it onto the stack. Done.

Otherwise, stop and complain. Done.
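That recipe is small enough to sketch. A toy interpret-only version in Python (no defining words, so very much a sketch of the idea rather than a real Forth):

stack = []

# Each known word maps to the code that executes it.
words = {
    "+":    lambda: stack.append(stack.pop() + stack.pop()),
    "dup":  lambda: stack.append(stack[-1]),
    "drop": lambda: stack.pop(),
    ".":    lambda: print(stack.pop()),
}

def interpret(source):
    for token in source.split():
        if token in words:                 # Do I know this word?
            words[token]()                 # Yes: execute it. Done.
        else:
            try:
                stack.append(int(token))   # Is it a number? Push it. Done.
            except ValueError:
                raise SystemExit(f"{token} ?")  # Stop and complain. Done.

interpret("2 3 + dup . .")   # prints 5 twice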

2
0

His "current day" argument is stuck in the 80s.

Lost me at "you can't test it."

2
3
Anonymous Coward

When the company started building 3rd-generation mainframes, they had a department dedicated to writing the hardware test programs. The programmers looked at the hardware designs and wrote the tests according to the perceived limit conditions.

However, their test suites were behind schedule when the first prototypes were ready to be commissioned. So the commissioning team quickly knocked out their own tests - mostly using random data rather than determining the limiting conditions. They also wrote new, specific tests when a fault was diagnosed by sheer hard graft.

Eventually it was found that these skunk works tests were far better at finding problems in machines that had not yet been fully commissioned.

2
0
Gold badge
WTF?

You'd never think the Shuttle flew for 30 years without a flight bug.

Which, given that there was no manual control system (if the computers and/or the APU fail, you bail out or you die) and the design was too unstable for a human pilot to input control movements fast and accurately enough, was just as well.

How you do it:

1) Design the code. Break it up into segments. Design the detailed equations it implements.

2) An SCC system tracked the history of everything - code, scripts, test data sets - on a line-by-line basis. Cutting edge in 1974, SOP today.

3) Structured walk-throughs and documented fixing. Must be done in an impersonal way: it's a bug hunt, not a witch hunt. :-(

4) When a bug gets through, understand why, search for that code pattern, then add it to your list of standard patterns to avoid.

BTW the Shuttle was written in HAL/S, a high-level language. A lot of the bugs in the early days turned out to be from when people skipped this process and directly patched the code in assembler.

Test data generation for code coverage is not in fact a black art. The books by Glenford Myers (who worked for IBM in the 70s and 80s) explain the process quite well.

If you want to do this today in the UK, call Altran Praxis, who will do this in SPARK, a safety-critical version of Ada, though it's true multi-tasking remains problematical. They will do the same but also using theorem provers and the Z language.

What I'll note is that this is more expensive than regular code - but not a lot more expensive.

And fixing IoT failures promises to be much more expensive due to the large deployments.

Incidentally the F35 ALIS logistics system is not written in Ada. It's written in C/C++. LM said getting Ada programmers was too expensive.

I wonder how many of the ALIS issues writing in Ada would have prevented? Obviously not a problem for LM. It's the US taxpayers who pick up the bill, and they will continue to do so, as the F35 is now, to coin a phrase, "too big to cancel."

I think your lecturer is a bit behind the times. But the question is: will companies pay to use these techniques?

3
0
Anonymous Coward

Re: You'd never think the Shuttle flew for 30 years without a flight bug.

"1)Design the code.Break it up into segements. Design the detailed equations it implements"

But then how do you deal with gestalt problems like race conditions that never appear in the individual components but only in the whole (thus gestalt: worse than the sum of the components) and even then only under certain edge case conditions?

1
0
Gold badge

"But then how do you deal with gestalt problems like race conditions "

Well I'd start by calling them race conditions.

The Shuttle computers used a set of sync codes, which each computer received from the others at minimum intervals. Failure to do so suggested something had gone wrong. Watchdog timers help internally.

You'll also need to check, for each unit, what resources it needs a lock on, and find when 2 or more modules want the same resource and under what circumstances 1 won't release it. The deadly embrace has been known since the 1960s, as have ways to identify and prevent it.
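The classic prevention, sketched in Python (lock names invented): impose a single global acquisition order on the resources, so no module can ever hold one while waiting on another module that holds the other.

import threading

lock_a = threading.Lock()   # protects resource A
lock_b = threading.Lock()   # protects resource B

def module_one():
    # Every code path acquires in the same global order: A, then B.
    with lock_a:
        with lock_b:
            pass  # ... use both resources ...

def module_two():
    # Even though this module "wants" B first, it still takes A first,
    # so it can never hold B while module_one waits on A: no deadly embrace.
    with lock_a:
        with lock_b:
            pass  # ... use both resources ...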

2
0
Anonymous Coward

Re: race conditions

"how do you deal with problems like race conditions that never appear in the individual components but only in the whole"

Well, in hardware you'd perhaps start by looking for race conditions **in the design**.

For the sake of simplicity, let's assume binary logic inputs and outputs. A race condition may exist on the path between a particular input I1 and a particular output O1 (which depends directly or indirectly on I1) if there are two or more differing paths between I1 and O1, such that any given change of state in I1 causes the two (or more) paths to produce *different* (conflicting) changes in the ultimate output, O1. If no such conflict is possible, as shown by analysis or testing, there are no race conditions.

In the absence of such conflicting paths, there is no race condition, surely?

In the absence of appropriate analysis and testing of the software (as designed and as implemented), there may or may not be a race condition, surely?

1
0
Anonymous Coward

Re: race conditions

"In the absence of appropriate analysis and testing of the software (as designed and as implemented), there may or may not be a race condition, surely?"

Nope, because no one really sees the whole thing. Not even the design. That's why I call them gestalts. Each individual component seems all hunky-dory, but no one sees the entire thing nor how each part interacts as part of the whole. That's why you end up with things like perfectly-tested code snippets behaving badly as a whole, because no one can really see the whole. Plus what if conditions alter just so, like a processor that does certain operations faster in some, slower in others, creating a "falling through the cracks" problem?

1
0
Anonymous Coward

Re: race conditions

"what if conditions alter just so, like a processor that does certain operations faster in some, slower in others, creating a "falling through the cracks" problem?"

That might be a very fair (and interesting) question.

If you consider the time dependence to be a risk, either you design it out, or you mitigate it some other way. Keep It Simple, Surely? If the system designers can't see a time dependence, in a real time system, then we're all f*cked either way.

Case in point: a DIY ground-up (VHDL) implementation of a bi-directional interface between a device and memory system, where the device can run faster than memory can (think of it as small fast cache<->big slow main memory if you will).

That approach clearly has the potential to induce timing related issues (amongst others). This was in a system where in principle the timing (and the analysability of timing) were critical to product safety.

Engineer: "Where's the analysis that says there's no safety issue with your proposed design, e.g. with predictability of timing?"

PHB: "Why would there be? Everyone else does it this way, with cache." (paraphrased)

Engineer: "In the COTS market they do, yes. Does everyone else in the real time safety critical business do it this way?"

PHB: [pained grimace; new topic]

Same design also had issues with its memory error correction system, which as originally proposed would, under certain relatively infrequent but almost inevitable circumstances, have led to stale data being undetectably written to main memory after a correctable memory error had been detected.

I don't know if either issue was properly resolved.

1
0
Gold badge
FAIL

"That's why you end up with things like perfectly-tested code snippets"

In this context if that's all you're testing the fail is baked in.

For starters, inside the computer, if you're looking at hard real time you get into what else is running on the PC, what's got priority, what can generate interrupts, etc. Note that straightforward, well-factored code is likely easier for a compiler to optimize - and remember that code can be made as fast as you like, provided it doesn't have to give the correct answer.

Actual control systems engineers either do detailed simulations which include detailed plant dynamics (i.e. actual valve opening/closing times, compression effects when pumping aerated fluids, etc.) or they build hardware-in-the-loop versions of the actual hardware - referred to in the aircraft industry as an "iron bird."

You can see a minimal example of this in the film "Zero Days" where some of the people who found Stuxnet built a little hardware model to explain what it does.

What you've described is like learning to drive by playing Audiosurf. :-(

1
0
Silver badge
Unhappy

Even when everything else is proven

There's no accounting for a 3-phase supply for an entire factory complex hidden behind a thin partition where you are instructed to mount the main control panel.

I know of this by direct personal experience. We only got a clue as to the source of the problem when one of our engineers complained his laptop kept locking up when he was trying to fault-find there.

5
0
