Rice's Theorem and Software Failures
Mental Models of Software
My corner of the industry spends a lot of time worrying about effective ways to understand software behaviour, especially in the context of distributed systems. One important sociological disjunction we see when we try to improve our understanding is between - I put it loosely - management and workers (in particular, incident responders), in what each group sees as the underlying basis for software failure.
Again, this is approximate, but let me characterise two views of the world.
In View A, software runs stably unless acted on by an external force, and that force is overwhelmingly human-originated. Therefore, when something goes wrong with the software, it’s overwhelmingly likely to be a human who did something wrong. Restoring service is a matter of reverting the human-originated change, and preventing error is closely tied to preventing humans from doing “bad things”. (There are many possible critiques of this view; I merely relate the core proposition.)
In View B, software is capricious and unstable by default, the environment is constantly changing, and a failure can be the result of the combined side-effects of a thousand different actions that are causally very distant from each other. It is therefore not useful to say a human is at fault. (Critiques of this view are also available: for example, the notion that for deterministic software, at the root of some chain there must be a human action.)
It is important to say that these views are not pure logical opposites, and could both plausibly be claimed to be wrong, but they are distinct models of the operation of software that drive a view on how to attribute sources of failure in software (and how to prevent failure, restore service, etc.). They are roughly analogous to Dekker's Old View versus New View on human error.
For completeness, we would stereotypically say that management holds View A and the workers hold View B.
Rice’s Theorem
Enter computability theory.
Rice’s Theorem is related to the Halting Problem, which we won’t discuss in detail here, except to say that it is hugely important and generally well covered in undergraduate Computer Science.
Rice’s Theorem, a generalization of the Halting Problem, states that all non-trivial semantic properties of programs are undecidable.
Non-trivial in this case has a narrow, technical meaning: a property is trivial (note the logical inversion) if it is true for all programs or for no programs. A property is therefore non-trivial if it is true for some programs and not for others - that is, for a non-empty, proper subset of all programs.
Semantic means that it is about a program’s meaning or behaviour, rather than its syntax. For example, the property halts for all even inputs is a semantic property, but program text contains
while (x != true) { continue; }
is a syntactic property.
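To make the distinction concrete, here is a minimal sketch in Python (the function names are illustrative, not from any particular codebase). The syntactic check is a plain, always-terminating inspection of the program text; the semantic check is exactly the kind of function Rice’s Theorem says cannot exist for arbitrary programs.

def program_text_contains_loop(source: str) -> bool:
    # Purely syntactic: just look at the characters of the program text.
    # This always terminates and always gives the right answer.
    return "while (x != true) { continue; }" in source

def halts_for_all_even_inputs(source: str) -> bool:
    # Purely semantic: about what the program does, not what it says.
    # Rice's Theorem rules out a correct, always-terminating implementation
    # for arbitrary programs; any real tool must restrict or approximate.
    raise NotImplementedError("undecidable in general")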
Undecidable has a number of definitions available, but for the purposes of this article we would say “there is no program that, given an arbitrary program, correctly evaluates whether the property is true or false of it”. We note here that undecidability doesn’t mean that the proposition is neither true nor false - just that it cannot be established one way or the other by the operation of some definite algorithm.
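As a side illustration of that distinction (a Python sketch, not from the original text): whether the loop below terminates for every positive integer n is the Collatz conjecture. The proposition is either true or false - it is just that nobody has managed to establish which, by proof, by algorithm, or otherwise.

def collatz_steps(n: int) -> int:
    # Whether this loop terminates for every positive integer n is the
    # Collatz conjecture: a perfectly ordinary true-or-false proposition
    # whose truth value is simply not known.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(27))  # 111 - terminates for every n anyone has tried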
There are uncountably many things which are undecidable. Some famous examples amongst mathematicians - in the related sense of independence from the standard axioms of set theory - would be the continuum hypothesis and the axiom of choice. But it turns out that almost everything useful that you’d want to know about software is also undecidable. Wikipedia outlines the following propositions (with quoted preamble); a sketch of the standard reduction argument behind one of them follows the list:
Given a program P which takes a natural number n and returns a natural number P(n), the following questions are undecidable:
- Does P terminate on a given n?
- Does P terminate on 0?
- Does P terminate on all n?
- Does P terminate and return 0 on every input?
- Does P terminate and return 0 on some input?
- Does P terminate and return the same value for all inputs?
- Is P equivalent to a given program Q?
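Here is the promised sketch, in Python, of the reduction argument for the fourth question (“does P terminate and return 0 on every input?”). Assume, for contradiction, that a decider for that property exists; the code shows how it would let us decide the Halting Problem, which we know is impossible. The names always_returns_zero and simulate are illustrative - simulate stands in for an interpreter that runs one program on one input.

from typing import Callable

def halting_decider_from(always_returns_zero: Callable[[str], bool]) -> Callable[[str, str], bool]:
    # Suppose always_returns_zero(source) could decide whether the program
    # in `source` terminates and returns 0 on every input.
    def halts(machine_source: str, machine_input: str) -> bool:
        # Build a program that ignores its own argument, simulates the given
        # machine on the given input, and then returns 0.  It terminates and
        # returns 0 on *every* input exactly when the machine halts on
        # machine_input - so the hypothetical decider answers the halting
        # question for us.
        wrapper = (
            "def wrapped(_ignored):\n"
            f"    simulate({machine_source!r}, {machine_input!r})\n"
            "    return 0\n"
        )
        return always_returns_zero(wrapper)
    return halts

# The Halting Problem is undecidable, so no such always_returns_zero can
# exist; the same construction, suitably varied, knocks out the rest of
# the list.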
Relevance to software failures
It seems clear that we can extend this fairly naturally to cover the software we work with today, and so the relevance to software failures (and hence incidents) is direct. To wit, we cannot in general know for sure whether or not software is going to be incident-free. It might well be, but we can’t ever prove it.
(The analogy I like, though I know of no theoretical basis for it, is that sufficiently complex systems appear to behave more like a “generative grammar” for incidents, producing novel problems from combinations of situations in the absolutely vast potential state space of software.)
Of course, undecidability results are satisfyingly broad from a rhetorical point of view - they state that literally no program/algorithm can do X - but unsatisfyingly shallow from a behavioural description point of view.
You can’t decide halting in general, sure, but you can make a lot of progress in reasoning about program behaviour short of that. (Fascinatingly, Rice’s theorem suggests that the real hope here is in syntactic models for this, rather than semantic ones.)
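A small illustration of that syntactic route (a Python sketch using the standard library’s ast module; the heuristic itself is invented for the example). Instead of answering the undecidable semantic question “does this program loop forever?”, the check answers a purely syntactic stand-in - “is there a while True: loop with no break in it?” - which is decidable, cheap, and deliberately imprecise.

import ast

def flags_obvious_infinite_loop(source: str) -> bool:
    # Purely syntactic heuristic: flag any `while True:` whose body contains
    # no `break` statement.  It can mislabel programs in both directions,
    # but it always terminates, which no exact semantic check can promise.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            loops_on_true = isinstance(node.test, ast.Constant) and node.test.value is True
            has_break = any(isinstance(inner, ast.Break) for inner in ast.walk(node))
            if loops_on_true and not has_break:
                return True
    return False

print(flags_obvious_infinite_loop("while True:\n    pass\n"))   # True
print(flags_obvious_infinite_loop("while True:\n    break\n"))  # False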
Relevance to mental models
But it’s precisely that combination of broad and shallow which is useful here.
The argument uses an inversion similar to the one in the proof of the Halting Problem itself.
Let us suppose that there were a way to prove that a program would work without errors. Then View A of software would be correct: by definition, you have proved that the software works for some input X, so if a software failure results, it must be for some reason outside of the combination of code plus input - which is, in View A terms, human action resulting in some change such that the previous proof is no longer valid.
But, in general, there isn’t a way to prove a program will work without errors. Therefore View A can’t be correct. Whether or not View B is the correct one instead is unclear. But if View A can’t be adopted, View B is one potential substitute.
Conclusion
We need to help holders of View A understand that it does not hold: the model of software as inherently stable is incorrect. To the extent it’s useful, it is therefore both possible and true to say to leaders “software cannot be shown to be stable, and so it’s safer to assume it isn’t”.