Donald MacKenzie’s Mechanizing Proof

This remarkable book covers the history of the notion of formal proof and more recently the notion of computers dealing in proofs. There has been controversy since Aristotle over the rationale for thinking about what it means to be a proof, but when people began to program computers to recognize and find proofs, many thought that we had gone too far.

Very interesting perspectives are described around page 208 in the chapter ‘Social Processes and Category Mistakes’.
Dijkstra says early:

That suggests that only the programmer is concerned about the correctness of his code—how bizarre! Today tens or hundreds of millions of people may be concerned about the correctness of a particular program. Few of them have the ability or time to contemplate the code or proof, let alone understand it, but a shallow hierarchy of proof checkers (human and mechanical) along with an informal reputation system, can provide assurance of correctness that is far more reliable than what we have today.

Computers can be said to understand things that we can’t. Computers know the sum of one million numbers in a snap and we generally take for granted whatever the computer says. There is yet controversy over whether computers can know propositions. They can certainly record them, using our words, but to know them is to deal with relationships between propositions, such as which may be derived from which. Should we believe a proposition because the computer says that it may be derived?


Indeed the conflict has reached high (but not good) politics which is recounted in the book.

Only occasionally does the book observe that checking a proof may be tedious but can be done by a simple program which is an order of magnitude smaller than either the sorts of kernels or proof finders aspired to in the heyday of formal proofs. (He addresses the issue fully on about page 290.) Faith in such results relies on belief that:

That the program is correct may perhaps be seen because it is small and need not be overly optimized for speed and implements a simple conventional definition of what it means to be a proof. Several such programs can be written and so probably avoid common mode failures.

I did not believe that simple definitions of proofs could capture the essence of proof until after a few courses in logic and foundational math at Berkeley. I do believe it now. Few have direct reason to believe that such simple rules capture logic faithfully. I am not sure that all convincing mathematical arguments can be put into this form but I suspect that the desired propositions about computer programs can be. We are left with a priesthood of artisans who can vouch for and attest to such validity.


MacKenzie pits Dijkstra against Perlis et al. I disagree with both. I, with Les Lamport, ‘am of the school that thinks that propositions either do or do not follow from certain axioms’ as certainly as some finite set of tiles do or do not tile a unit square, and that as a practical matter the computer can reliably help us determine some specific cases where they do. I suspect that there are useful propositions whose proofs can be mechanically verified, but so far these are too simple and too few.

There remains the significant problem of understanding what has been proven. The proof is not much good if the proposition cannot be understood or the proposition does not bear on our needs.


I largely agree with Fetzer’s perspective about proofs; but he used ‘certainty’ as a rhetorical cudgel against the formalists. It is true that the formalists promised certainty in software and by extension certainty in system behavior. They should not have done this and Fetzer caught them at it. Fetzer refers to a category error in conflating proofs which are things of the mind, and computers which are things of the world. I do not accept Fetzer’s division of the world into these categories. It is a conventional philosophical division, akin to dualism.

There remains the real practical problem: How many mistakes do computers make, and especially how can the enemy exploit them. This question has not yet been broached in the book, as of the end of chapter 6. In Keykos we attempted to introduce some random errors to see what fraction would impact system integrity. This was an uncertain process but it seemed that only a small fraction of random transient errors led to any external effects.


Page 282; MacKenzie quotes Robert L. Constable: I no longer think that integers exist. 3 is a useful conventional social illusion that helps us survive our environment. The symbols we write, such as “3” are more real, and formal. This logic game sort of resonates with something in our head that our DNA gave us. It seems to work and helps us build air-planes and such. Before we throw out the ‘absurd’ axiom of choice I want to see a theory of reals good enough to get to the highly useful Hilbert space, which now helps us build modern air-planes. We got there by theories that tell us that we can duplicate the solid sphere, but not exactly how. That same theory gave us Lebesgue measure which gave us Hilbert space. That edifice will not be easily replaced.

Page 290:

The checkers that I have outlined would be a few pages of code and far easier to produce than any sponsorship application paperwork that I have ever seen. Perhaps I will have to finish this project. There is one possible problem with my project: It may be necessary to write and run an untrusted program to transform a Nuprl or HOL proof into the strict proof form that I have in mind. Then a truly short program can check the proof. This transformation may well lead to a fatal exponential explosion in size. Adding another page of code to the checker allows proof forms, systematic substitution, that avoid one source of exponential growth. I don’t know whether this is good enough.

See text following “I would believe that” here.

At the end of page 292 MacKenzie describes Boyer’s project for his incomplete project for the verifier to verify itself. This would indeed be useful but then you are left with the significant problem of understanding the specifications of the verifier being proven.

I am amazed at MacKenzie’s report of Boyer’s attitude toward proof checkers. I seems a bit like a gold prospector returning to his grub stake provider and saying “Yes there is gold out there; I left it there; no I did’t keep exact notes as to where it was; maybe I can find it again.”. This is also a bit like the quote from Dijkstra suggesting that finding a proof was to convince one’s self that something was true, rather than convincing someone else that it was true. I think that we evolved a proof notion in order to convince others. Perhaps the proof finder is supposed to have so much credibility that others are convinced by mere claim of having found a proof. I would be less impressed than the original programer’s claim that the program was debugged.

I was not much impressed with the last chapter: Machines, Proofs and Cultures”. It examined the sociology around the problem of reliable software. I sense a duality in many of those that MacKenzie describes; bordering on dualism. If you are into dualism I would suggest that what goes on inside a computer may more properly of the world of the mind than of the physical world. It is not human; it is alien; but it is of the mind!

There is an important point that is made in several points in the book concerning the difficulty of understanding what has been proven. I fear that we will settle for proving what we can assert in some arcane language that only a few can understand. The web based research that I have done seeking the semantics of the languages used in proof technology has yielded nothing. The book is useful in providing grist for Google searches.

My biggest caution is the dfficulty of understanding what has been proven. Too often someone proves an insignificant fact about a program and announces that the program has been ‘proven’. How often do you hear the question “What was proven?”? How often do you hear an answer? How often do you understand the answer? If it was that the “Program matches specifications.” what language were the specs written in? How many people understand those specs?

Anticipating a phase of formal specifications for Keykos we contemplated some of these questions. We did not come up with a completely satisfactory answers but the effort led us to find a couple of high level bugs. Sometime threat of formality improves quality. It probably slows things down too.


I cannot help but compare this book with the marvelous ‘Logicomix’ in their respective recounting of the quest for certainty.
Survey of formal methods