|
Proving Properties of Instruction-Level
Programs
Preface Copyright 1995 Agorics, Inc. While Agorics,
Inc. reserves all copyrights, the information in this document is to be
considered public information and is available for use without restriction.
Agorics disclaims any warranty as to the utility, accuracy or effectiveness of
the information contained in this document and specifically disclaims any
liability for consequential damages that may result directly or indirectly from
use of the information in this document.
DRAFT Many programming languages reduce each module of
program code to a set of assembly language-level instructions (bytecodes or
native assembler instructions). When the time comes to load a given module, its
instructions must be checked to make sure they will not violate certain
properties of the higher-level language, such as memory boundaries. This paper
presents a technique for implementing such load-time checking, as exemplified
by the Bouncer; component of Joule. While Joule applies this bouncing
technique to bytecodes, the mechanism is as applicable to other
software-implemented instruction sets and to native machine instructions. The
technique is equally appropriate for strongly typed languages such as Ada,
Pascal, Modula 3, C, and C++, and for latently typed languages such as
Smalltalk, Scheme, and Joule. The "bouncing" technique described here
is not new, but is rather derived straightforwardly from previous work,
including:
- type state checking, by which a variable's value can be
statically determined as unreferenced and therefore deletable.
- abstract interpretation, in which properties of programs
(deadlock freedom, computational complexity, etc.) are computed by interpreting
instructions using only abstract properties of the actual data values.
- a machine-code verifier created by L. Peter Deutsch for the 940
operating system to verify security properties of instruction-level programs.
This document contains the following sections and
subsections:
1.1. About Modules
1.2. The Bouncer as Load-Time Checker
1.2.1. Bouncer Rules
1.2.2. Bouncer State
1.2.3. Control-Flow Checking
1.2.4. Representation Type Checking
1.3. Summary
1.4. Acknowledgments
1.1 About Modules In this paper, a "module" is the smallest
unit of compiled code that can be loaded into a runtime environment. In the
Joule language, this unit of code corresponds to the Module
form. In Joule, a module of code is reduced to bytecodes by the compiler.
1.2 The Bouncer as Load-Time Checker The Bouncer is a
program that statically checks each module before it is loaded, rejecting any
module that does not pass specific static checks. The name "Bouncer" is
inspired by the doorman of a nightclub who keeps out unsavory characters.
1.2.1 Bouncer Rules The Bouncer maintains internal state
which changes as it processes each instruction. First, it checks each
instruction to make sure the preconditions for that instruction have been
established by prior instructions in the module. Then it alters its state to
reflect the mutating impact of the instruction, and finally tests to make sure
its own invariants are intact. Thus, for each type of instruction, a
set of rules have been specified for comparing the state of the abstract
machine against known requirements. The rules for checking the preconditions of
an instruction are called its "test" rules. The rules for checking the
state-mutating impact of the instruction are called its "mutate" rules.
In this way, the Bouncer protects Joule's memory safety and security. A
module that fails to satisfy the rules is rejected, or "bounced." Acceptable
code is passed by the Bouncer to be interpreted or translated by other
components of the Joule abstract machine. The Bouncer does not attempt
to prove the correctness of every possible correct program, which is
theoretically impossible. Rather, it proves a verifiable subset of programs and
rejects all others. In this respect, it is similar to type checking in
languages such as Pascal. No loss of expressive power results from this
constriction of possibilities, though the set of alternative implementations of
a given solution may be reduced.
1.2.2 Bouncer State The Bouncer state machine has a set of
static registers. These exist at bounce time, not at run time; they represent
the state of what the Bouncer knows about the actions of the program code
before and after each instruction. Some of the static Bouncer registers
are:
- pc--The current value of the Joule abstract machine's
program counter.
numLocals--The number of local variables allocated
in the current activation frame.
writable[numLocals]--An array of booleans
indicating, for each local variable, whether it is legal to assign it.
types[numLocals]--An array of static types,
providing knowledge about the runtime representation type ("uninitialized",
"tagged pointer", "facet pointer", etc.) of each local variable of the current
Activation. The actual local variables are in the array
locals[numLocals], which exist only at run time, not at bounce time.
In these terms, the instruction OP_MOVE (3, 5)
means to move the value in locals[5] into the position locals[3]. It is subject
to this set of rules: Test writable[3] Mutate
types[3] := types[5]; This tests that local variable number three can
be assigned to, and specifies how to modify the typestate used to test the next
instruction. All of the array indexing is implicitly bounds checked, so if
numLocals is less than 6, this load attempt is also bounced.
1.2.3 Control-Flow Checking First, the Bouncer derives a
model of the state of the abstract machine before and after each instruction in
the module being checked that is consistent with all forks and joins called for
by the instructions.
1.2.3.1 Forking A program fork results from the evaluation
of some Boolean expression, which may be true, false, or a pointer to some
other value. The instruction that implements forking,
OP_TEST_IF (cond, onFalse, onFail) falls through to the next
instruction if locals[cond] is primitively true, branches to
onFalse if locals[cond] is primitively false, and branches to
onFail otherwise. In the Bouncer, this means that we propagate the
current Bouncer state to all three locations.
1.2.3.2 Joining Following a fork, it may be necessary to
interpret the re-joining of the two execution paths into a single consistent
state.
Figure 1: Joining -- L and R represent the two
(possibly inconsistent) prior states of the Bouncer; J represents the
resulting state. Treating this join as if it were an instruction, the
Bouncer applies these rules (where the subscripts L and R
designate the static values in each of the two prior states):
Test numLocalsL== numLocalsR
Mutate for (i = 0; i < numLocals; i++) {
if (writableL[i] == writableR[i]) {
writableJ[i] := writableL[i];
} else {
writableJ[i] := FALSE;
}
if (typesL[i] == typesR[i]) {
typesJ[i] := typesL[i];
} else {
typesJ[i] := VOID;
}
} That is, if the two prior states agree on the type of a slot,
the slot is considered to be of that type in the state representation J;
if not, the slot is considered uninitialized, and cannot be read by code
following the join until it is assigned.
1.2.3.3 Looping If any loops occur in the module, the
instructions within the loop again need to be checked against the Bouncer state
as it exists at the time of the backward jump. If multiple possible states loop
back to the same point, they are treated as joins. The proliferation
of states resulting from multiple forks and joins is controlled because
inconsistent slots collapse to "uninitialized" when joined. This means that, on
any iteration through the module, there will be strictly fewer joins required
than on the previous iteration, because any inconsistencies previously
encountered have collapsed the affected slots to "uninitialized". The
Bouncer inevitably either rejects the program for being unsafe or converges to
a single consistent model describing the state of the Joule abstract machine
prior to each instruction in the module. This representation must converge to a
fixed-point state of the Bouncer because the collapse of inconsistent slots to
"uninitialized" is monotonic, and the number of slots is finite.
1.2.4 Representation Type Checking There are two Bouncer
phases. After the Bouncer has derived a single consistent state history of the
Joule abstract machine based on the control-flow checking, the Bouncer then
passes through the instructions of the module and applies the test and mutate
rules to each, based on the state arrived at in the control-flow check for each
instruction. If any instruction generates an unacceptable state of the Bouncer
by failing to conform to the rules, the module is rejected and an exception is
signaled.
1.3 Summary The smallest unit of program code that can be
loaded is called a module. Each module consists of a set of assembly-level
instructions, which may be either bytecodes (as in Joule) or native
instructions. When a module is loaded, its instructions must be
checked to make sure they will not violate certain properties of the
higher-level language, such as memory boundaries. This paper presented a
technique for implementing such load-time checking, using the Bouncer component
of the Joule abstract machine as an example. While Joule applies this
bouncing technique to bytecodes, the mechanism is equally applicable to native
assembler instructions. The technique can be extended to verify more
sophisticated properties, such as statically known reference counts, static
array bounds checking, type structure for objects in the heap, and real-time
constraints.
1.4 Acknowledgments Future drafts will include the many
references appropriate to credit the early development of these ideas.
[ Agorics Home Page ] [
Technical Library ] [ Info on Joule ] |