Data Abstraction

Procedural abstraction is nearly as old as programming languages that provide user-defined subroutines. The procedure is hidden from its caller, while its behavior is accessible to the degree to which the caller probes it. Data abstraction is more recent and occurs when the caller passes, as an argument, a data structure whose internal form is accessible only to the procedure.

Data abstraction is a platform architecture feature or language feature that limits which code can directly access the representation of a particular collection of data. The code to which such access is limited is called the custodial code here. Static abstraction is generally (always?) a language feature and has little runtime cost. Perhaps C++ is the best known language providing static abstraction: a reference to an instance of a class may not suffice to access a field in that instance, except from code within the definition of the class. Of course C++ is derived from C, which fails to enforce the protection rules that the C and C++ types seem designed to provide.
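A minimal sketch of static abstraction in C++ may help here. The class Fraction and the function isNegative are hypothetical names invented for this illustration. The member functions play the role of custodial code, and the compiler rejects any attempt by other code to name the private fields, at no runtime cost.

```cpp
#include <stdexcept>

// Hypothetical example class. Its member functions are the custodial code.
class Fraction {
    long num_;
    long den_;  // invariant maintained by the custodial code: den_ > 0
public:
    Fraction(long num, long den) : num_(num), den_(den) {
        if (den == 0) throw std::invalid_argument("zero denominator");
        if (den_ < 0) { num_ = -num_; den_ = -den_; }  // normalize the sign
    }
    long numerator() const { return num_; }
    long denominator() const { return den_; }
    Fraction times(const Fraction& o) const {
        // Both denominators are positive, so the product is positive too.
        return Fraction(num_ * o.num_, den_ * o.den_);
    }
};

// Non-custodial code: it holds a reference but cannot reach the fields.
inline bool isNegative(const Fraction& f) {
    // long d = f.den_;  // would not compile: den_ is private
    return f.numerator() < 0;
}
```

Because only the custodial code can write den_, the invariant that the denominator is positive holds as long as that code is correct, whatever values callers supply. The check happens entirely at compile time.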

Dynamic data abstraction is provided today mainly by kernels using hardware features originally designed to allow kernels to protect clients from each other and to remain in control when client code misbehaves. With the ideas of Morris, languages can also provide dynamic abstraction. Just now I know of no languages that support Morris’s ideas. Stiegler’s mechanism serves here in some cases. The Keykos platform uses ideas closely related to those of Morris.

Data abstraction is motivated on several grounds, usually just one at a time:

1. It improves the integrity of the data because the format of the data will remain correct if the custodial code is correct, even when the data are wrong. Bugs that would corrupt the data format are inexpressible outside the custodial code, or result in early failure more directly tracked to the source of the bug. Together these two result in reduced development costs and better service.
2. Sometimes the custodial code is a trade secret or otherwise proprietary. Sometimes the service of the custodial code is conditioned on payment. The data may belong to the caller but not the representation of the data. Protecting the data from the proprietor of the custodial code is the confinement problem, which we do not pursue further here.
3. Sometimes the data are proprietary and the custodial code provides incomplete access to the data, at a lesser price. The custodial code is in a position to measure the degree of provided access.

This feature requires formally delimiting the custodial code to the platform and this designation must itself conform to relevant protocols.

Dynamic data abstraction has runtime costs. It is dynamic in two senses:

1. Code must explicitly execute commands and expend runtime to seal and unseal abstractions.
2. Abstractions can be created at run time. In Morris’s terminology new sealer-unsealer pairs can be created as the program runs. With static abstraction the analogous actions all happen at compile time.
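The two senses above can be sketched in C++ under loud assumptions. Every name here (Brand, Box, Sealer, Unsealer, SealerUnsealerPair) is hypothetical and invented for illustration; this is not the Keykos interface, nor Morris’s notation. The sketch only shows the shape of the idea: each call to create() makes a fresh pair at run time, sealing and unsealing are explicit operations that cost runtime, and an unsealer opens only boxes sealed by its own partner. Since C++ inherits C’s weak enforcement, a real platform would have to supply the protection that is merely suggested here.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical identity token: one is allocated per sealer-unsealer pair.
struct Brand {};

// A sealed value. Only the Sealer/Unsealer mechanism may look inside.
class Box {
    friend class Sealer;
    friend class Unsealer;
    std::shared_ptr<Brand> brand_;  // which pair sealed this box
    std::string payload_;
    Box(std::shared_ptr<Brand> brand, std::string payload)
        : brand_(std::move(brand)), payload_(std::move(payload)) {}
};

class Sealer {
    friend struct SealerUnsealerPair;
    std::shared_ptr<Brand> brand_;
    explicit Sealer(std::shared_ptr<Brand> brand) : brand_(std::move(brand)) {}
public:
    // Sealing is an explicit runtime action.
    Box seal(std::string value) const { return Box(brand_, std::move(value)); }
};

class Unsealer {
    friend struct SealerUnsealerPair;
    std::shared_ptr<Brand> brand_;
    explicit Unsealer(std::shared_ptr<Brand> brand) : brand_(std::move(brand)) {}
public:
    // Succeeds only for boxes sealed by the matching sealer.
    // On failure, 'out' is left untouched and false is returned.
    bool unseal(const Box& box, std::string& out) const {
        if (box.brand_ != brand_) return false;  // compares token identity
        out = box.payload_;
        return true;
    }
};

struct SealerUnsealerPair {
    Sealer sealer;
    Unsealer unsealer;
    // A new abstraction comes into being each time this runs.
    static SealerUnsealerPair create() {
        std::shared_ptr<Brand> brand = std::make_shared<Brand>();
        return SealerUnsealerPair{Sealer(brand), Unsealer(brand)};
    }
};
```

Custodial code would keep the unsealer to itself and hand out only boxes. A holder of a box can pass it around or return it to the custodian, but cannot examine its contents. Creating a second pair yields an unsealer that fails on boxes from the first.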

The Keykos brand provides dynamic abstraction with new abstractions arising at run time, à la Morris. Some computer languages provide static abstractions fixed at compile time. Some hardware architectures have made dynamic abstraction quite low cost, whereas conventional hardware requires a trip through the privileged code. Integrated Development Environments, to my knowledge, do not support the second purpose of abstraction (keeping proprietary custodial code secret) via language features.

Synergy goes beyond abstraction to protect data from access outside custodial code even when that data is held by agents outside that code. Abstraction goes beyond synergy by providing some verifiable type information.

Nexus on data abstraction