Table of contents
Implementing
Formal definition
Implementation is a relation between a concrete structure and an abstract specification :
where:
- is the abstract specification — a description of a data type or system in terms of abstract values, abstract state, and abstract operations; specifies what the system does, not how it does it; ’s state space may be mathematically idealized (e.g., abstract stacks as mathematical sequences) and its operations may be specified axiomatically or as abstract functions
- is the concrete realization — a data structure, program, or system that provides actual computational representations of ’s abstract values and actual implementations of ’s operations; ’s state space uses concrete representations (arrays, linked lists, hash tables) and its operations are executable procedures
- is the retrieve function (or abstraction function) — a function mapping each concrete state to the abstract state that represents; is the mathematical certificate of implementation: it shows exactly how the concrete representation relates to the abstract specification
Three conditions for to witness a valid implementation:
-
Representation invariant coverage: every state that satisfies the concrete representation invariant has satisfying the abstract invariant . The concrete representation must encode only valid abstract states — no concrete state outside the representation invariant’s scope is required.
-
Operation commutativity: for each abstract operation and its corresponding concrete operation :
Executing the concrete operation and then abstracting (via ) gives the same result as abstracting first and then executing the abstract operation. The concrete operation faithfully implements the abstract one through .
- Initialization: the concrete initial state corresponds to the abstract initial state: where is the abstract specification’s initial state. The implementation starts in the right abstract state.
The retrieve function is not required to be injective: multiple concrete states may represent the same abstract state (e.g., two different memory layouts representing the same list). This is the essential flexibility that makes implementation more than isomorphism — the concrete representation may be more redundant, more detailed, or differently organized than the abstract one.
Hoare: proof of correctness of data representations
C.A.R. Hoare (Proof of Correctness of Data Representations, 1972, Acta Informatica) gave the first formal treatment of data abstraction and implementation:
Hoare’s paper establishes the following framework for proving that a concrete data representation correctly implements an abstract specification:
An abstract type is defined by:
- An abstract state space (e.g., mathematical sequences for a stack)
- Abstract operations with pre/postconditions in terms of (e.g., ; )
- An abstract invariant (conditions that always hold)
A concrete representation is defined by:
- A concrete state space (e.g., arrays with a top pointer for a stack)
- Concrete operations (executable procedures)
- A representation invariant (conditions the array representation always satisfies)
- A retrieve function mapping valid concrete states to abstract states
Proof obligations for the implementation:
- is established by the concrete initialization: the initial concrete state satisfies
- is maintained by each concrete operation: if before, then after
- Each concrete operation correctly implements via : the Hoare triple holds
Hoare’s framework established that correctness of data representations is provable by finding a retrieve function — a mathematical object that witnesses the correspondence between abstract and concrete. This was the foundation of all subsequent work on data abstraction verification.
Liskov and Guttag: abstract data types and abstraction functions
Barbara Liskov and John Guttag (Abstraction and Specification in Program Development, 1986) developed abstract data types (ADTs) as the programming language realization of Hoare’s abstract/concrete distinction:
An ADT specification consists of:
- A type name
- An overview describing the abstract value the type represents
- Operations with pre/postconditions specified in terms of the abstract value
- An abstraction function mapping concrete representations to abstract values
The representation invariant specifies which concrete states are valid representations of abstract values. Only states satisfying are subject to the abstraction function.
The abstraction function is partial: it is defined only on states satisfying . Outside , the concrete state has no abstract meaning — it is an invalid intermediate state. Operations must maintain (implementation invariant) and must commute with (implementation correctness).
Liskov and Guttag introduce benevolent side effects — operations that appear to modify the concrete representation but leave unchanged (e.g., a caching operation). Benevolent side effects are permitted: they change the concrete state while preserving the abstract value. This formalizes the implementation principle that multiple concrete states may represent the same abstract value.
Reynolds and abstraction in type systems
John Reynolds (Types, Abstraction and Parametric Polymorphism, 1983) showed that the retrieve function / abstraction function has a deep connection to the semantics of polymorphic types:
A representation independence theorem: if a module with abstract type uses representation internally and exports via operations with the abstraction function , then any two implementations and with abstraction functions and that satisfy the same abstract specification are related by a logical relation — the relation induced by the abstraction function. No client code can distinguish from , because the type system ensures all client accesses go through the abstract operations.
This is the abstraction theorem for modules: information hiding (encapsulation of the concrete representation) is enforced by the type system, and the retrieve function is the mathematical content of what is being hidden. The module’s abstract interface exposes only ; the concrete representation is inaccessible to clients.
Reynolds’ paper connects data abstraction to relational parametricity: the abstraction function defines a binary relation on values of type , and well-typed programs respect this relation — they cannot distinguish elements of that are -related.
Open questions
- Whether the retrieve function is required to be a surjection (every abstract state has a concrete representation) or only a partial surjection (some abstract states may be unrepresentable), and whether this choice affects the theory of data refinement for non-terminating systems.
- Whether implementation in the sense of the retrieve function corresponds to a morphism of sheaves in — whether the retrieve function is a natural transformation between the presheaves representing and , making implementation a morphism in the category of sheaves.
- Whether Liskov’s behavioral subtyping (the Liskov Substitution Principle) and Hoare’s data refinement are the same concept at different levels — whether behavioral subtyping is implementation relative to the behavioral specification, and whether the retrieve function in data refinement corresponds to the simulation relation in behavioral subtyping.
- Whether benevolent side effects (changing the concrete representation while preserving the abstract value) are the implementation-theoretic analogue of the saturation nucleus’s closure — whether applying the nucleus to a concrete state produces a canonically normal form with the same abstract value, making nucleus application the “canonical retrieve” operation.