Separating Separation Logic - Modular Verification of a Pointer Implementation of Red-Black Trees
This page gives a description of the modular verification
of red-black trees with KIV as described in [Rbtree22]. The version verified
uses a pointer structure with parent pointers
and a while-loop based algorithm, it is very close
to [Cormen09], the version in the stdc++-library, and many C implementations that
can be found on the internet. There is one small difference that exists with many
other implementations: we do not use a single pre-allocated
sentinel-node for leaves (that has all the fields of regular nodes),
but simply the null pointer. Therefore, leaf nodes do no have a parent pointer.
Using the null pointer does not require any changes for the insert algorithm.
However, removeFixup cannot start with the replacement node of the deleted node,
since this could be a leaf and traversing upwards would be impossible. Instead it starts
with its parent. and gets the additional (boolean) information of whether the left or right child
has been deleted. Starting this way allows to avoid certain calls to removeFixup at all,
and simplifies its loop test (as a leaf would be possible in the original algorithm in the first iteration only).
KIV's code generator produces the C code contained in this file.
All code in subfolders except string/to_string.c is generated, the main algorithms are in asm/rbtree.c.
The toplevel contains a Makefile for building the generated C code.
The archive also contains a recursive (non-verified) implementation (asm/rrbtree.c) that was generated from KIV models realizing the algorithm used by [Zhan18] to evaluate the Auto2 prover (our implementation is an adaption of the imperative version published in the Isabelle AFP).
Making will create a program 'bench' from 'main.cpp' that can be run to execute some performance tests, which show that the KIV code has run times very similar to the C++ code (and runs faster than the recursive implementation).
The general approach for splitting up the correctness
proof into two parts is described in
[Rbtree22] (submitted; a copy can be obtained by mailing the authors)
and it is assumed the reader has already understood the basic idea.
Here we give a more technical description of the specifications:
- Check the overview
over the specifications first. From there you
can also find links to the theorem base of each
specification, and to all proofs.
-
The basic algebraic specifications used are:
- The specification of
algebraic red-black trees.
- The specification of the enumeration type
lrdescendant
which contains the constants LEFT and RIGHT.
Lists over this type describe a path through a red-black tree.
- The specification of
red-black nodes as
nodes stored in a heap (a map from allocated references to nodes).
- The specifications contain four abstract state machines (ASM's),
that specify abstract data types in the sense of data refinement using
programs for the operations.
- an abstract set of totally
ordered elements that is the top-level
specification.
- the implementation of these sets over red-black trees.
This specification contains the main implementation
of the algorithms
to insert, remove or lookup an elements.
The ASM itself has an empty state. It delegates
all primitive operations (that roughly correspond
to assignments) to calling operations of
- the basic submachine.
This machine stores a state constisting
of an algebraic red-black tree and two paths.
It exports simple operations to modify (e.g. rotate, update color)
the tree at one of the two given paths, or to select
the color or element at one of them.
It also offers operations to reset, extend
or shrink the two paths (thereby traversing the tree).
- the implementation of the submachine .
This machine implements the tree by a heap structure of nodes, and
the two paths by two pointers into the tree. Extending
or shrinking a path is implemented by traversing
to one of the childs and to the parent, respectively.
Rotation becomes rotating the pointer structure.
- For verification, specifications
rbtree-functions
and rbtree-ord-functions
define various functions and predicates over algebraic trees
and paths (only the second specification assumes a total order
for elements).
- Predicates "p ∈ rbt", "p ∈l rbt", "p ∈w rbt" check whether a
path p points
to proper node, to a leaf, or to one of those in tree rbt.
- "e ∈ rbt" checks whether tree rbt stores element e,
and "elems(rbt)" computes the set of its elements. The latter is also
the abstraction function to ordered sets.
- "redCorrect" specifies that there are no two successive red nodes.
Its second argument is the color of the parent node (RED for the full tree,
to enforce a black root).
- "sameBlacks" asserts that all branches have the same number
of black nodes. The definition states this, by ensuring that
the left and right subtree of every node
have the same number of black nodes in their leftmost branch
(counted by "countBlacks").
- The two predicates are combined to "isRbtree" which
specifies correct coloring of a red-black tree.
- "isOrdered" specifies that the nodes are ordered.
This is axiomatised by flattening the tree into a list
("toList") and then checking that the result is stricly
ordered ("ordered<").
The predicate is kept separate from "isRbtree"
since it is essentially independent of the other two.
The two fixup routines do not need a weaker version
of "isOrdered".
- Together the two predicates isRbtree and isOrdered
characterise a correct Red-Black Tree.
- There are two variants of the predicates that hold
after insering/deleting an element from a tree
(but before calling fixup). These assert that
the tree is almost a red-black tree, with the exception
of one path (where the insert/delete happened).
These predicates are called "isRbtreeI", "isRbtreeD"
with sub-predicates "redCorrectI", "sameBlacksD" etc.
These predicates are also relevant as invariants
of the loops in the two fixup routines.
Most of the definitions are structurally recursive over either trees or paths.
It is possible to generate code for those too (and we did so when testing with generated Scala code), but since
the functions are not called by the algorithms the code is not included in the C-code binary.
- The two
theorem bases
of the specifications contains lots of inductively proved theorems over
these predicates, that assert that certain modifications
preserve isRbtreeI or isRbtreeD (or some of its sub-predicates)
or lead back to isRbtree. These correspond to specific cases
of the fixup algorithms.
- Finally there are the two specifications
rbtree-refine and
rbtheap-refine.
These define the abstraction functions. Their
two
theorem bases.
contain the proof obligations for the refinements and their proofs.
- The main proof effort of the whole project
(at least 3/4 of the full effort) is
in the verification of the algorithms for insert and
remove on the algebraic level. Since the abstraction
function is very simple, the main effort is already
in the proofs of thw two constracts
inv-insert
and
inv-remove
which are in the
theorembase
of rbtree-asm.
These assert that "isRbtree" and "isOrdered" are invariant,
and that the elements in the tree change in the obvious way.
The proofs are broken down along the structure of
the subroutines we defined for each program.
The subroutines were defined, to keep proof sizes manageable,
and there are corresponding lemmas "inv-insertFixup",
inv-insertFixup4L", "inv-removeFixup1L" etc. for each of them.
- The proofs of the second refinement are much simpler.
Only these proofs have to use Separation Logic.
The tricky bit here is the abstraction function to separation
logic defined in specification
abs-rbtheap-rbtree.
This defines a predicate "abssep(ref, pref, rbt)(h)"
that characterizes a heap h that stores nodes corresponding
to a tree rbt under a root reference "ref", where
the parent reference of the root node is "pref"
(null for the whole tree; in [Rbtree22], the predicate is just named "abs").
It also defines a function "h[ref,p]" that computes
the reference reached from root reference "ref"
when following path "p" in the heap "h".
Note that we use ordinary (not separating) conjunction in the predicate.
Using separating conjunction would have required to define a single abstraction predicate
that uses the two paths and references as additional arguments. Since the paths/references
get dropped as soon as the traversed path is left, several variants of the predicate
would have been required (for abssep as well as for abspath defined below).
- Since the operations work at a path p
anywhere in the tree, crucial steps
needed in the verification
do not just have to unfold/fold the recursive
definition of "abssep", a crucial lemma that is often
used manually is
abssep-abspath-split.
It allows to split the heap into
a part above path p (characterized by "abspath"),
and a path below path p (characterized by "abssep"
again; in contrast to abssep for the full tree,
this has a non-null parent reference, when the path is non-empty).
- the theorem base defines
rewrite rules that allow to propagate
"abssep", "abspath", "h[ref,p]", when
the heap changes by assignment, allocation or
deallocation.
- For the
refinement proofs of the second refinement,
the most complex ones are those for rotation.
The difficulty is not in showing that the tree changes
in the same way as the pointer manipulation,
but mainly in establishing where the
references move (which is different
if the rotation is at one of the references, or
at the grandparent of such a reference; note that there
are different changes to paths in rbtbasic-asm for rotateLeft
and rotateLeftGP therefore as well). Also whether the
root reference is modified, is relevant.
- To minimize the proof effort, we therefore defined
two generic lemmas for rotateLeftPath-lem
rotateRightPath-lem that specify
precisely, how pointers are changed.
- One other proof that is tricky, is the one for
replacing a tree with its right (and only) child,
since it is called with two different preconditions.
We had to generalize
the proof obligation of refinement to allow a proof, that
symbolically executes the program once,
instead of having to do an initial case split,
and then two very similar proofs that each do symbolic execution.
The effort to do the case study was ca. 4 person months
for two students that had participated in our course on KIV.
A first student needed 2 months to set up the specifications and
to verify the first refinement except remove.
A second student needed another two months
to verify remove and the second refinement.
Bibliography
[Rbtree22] |
G. Schellhorn, S. Bodenmüller, M. Bitterlich, W. Reif
Separating Separation Logic - Modular Verification of Red-Black Trees
accepted at VSTTE 2022
|
[Cormen09] |
T. Cormen, C. Leiserson, R. Rivest, C. Stein
Introduction to Algorrithms, Third Edition, MIT 2009
|
[Zhan18] |
B. Zhan
Efficient Verification of Imperative Programs Using Auto2
Proc. of TACAS 2018
|
Used basic libraries:
Back to the overview of all KIV projects.
For general informations on KIV and a download site for the whole system refer to this page.
[Imprint] [Data Privacy Statement]