B. D Lubachevsky.

An approach to automating the verification of compact parallel coordination programs. I online

. (page 1 of 5)
Online LibraryB. D LubachevskyAn approach to automating the verification of compact parallel coordination programs. I → online text (page 1 of 5)
Font size
QR-code for this ebook

Computer Science Department




By B. D. Lubachevsky

Computer Science Dept.
New York University
251 Mercer St., N.Y., N.Y. 10012

February, 1983
Technical Report - # 60




Department of Computer Science
Courant Institute of Mathematical Sciences


UCN #45



By B. D. Lubachevsky

Computer Science Dept.
New York University
251 Mercer St., N.Y., N.Y. 10012

February, 1983
Technical Report - # 60

This work was supported in part by the Applied Mathematical
Sciences Program of the U.S. Department of Energy under
Contract No. DE0AC02-76ER03077 , and in part by the National
Science Foundation under Grant No. NSF-MCS79-21258 .

Ultracomputer Note #45 August 1982


B. D. Lubachevsky

Abstrac t - A class of parallel coordination programs for a shared
memory asynchronous parallel processor is considered. They use the
primitive operation Fetch&Add and differ in various respects from those
which use basic P and V operations. The F&A is the basic primitive for
the "NYU-Ultracomputer". A correctness proof for the considered
programs must be done for arbitrary number N of processing elements
since the Ultracomputer design includes thousands of PEs .

A so-called reachability set description (RSD) is introduced, in
which all reachable states of a program (exponential function of N) are
collapsed into a fixed number of "metastates". The transitions between
these "metastates" are also specified. By using such a notation it
becomes feasible to prove the absence of livelock and other temporal
properties of a parallel program. The concept of a compact parallel
program is introduced. Roughly speaking a parallel program executed by
N PEs is compact if there exists a boundary T independent of N such
that any state of this program is reachable within time T. Compactness
may also be understood as the uniform over N finiteness of the
expansion for the strongest invariant in the least fixpoint theory. A
program that builds RSDs for compact parallel programs and examples of
proofs generated by this program are discussed.


Typographical conventions

Due to a limited type font on the device that printed this
manuscript, the following conventions are used:

Symbol in







1 Meaning


1 set intersection

Example of use
A fi B



] set containment

A > B



] negation of
1 set inclusion

A not» B



negation of
set to element

A x\oX» b


1. Introduct ion 5

2. Verif ier-l 1 1

2.1. Ultracomputer and primitive Fetch&Add 12

2.2. "Naive" semaphore 14

2.3. Eeachability graph. Example of the "naive" semaphore 16

2.4. Indivisible operations and the serialization principle 17

2.5. Bose and Boltzmann semantics 19

2.6. Uvelock 2 2

2. 7. Fetch&Funct ion-programs 2 3

2.8. Example of verification of a program with P and V operations 26

2.9. Reachability graph and progress functional 29

2.10. Invariants and the fixed-point theory 33

2.11. Infinite loop property 35

2.12. Eliminating private variables 38

2.13. Indivisibility by virtue of the program 42

3. Verif ier-2 45

3. 1 . Go rrect semaphore 46

3.2. Keachability set description 47

3.3. Proof of correctness of the semaphore program 49

3.4. Comparison with the method of verification by Owicki and Cries. . .53

3.5. Keachability tree and forest 56

3.6. A geometric interpretation of the reachability tree

and the RSD. 59

3. 7, Compactness property 6 3

3.8. An example of a non-conpact F&A-program 65

3.9. Neither a PV-skeleton nor a Petri Net can simulate

a general F&A-skeleton 67

3.10. Classes of parallel programs to which verifier-2

may be applied 70

4. Examples of programs proved correct using verifier-2 77

4. 1 . Readers and writers 77

4.2. Reader^optimized version of "Readers and writers" 80

4.3. Detecting cessation of parallel activity 82

4. 4. "Busy-wait" synchronization 85

1. Intrcduct ion.

The programs developed for the "NYU-Ultracomputer", a shared
memory asynchronous parallel processor [6], are characterized by the
following two properties: 1) they are to be executed by a large system
(the parallel computer might include thousands of processing elements);
2) they use the basic primitive Fetch&Add which enables efficient
parallel programs to be written for such systems [5, 17].

As a price for the efficiency of a F&A-program the possibility of
making a specific time-related mistake in such a program is seemingly
greater when comparing it not only with a serial program (which is not
surprising) but even when comparing it with a conventional PV-program
of the same length. (In a PV-program the only coordination primitives
are P and V operations on semaphores.)

The difference in the complexity of a F&A-program and a PV-program
manifests itself in the fact that the skeleton of a PV-program is
equivalent to both a Petri Net and to a vector-addition system (VAS)
[7, 9], whereas the skeleton of a F&A-program is generally not
equivalent to a Petri Net, and a generalization of the notion of VAS is
required for its description (such a generalization, called controlled
VAS, will be described in part II of this paper).

A skeleton of the "naive" semaphore (see section 2.2) is a
classical example of the complexity of a F&A-program. These 3 lines of
code (each line with one assignment) contain an inadmissible livelock
which is not readily discernible .

We have tried to apply the assertional approach [1, 11, 13, 14] to
automating the debugging and correctness proof for the considered class
of programs. In this approach an assertion about the program state
(augmented by a number of auxiliary variables when necessary) must be
supplied. From this assertion desired properties of the program are
supposed to follow. The predicate that expresses the assertion must be
proven true for the initial state and the true value of this predicate
niist be proven invariant for any program transition. If such a proof
holds then this predicate is true for any reachable state of the
program and it is called an invariant of the program.

The task of proving the invariance of a given assertion does not
present a difficulty for all the considered examples. The task of
supplying the auxiliary variables and of generating the invariant
appears to be more difficult. A method for the invariant generation,

The livelock was not recognized by anyone shown the program for the
first time by the author. This bug was exposed by Dijkstra ([4],
p. 122). Since on basis of this livelock existance primitive F&A was
rejected in [4] as unappropriate to solve the mutual exclusion problem
(swap-operation was suggested instead), Dijkstra seemingly did not
realize the fact that only one extra checking statement eliminates the
livelock (see section 3). All these facts confirm the point that it is
difficult to obtain an "insight" of a F&A-program. Note that this
difficulty is only inherent to coordination routines, written in an
unstructured F&A-style. The latter take a small fraction of an
application code, but an essential part of the operating system.

suggesced in [8] for PV-programs , can be generalized to include
F&A-programs . However, the invariants supplied by this method are not
strong enough to prove, for example, the absence of a deadlock in some
cases [2].

There is yet another difficulty in using the assertional method
for F&A-programs. Namely, this method (at least as presented in
[13, 14]) does not help to prove the absence of blocking for some of
these programs. (See section 3.4 for an example.)

Facing these difficulties the author decided to examine a method
based on an explicit presentation of the graph A of all the program
states and state transitions. Generally large size of A is considered
to be the major obstacle in using this method. The size of A depends
on the length of the analysed program and on the number N of executing
processing elements (PEs). Although the length of the considered
F&A-programs is small, the size of A grows very quickly with N.

It was observed, however, that in most cases it is not essential
which PE executes which statement but rather how many PEs execute each
statement. The consequent symmetrization thus reduces the growth of A
from exponential with respect to N to polynomial with respect to N.

A program, called verifier-1 (see section 2), was created that

capitalizes on these advantages. Verifier-l first builds the

(symmetrized) description of the reachability graph and then answers
various questions about this graph.

The general types of questions which appear useful are:
Is there a reachable state satisfying a given predicate?
Is there a strongly-connected component of reachable states satisfying
a given predicate and a special infinite loop property? (This property
relates to the well-known finite delay and fairness properties. The
discussion follows in section 2.11.)

Verifier-1 builds A for fixed moderate values of N. The

correctness of a property established by verifier-1 does not

automatically mean that this property is true for arbitrary N. (The
case of interest for the Ultracoraputer . )

To verify correctness for arbitrary N another program, called
verifier-2, was created. Verifier-2, if terminated, produces the
so-called reachability set description (RSD) for the analysed parallel
program. In the RSD graph A is represented in a form even more
aggregated than the symmetrized form produced by verifier-1. All
reachable states of the analysed program are collapsed into a fixed
number of the so-called metastates. Each metastate is a set in the
state-space of the program, represented by a conjunction of a fixed
number of inequalities, which are linear with respect to the
state-space coordinates. The transitions of the analysed program are
also represented in an aggregated form as transitions between these


Since all the states represented in the union of these raetastates
are reachable, the RSD serves as a solution to the problem of invariant
generation. Namely, one can view the disjunction of descriptions of
the metastates as the strongest invariant for the analysed program.
Mareover, proving the absence of blocking becomes feasible in the RSD
notation. Namely, from the RSD one can produce a compact description
of all strongly-connected components in A satisfying a given predicate,
then by studying these components one can check if any of them generate
a blocking or not. Examples in sections 3 and 4 demonstrate such a
verification scheme.

A necessary condition for the termination of verifier-2 when
building a RSD for a parallel program is the compactness of this
program. Compactness with respect to the set V„ of initial states of
the program means that there exists a boundary Tq (independent of N)
such that any state is reachable from a state of Vq within time Tq.

Compactness may also be understood in terms of the expansion for
the stroagest invariant in the least fixed-point theory [2]. This
expansion has the form

(1.1) R(Vn) = U TkVn),

" j>0 "

where R(Vq) is the reachability set of the program or else the carrier
of the strongest invariant; F is the progress functional which can be

obtained from the text of the program (see section 2.9)

.,, Df . n Df

Fl+^(x) == F (fJ(x)), f"(x) == X.

Compactness requires that the expansion (1.1) contain a fixed
(finite) number of terms independent of N.

Normal programs, a restricted class of parallel programs (that
still includes many interesting examples) are specified in section 4.
It will be proven in part II that if a program is normal then
conpactness is sufficient for verifier-2 to produce the RSD in a finite
time (if enough memory is available to the verifier).

The example in section 2.8 shows that the verifiers do not only
work for F&A-programs. Programs using P and V operations on semaphores
may be verified as well. However, the following restrictions on the
analysed program must be satisfied even when using the less restrictive
verif iet^l:

(vl) the code of the analysed program must be the same for all N
processing elements (PEs);

(v2) private variables for each PE may take only a fixed number of
different values independent of N;

(v3) there is only a fixed number of pubKc variables each of
which takes a finite number of values (the latter number may depend
on N).

Property (vl) seems to be least restrictive, whereas properties
(v2) and (v3) are restrictive. In particular, they infer finiteness of
the reachability graph for any given N.


Despite these restrictions the author tends to consider these
verifiers to be successful. For some of the programs verified by this
method, no other formal proof technique known to the author seems to

In this, part I of the paper, an introduction to this method is
given. Several examples of its application are demonstrated.

2. Verifier-1.

In this section the theoretical background of the algorithm,
called verifier-1, is described . We first briefly discuss the
Ultracomputer (for a more extended discussion see [6, 18]). Then the
work of verifier-1 is demonstrated in the verification of a simple but
an instructive F&A-prograra. A general class of parallel programs to
which this verifier may be applied is then introduced and an example of
the verification of a non-F&A-program is presented. In discussing
these examples, several notions relevant to verification are
introduced. Only those constructions and notions introduced in
sections 2.5, 2.7, 2.13, and, possibly, 2.12 sean to be novel. The

''Although the temporal logic approach [15, 16, 18] may help in studying
various properties of the considered programs for fixed moderate values
of N, this approach does not address the case in which N is a
parameter. In general going from N = 2 to arbitrary N is non-trivial
for F&A-programs.

-*Verifier-l has been implemented as a FORTRAN program. We will not
discuss aspects of its programming implementation since they are

others are known from elsewhere and a discussion of them is included to
faciliate reading by someone not familiar with the subject.

2.1. Ultracomputer and primitive Fetch&Add .

An ideal parallel processor, dubbed a "paracomputer" by Schwartz
[18], consists of identical PEs sharing a common memory. The

individual PEs may also have attached local memory, which we refer to
as their "private" memories; the memory shared by and common to all
processors is called "public", and variables stored there are called
"public variables". The PEs can simultaneously read any public cell in
one cycle. Moreover, simultaneous writes (including the F&A operation)
are likewise effected in a single cycle and a memory cell to which such
writes are directed will contain some one of the quantities written
into it. Note that simultaneous memory updates are not serialized; in
fact they are accomplished in one cycle.

The programmer may view the Ultracomputer as a paracomputer and we
will treat the Ultracomputer this way in our subsequent consideration .

'^Paraccmputers mist be regarded as idealized computational models since
physical fan-in limitations prevent their realization. Ultracomputer
is a realizable approximation to a paracomputer in which each PE can
directly access its private memory and can access the public memory via
a (multicycle) interconnection network. In this more realistic
architecture a public memory access may require several PE cycles.

The code of an Ultracomputer program looks like one of an ordinary
serial program. Each PE "sees" the public memory and executes the
program code. In this paper only the case when this code is identical
for all tte PEs is considered. The speed of these executions may be
different for different PEs. Fetch&Add is the primitive to coordinate

The format of the F&A operation is F&A(c,i), where c is a variable
and i is an expression. This indivisible operation yields c as its
value and replaces the contents of storage location c by c + i. In
this paper only the case when variable c is pubKc integer and i is an
integer constant is considered.

The following principle of serialization of operations F&A takes

Assume that several (possibly very many) F&A operations
simultaneously address c. Then the effect is as though these operations
occurred in some (unspecified) serial order, i.e. c receives the
appropriate total increment and each operation yields the intermediate
value of c corresponding to its position in this order.

For example, if PEl executes p,

1 3 4 5

Online LibraryB. D LubachevskyAn approach to automating the verification of compact parallel coordination programs. I → online text (page 1 of 5)