Copyright
Phyllis G Frankl.

Data flow testing in the presence of unexecutable paths online

. (page 1 of 2)
Online LibraryPhyllis G FranklData flow testing in the presence of unexecutable paths → online text (page 1 of 2)
Font size
QR-code for this ebook


Computer Science Department



TECHNICAL REPORT



DATA FLOW TESTING IN THE PRESENCE OF
UNEXECUTABLE PATHS



By



Phyllis G. Frankl
Elaine J. Weyuker

Technical Report //208
March 1986



^P




61


r V


**'".':;-■-


,..■*•



^4^.1'. Jiilil^'t ■■*■:.:, -M.-rai'.liO



0) (-(

D* CO
0) fl>
• 3

o



o
to



o

i-h

C

a>

X

o

c
ft

0)
KT

(-■



(T (U

0) 3 ,

o o

€ ^Ti 3

rti< cnl
a> M O)
m M !-([
n H- ,
M- m 1-3
3 wl

(jQ o I
to]

H- of

3 ooi



n-

3"



' ysi ^ f!&^' ■■'- ?^i- "St ''"■#>'>'"■*■' ".■.:^' jCfl



EW YORK UNIVERSITY




HjiiiM



Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y. 10012



DATA FLOW TESTING IN THE PRESENCE OF
UNEXECUTABLE PATHS

By

Phyllis G. Frankl

Elaine J. Weyuker



Technical Report //20i
March 1986



DATA FLOW TESTING IN THE PRESENCE OF
UNEXECUTABLE PATHS

Phyllis C . Frankl and Elaine J. Weyuker

Department of Computer Science

Courant Institute of Mathematical Sciences

New York University

251 Mercer Street

New York, New York 10012



Abstract

Most test data adequacy criteria based upon path selection have the unfortunate property that
for some programs with unexecutable paths, no set of test data is adequate. In this paper we define
a new family of adequacy criteria, derived from the data flow testing criteria, which circumvent this
problem by only requiring the test data to exercise those definition-use associations which are
executable. The inclusion relationship among these criteria is explored.

1. Introduction

Several software test data adequacy criteria are based on the idea that one cannot consider a
program to be adequately tested if no test data has caused certain sequences of statements to be
executed. These methods generally associate a subset T of the input domain of a program P, with
the the set ,P of paths through P's flow graph which are executed when the program is run with inputs
from T. The test T, or equivalently the set of paths P, is said to satisfy criterion C for program P
("T is C-adequate for P") if and only if each of the sequences required by C is a subpath of one of
the paths in P.

The most well-known of these criteria are statement testing, branch testing and path testing
which require that the test data cause every node (respectively branch, path) in the program's flow
graph to be executed [HOW78,HUA75]. Unfortunately, statement and branch testing can fail to
expose many common errors and path testing is usually infeasible since programs with loops may
have infinitely many paths [G0075,HOW76]. Several criteria which are based on analysis of the



This research was supported in part by National Science Foundation Grant DCR8501614 and by Office
of Naval Research Contract NO0O14-85-K-0414.



- 2 -

program's control flow and which are stronger than branch testing but weaker than path testing have
been proposed [HOW78,MIL74,WOO80].

Recently, a number of test data adequacy criteria which are based on data flow analysis and
which "bridge the gap between branch testing and path testing have been proposed and studied
[RAP82,RAP85,LAS83,NTA84,CLA85]. Tools based on some of them have been implemented
[FRA85a,FRA85b,GlR85,KOR85]. These criteria are based on the intuition that one should not
feel confident that a variable has been assigned the correct value at some point in the program if no
test data causes the execution of a path from the assignment to a point where the variable's value is
subsequently used.

All of these criteria suifer from the weakness that for programs with unexecutable paths, it may
be impossible for any test set to satisfy the given adequacy criterion. For example, consider a
program having a for loop in which the upper bound is always greater than or equal to the lower
bound. Such a program has unexecutable paths and therefore cannot be adequately tested using the
path testing criterion. Our experience has shown that for many programs, unexecutable paths make
it impossible for any test to satisiy a given data-flow testing criterion [FRA85a,FRA85b].

This is clearly an undesirable situation. One property which one would expect a "good"
adequacy criterion C to have is the applicability property: for every program P there exists some test
which is C-adequate for P [WEY85]. Not only does the applicability property fail for these criteria,
but for each of them it is undecidable for a given program whether test data exists which adequately
tests the program.

In this paper we define a new family of adequacy criteria, derived from the data-flow testing
criteria proposed in [RAP82,RAP85]. Roughly speaking, for each of these criteria, a test is
adequate if and only if it comes "as close as possible" to satisfying the corresponding data flow
testing criterion. These criteria will be defined precisely and the relationships between them will be
explored in section 3. In section 2 we summarize the theory of data-flow testing, extending it to
apply to programs written in Pascal. In section 4 a new data flow testing criterion is introduced and
its properties are examined.



-3 -

2. Data Flow Testing

A family of test data selection criteria, each based on the program's data flow characteristics,
was defined in [RAP82] and [RAP85] for a very simple universal programming language consisting
of assignment statements, conditional and unconditional transfer statements, and input and output
statements. These criteria, which we call data flow testing, or DF testing for short, require that
certain paths from a variable's definition to its subsequent uses be selected. A tool, ASSET, which
performs DF testing on programs written in such a language is described in [FRA85a]. We have
extended DF testing to apply to Pascal programs and enhanced ASSET accordingly. We now
summarize the extended theory of data flow testing.

We apply DF testing to an individual subprogram, i.e., a main program, a procedure, or a
function. While treating each array element as an individual variable would in many cases lead to
the selection of better test data [FRA85b] doing so is not practical. We therefore treat each array as
a single entity. Similarly, occurrences of pointer variables are analyzed purely syntacticly; no anempt
is made to identify the object to which the pointer points. Each field of a record is treated as an
individual variable. Any unqualified occurrence of a record is treated as an occurrence of each field
of the record. As a technical convenience we assume that the subprogram does not have goto
statements, with statements, variant records, functions having variable parameters, or procedure or
function parameters. We also assume that in every conditional statement at least one variable occurs
in the boolean expression which determines the flow of control.

A subprogram can be uniquely decomposed into a set of disjoint blocks of statements. A block
is a maximal sequence of simple statements havmg the properties that it can only be entered through
the first statement and that whenever the first statement is executed, the remaining statements are
executed in the given order. The subprogram to be tested is represented by a flow graph in which
the nodes correspond to the blocks of the subprogram, and edges indicate possible flow of control
between blocks. Figure 1 shows the subgraphs corresponding to statements in the language. The
subprogram's flow graph is obtained by merging the exit node of each statement with the entry node
of the following statement. An entry node preceding the first statement of the procedure and an exit



PlG^i^l^E^ i



(f)) denotes the subgraph corresponding to S,.



■ssignmcnt stttcment
V := expr.



procedure call
P(jt, ^X



input/output statements



rcadCxp ...xj;
read(f,j:,,.-.,JrJ;
rcadln(X|, ..,x„);
rcadln(f,ar|,...,x„);



write(x,,...,Xn);
write(f,J:, ...xj;

writeln(x ,x„)\

wr»teln(f,^i,.- ,Jf„);



conditional statements

if boolean expr then S,



if boolean expr then 5,
else S^;



case boolean expr of
/ I -s,-



end;




] ifJode-naTi c-use of e"icl»"^ariable in ex,

followed by a definition of v.



node j has c-uscs of j, x„ followed I

definitions of x, , . . . .x,_ where tl

parameters in positions i «,„ are variab

parameters and the rest are value parameters



Node i has definitions of Jt, '«■ If '''^

variable f is present node i has a c-use of f
followed by a definition of f; otherwise node i
has a c-use of input followed by a definition of
input.



Node i has a c-usc of x,,



,x„.



Let k be the entry node of

Edges (i,j) and (i.k) have p-uses of ca
variable in boolean expr.



Let j,k be the entry nodes of and

Edges (i,j) and (i,k) have p-uses of ea
variable in boolean expr.



Let n,



,n be the entry nodes of



Edges (i,n,) {»■",.) have p-uses of ca

variable in boolean expr.



repetitive ■tatements

while boolean expr do S^■,




repeat 5,;
^2:



until boolean expr;




Let k be the entry nod^ of

Eges (i.j) and (i.k) have p-uses of each
variable in boolean expr.



\



\ Let j be the entry node of



; and let k be the exit node of



Eges (k,j) and (k,i) have p-uses of each
variable in boolean expr.



for V := expr J to expr2 do 5,




Let g and h be
respectively of



the



entry and exit nodes



Node i has c-uses of all variables in exprl
followed by a definition of v. Edges (j,g) and
(j.k) have p-uses of v and of all variables in
expr2. Node h has a c-use of v followed by a
definition of v.



HVj^^E i U-^



- 4 -

node succeeding the last statement are added.

Data flow analysis was originally used for compiler optimization, and generally classifies each
variable occurrence as being either a definition, an undefinition or a use [HEC77,SCH73]. In
addition, we distinguish between two substantially different types of uses. The first type directly
affects the computation being performed or allows one to see the result of some earlier definition.
We call such a use a computation use or a c-use. Of course, a c-use may indirectly affect the flow of
control through the subprogram. In contrast, the second type of use directly affects the flow of
control through the subprogram, and thereby may indirectly affect the computations performed. We
call such a use a predicate use or p-use. Figure 1 shows the classification of variable occurrences in
the language's statements.

We are interested in tracing the flow of data between nodes, and thus define a c-use of a
variable x in node i to be a global c-use if the value of x has been assigned in some block other than
block i. Let x be a variable occurring in a subprogram. A path ((,n,,...,n„,7), m^O, containing no
definitions or uiidefinitions of x m nodes n^,...,n,^ is called a definition clear path with respect to x
[def-clear path wrr x] trorxi code i tc node j and from node i to edge (n,„,7). A node i has a global
definition of a variable x if it ha5 a definition of x and there is a def-clear path wrt x from node i to
some node containing a global c-use or edge containing a p-use of x. Since every p-use is associated
with a potential transfer of control from one node to another, there is no need to distinguish between
p-uses and global p-uses. The subprogram's def-use graph is obtained by associating with each node
i, the sets c-use(i) and def(i) defmed in Figure 2, and with each edge (i,j) the set p-use(i,j). In
addition, the entry node is considered to have a global definition of each parameter, each non-local
variable which occurs in the subprogram and the text variable input which may be implicitly used in
read or readln statements The exit node has an undefinition of each local variable and a c-use of
each variable parameter.

A definition-c-use association is a triple (i,j,x) where i is a node containing a global definition of
X and i € dcu(x,i). A defmition-p-use association is a triple (i,(j,k),x) where i is a node containing a
global definition of x and (j,k) € dpu(x,i). A simple path is one in which all nodes, except possibly



- 5 -



V


=


the set of variables


N


=


the set of nodes


E


=


the set of edges


def(i)


~


{x € V 1 X has a global definition
in block i}


c-use(i)


=


{x € V 1 X has a global c-use in block i}


p-use(i,j)


=


{x € V 1 X has a p-use in edge (i,j) }


dcu(x,i)


=


{] € N 1 X € c-use(j) and there

is a def-clear path from i to j}


dpu(x,i)


=


{(j,k) € E 1 X € p-use(j,k) and there is
a def-clear path from i to fj,k) }



Figure 2

the first and last, are distinct. A loop -free path is one in which all nodes are distinct. A path
(ni,...,n,,n;) is a du-path with respect to a variable x if n , has a global definition of x and either

i) rtj has a c-use of x and (n^,...,n,,n^) is a def-clear simple path with respect to x, or

ii) (i,."*) h^s 2 p-use of X and (n,,. ..,«,) is a def-clear loop-free path with respect to x.

An association is a definition-c-use association, a definition-p-use association, or a du-path.

A path IT covers a definition-c-use association (i,j,x) [respectively a defirition-p-use association
(i,(j,k),x)] if it has a definition-clear subpath with respect to x from i to j [respectively, from i to
(j,k)]. IT covers a du-path tt' if it' is a subpath of t:. A set P of paths covers an association if some
element of the set does.

A path through a subprogram's flow graph (which we shall refer to in the sequel as a path
through the subprogram) is a control path if its first node is the entry node and its last node is the
exit node. A path is executable or feasible if there exists some assignment of values to input
variables, non-local variables, and parameters which causes the path to be executed. According to
this definition, the question of whether or not a given path through a subprogram is executable is
independent of the context in which that subprogram is called. However, it may depend on the
effects of any procedures or functions which are called along the path. Note that whether or not a
particular path is executable depends on the actual subprogram, not just on its def-use graph. Since
the sets of paths to which we are applying the criteria arise from the execution of test data we will



always assume that they are sets of executable control paths.

Roughly speaking, the family of data flow testing criteria is based on requiring that the test data
execute definition-clear paths from each node containing a global definition of a variable to specified
nodes containing global c-uses and edges containing p-uses of that variable. For each variable

definition we can demand that " I definition clear paths with respect to that variable from the

[.. -, \ uses

" I of the \c- us

somel I

■• [p - us

The criteria are defined precisely in Figure 3.



tses
uses
tses



reachable by some such path be executed.



THE DATA FLOW TESTING CRITERIA

Test T satisfies criterion C for subprogram P if for each node i and each x € def(i) the set F oi paths
executed by T covers the following associations;



CRITERION
All-defs

All-p-uses
All-p-uses/some-c-uses

All-c-uses/some-p-uses

All-uses

AU-du-paths



ASSOCIATIONS REQUIRED

Some (i,j,x) s.t. j € dcu(x,i)

or some (i,(j,k),x) s.t. (j,k) € dpu(x,i).

All (i,(j,k),x) s.t. (j,k) € dpu(x,i).

AH (i,(j,k),x) s.t. (j,k) € dpu(x,i).

Iii addition, if dpu(x,i) = 4> then some (i,j,x) s.t. j € dcu(x,i).

All (i,j,x) s.t. j € dcu(x,i).

In addition, if dcu(x,i) = then some (i,(j,k),x) s.t. (j,k) € dpu(x,i).

All (i,j,x) s.t. j € dcu(x,i)

and all (i,(j,k),x) s.t. (j,k) € dpu(x,i).

All du-paths from i to j with respect to x for each j € dcu(x,i)

and all du-paths from i to (j,k) with respect to x for each (j,k) € dpu(x,i)



For comparison we also define the criteria all-nodes (respectively all-edges, all-paths) which require
that /'cover every node (respectively every edge, every path) in the flow graph.



Figure 3

Criterion C, includes criterion C; if and only if for every subprogram, any test which satisfies

C, also satisfies C;. Criterion C, strictly includes criterion C, denoted C, => C,, if and only if C,
includes C; and for some subprogram P there is a test which satisfies C, but does not satisfy C,. The
notion of subsumption in [CLA85] is identical to our notion of inclusion.



- 7-

Rapps and Weyuker proved that for the simple language for which DF testing was originally
defined, subject to certain minor syntactic restrictions, the relationship among the criteria is as shown
in Figure 4 [RAP82,RAP85]. Clarke et. al. [CLA85] have shown the relationship of the criteria
defined by Laski and Korel [LAS83] and Ntafos [NTA84] to the DF criteria. We have extended the
theory of DF testing in such a way that these relations are preserved. Doing so required the
inclusion of definitions of all non-local variables in the entry node of the procedure and careful
treatment of implicit uses of the text variable input.

3. The Feasible Data Flow Testing Criteria

Given a subprogram P and a DF criterion C it may be the case that no test data for P exists
which satisfies C. This occurs when none of the paths which cover some association required by C
are executable. In such a case, P cannot be adequately tested according to C. In this section we
introduce a new family of criteria which are derived from the DF C'i'eria and which circumvent this
problem and investigate some of its properties.

We will say that an association is executable if there is some executable path which covers it; it
is unexecutable otherwise. We define subsets fdcu(x,i) C dcu(x,i) and fdpu(x,i) C dpu(x,i)
consisting only of those associations which are executable. For each DF criterion C we define a new
criterion C* by selecting the required associations from fdcu(x,i) and fdpu(x,i) instead of from
dcu(x,i) and dpu(x,i). Precise definitions of these criteria are given in Figure 5. We will refer to the
criteria {(all-du-paths)*, (all-uses)*, (all-p-uses/some-c-uses)*, (all-c-uses/some-p-uses)', (all-p-
uses)*} as feasible dataflow testing criteria, or FDF criteria, for short.

The FDF criteria satisfy the applicability property: For any subprogram P and any FDF criterion
C*, there is some (possibly empty) test T which satisfies C*. However, the question of whether a
particular T satisfies C for subprogram P is undecidable. In going from the family DF to the family
FDF, we have traded the undecidability of the existence question, "is there any test which is C-
adequate for P?" for the undecidability of the recognition problem "is a given test C-adequate for P?"

Observe that for any DF criterion C, C => C*. We now investigate the inclusion relations



ALL-PATHS



ALL-DU-PATHS



ALL-USES




ALL-C-USES/
SOME-P-USES



ALL-PUSES/
SOME-C-USES





ALL-
DEFS



ALL-P-USES



ALL-EDGES



ALL-NODES



Figure 4



8



THE FEASIBLE DATA FLOW TESTING CRITERIA

fdcu(x,i) = {]■ € dcu(x,i) | the association (i,j,x) is executable}
fdpu(x,i) = {(j,k) i dpu(x,i) | the association (i,(j,k),x) is executable}



Test T satisfies criterion C for subprogram P if for each node i and each x € def(i) the set ^of paths
executed by T covers the following associations:



CRITERION

(all-defs)*



REQUIRED ASSOCIATIONS



if fdcu(x,i) U fdpu(x,i) * ({> then

some (i,j,x) s.t j € fdcu(x,i) or some (i,(j,k),x) s.t. (j,k) € fdpu(x,i).



(all-p-uses)*



all (i,(j,k),x) s.t. (j,k) € fdpu(x,i).



(all-p-uses/some-c-uses)* all {i,(j,k),x) s.t. (j,k) € fdpu(x,i).

In addition, if fdpu(x,i) = and fdcu(x,i) t^ (all-du-paths)' and (all-paths)* => (all-edges)* are similar and
will be omitted.

B.2. (all-edges)* => (all-nodes)* :

Let T be a test which satisfies (all-edges)* for subprogram P, and let P be the set of paths
executed by T. Let n be any executable node in P. If n is the entry node, then n has a unique
successor, m, and (n,m) is executable. So " covers (n,m) and hence covers n. If n is not the entry
node, then since n is executable, some branch (i,n) is executable. So P covers (i,n) and hence covers
n.

B.3. (all-uses)* => (all-p-uses/some-c-uses)*, (all-p-uses/some-c-uses)* => (all-p-uses)*, (all-p-
uses/some-c-uses)" => (all-defs)*, (all-uses)* => (all-c-uses/some-p-uses)*, (all-c-uses/some-p-uses)*
=> (all-defs)* :

These inclusions follow immediately from the definitions of the criteria given in Figure 5. For
example, any set P of paths which covers all of the associations required by (all-uses)* will a fortiori
cover all of the associations required by (all-p-uses/some-c-uses)*.

We next show that those relations not in the transitive closure of the diagram in Figure 6 do not
hold.



- 10 -

C.l. (all-du-paths)* =)6- (all-p-uses)'; (all-du-paths)* =/? (all-p-uses/some-c-uses)'; (all-du-paths)' ^
(all-uses)'; (all-du-paths)* z^ (all-c-uses/some-p-uses)'; (all-du-paths)* =3$. (all-defs)*; (all-du-paths)*
si. (all-edges)*; (all-du-paths)* :J> (all-nodes)* :

It suffices to show that (all-du-paths)* =i> (all-p-uses)*, (all-du-paths)* =)6> (all-defs)*, and (all-
du-paths)* ^ (all-nodes)*. The rest follows from the transitivity of =>. Consider the subprogram
shown in Figure 7a. Its du-paths are shown in Figure 7b. Of these, only (1,2), (2,3,4), (4,3,4), and
(4,3,5) are executable Let T = {(X,Y)} where X is any integer and Y < 0. Since T executes P =
{(1,2,3,4,3,4,3,5,6,8)}, T satisfies (all-du-paths)*. However, P does not cover the associations
(2,(5,7),y), (2,7,x), or node 7, all of which are covered by the executable path (1,2,3,4,3,4,3,5,7),
so T does not satisfy (all-p-uses)*, (all-defs)*, or (all-nodes)*.

Intuitively, (all-du-paths)* fails to include these criteria because it is possible for a subprogram
to have certain definition-use associations which can be executed only by paths which traverse some
loop one or more times. In section 4 we shall introduce a modified version of the (all-du-paths)*
criterion which includes (all-uses)' and (all-edges)', and hence all of the other FDF criteria.

C.2 (all-p-uses)' =i> (all-edges)*; (all-p-uses/some-c-uses)' si. (all-edges)'; (all-uses)* ^ (all-
edges)'; (all-p-uses)' =^ (all-nodes)*, (all-p-uses/some-c-uses)* =>S> (all-nodes)*; (all-uses)* =^ (all-
nodes)* :

Consider the subprogram in Figure 8 Notice that since node 3 is unexecutable, y is always
uninitialized at node 4. We will assume that under these circumstances, edges (4,5) and (4,6) are


1

Online LibraryPhyllis G FranklData flow testing in the presence of unexecutable paths → online text (page 1 of 2)