Online Library → Phyllis G Frankl → Data flow testing in the presence of unexecutable paths → online text (page 1 of 2)

Font size

Computer Science Department

TECHNICAL REPORT

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

By

Phyllis G. Frankl

Elaine J. Weyuker

Technical Report //208

March 1986

^P

61

r V

**'".':;-â– -

,..â– *â€¢

^4^.1'. Jiilil^'t â– â– *â– :.:, -M.-rai'.liO

0) (-(

D* CO

0) fl>

â€¢ 3

o

o

to

o

i-h

C

a>

X

o

c

ft

0)

KT

(-â–

(T (U

0) 3 ,

o o

â‚¬ ^Ti 3

rti< cnl

a> M O)

m M !-([

n H- ,

M- m 1-3

3 wl

(jQ o I

to]

H- of

3 ooi

n-

3"

' ysi ^ f!&^' â– â– '- ?^i- "St ''"â– #>'>'"â– *â– ' ".â– .:^' jCfl

EW YORK UNIVERSITY

HjiiiM

Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y. 10012

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

By

Phyllis G. Frankl

Elaine J. Weyuker

Technical Report //20i

March 1986

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

Phyllis C . Frankl and Elaine J. Weyuker

Department of Computer Science

Courant Institute of Mathematical Sciences

New York University

251 Mercer Street

New York, New York 10012

Abstract

Most test data adequacy criteria based upon path selection have the unfortunate property that

for some programs with unexecutable paths, no set of test data is adequate. In this paper we define

a new family of adequacy criteria, derived from the data flow testing criteria, which circumvent this

problem by only requiring the test data to exercise those definition-use associations which are

executable. The inclusion relationship among these criteria is explored.

1. Introduction

Several software test data adequacy criteria are based on the idea that one cannot consider a

program to be adequately tested if no test data has caused certain sequences of statements to be

executed. These methods generally associate a subset T of the input domain of a program P, with

the the set ,P of paths through P's flow graph which are executed when the program is run with inputs

from T. The test T, or equivalently the set of paths P, is said to satisfy criterion C for program P

("T is C-adequate for P") if and only if each of the sequences required by C is a subpath of one of

the paths in P.

The most well-known of these criteria are statement testing, branch testing and path testing

which require that the test data cause every node (respectively branch, path) in the program's flow

graph to be executed [HOW78,HUA75]. Unfortunately, statement and branch testing can fail to

expose many common errors and path testing is usually infeasible since programs with loops may

have infinitely many paths [G0075,HOW76]. Several criteria which are based on analysis of the

This research was supported in part by National Science Foundation Grant DCR8501614 and by Office

of Naval Research Contract NO0O14-85-K-0414.

- 2 -

program's control flow and which are stronger than branch testing but weaker than path testing have

been proposed [HOW78,MIL74,WOO80].

Recently, a number of test data adequacy criteria which are based on data flow analysis and

which "bridge the gap between branch testing and path testing have been proposed and studied

[RAP82,RAP85,LAS83,NTA84,CLA85]. Tools based on some of them have been implemented

[FRA85a,FRA85b,GlR85,KOR85]. These criteria are based on the intuition that one should not

feel confident that a variable has been assigned the correct value at some point in the program if no

test data causes the execution of a path from the assignment to a point where the variable's value is

subsequently used.

All of these criteria suifer from the weakness that for programs with unexecutable paths, it may

be impossible for any test set to satisfy the given adequacy criterion. For example, consider a

program having a for loop in which the upper bound is always greater than or equal to the lower

bound. Such a program has unexecutable paths and therefore cannot be adequately tested using the

path testing criterion. Our experience has shown that for many programs, unexecutable paths make

it impossible for any test to satisiy a given data-flow testing criterion [FRA85a,FRA85b].

This is clearly an undesirable situation. One property which one would expect a "good"

adequacy criterion C to have is the applicability property: for every program P there exists some test

which is C-adequate for P [WEY85]. Not only does the applicability property fail for these criteria,

but for each of them it is undecidable for a given program whether test data exists which adequately

tests the program.

In this paper we define a new family of adequacy criteria, derived from the data-flow testing

criteria proposed in [RAP82,RAP85]. Roughly speaking, for each of these criteria, a test is

adequate if and only if it comes "as close as possible" to satisfying the corresponding data flow

testing criterion. These criteria will be defined precisely and the relationships between them will be

explored in section 3. In section 2 we summarize the theory of data-flow testing, extending it to

apply to programs written in Pascal. In section 4 a new data flow testing criterion is introduced and

its properties are examined.

-3 -

2. Data Flow Testing

A family of test data selection criteria, each based on the program's data flow characteristics,

was defined in [RAP82] and [RAP85] for a very simple universal programming language consisting

of assignment statements, conditional and unconditional transfer statements, and input and output

statements. These criteria, which we call data flow testing, or DF testing for short, require that

certain paths from a variable's definition to its subsequent uses be selected. A tool, ASSET, which

performs DF testing on programs written in such a language is described in [FRA85a]. We have

extended DF testing to apply to Pascal programs and enhanced ASSET accordingly. We now

summarize the extended theory of data flow testing.

We apply DF testing to an individual subprogram, i.e., a main program, a procedure, or a

function. While treating each array element as an individual variable would in many cases lead to

the selection of better test data [FRA85b] doing so is not practical. We therefore treat each array as

a single entity. Similarly, occurrences of pointer variables are analyzed purely syntacticly; no anempt

is made to identify the object to which the pointer points. Each field of a record is treated as an

individual variable. Any unqualified occurrence of a record is treated as an occurrence of each field

of the record. As a technical convenience we assume that the subprogram does not have goto

statements, with statements, variant records, functions having variable parameters, or procedure or

function parameters. We also assume that in every conditional statement at least one variable occurs

in the boolean expression which determines the flow of control.

A subprogram can be uniquely decomposed into a set of disjoint blocks of statements. A block

is a maximal sequence of simple statements havmg the properties that it can only be entered through

the first statement and that whenever the first statement is executed, the remaining statements are

executed in the given order. The subprogram to be tested is represented by a flow graph in which

the nodes correspond to the blocks of the subprogram, and edges indicate possible flow of control

between blocks. Figure 1 shows the subgraphs corresponding to statements in the language. The

subprogram's flow graph is obtained by merging the exit node of each statement with the entry node

of the following statement. An entry node preceding the first statement of the procedure and an exit

PlG^i^l^E^ i

(f)) denotes the subgraph corresponding to S,.

â– ssignmcnt stttcment

V := expr.

procedure call

P(jt, ^X

input/output statements

rcadCxp ...xj;

read(f,j:,,.-.,JrJ;

rcadln(X|, ..,xâ€ž);

rcadln(f,ar|,...,xâ€ž);

write(x,,...,Xn);

write(f,J:, ...xj;

writeln(x ,xâ€ž)\

wrÂ»teln(f,^i,.- ,Jfâ€ž);

conditional statements

if boolean expr then S,

if boolean expr then 5,

else S^;

case boolean expr of

/ I -s,-

end;

] ifJode-naTi c-use of e"iclÂ»"^ariable in ex,

followed by a definition of v.

node j has c-uscs of j, xâ€ž followed I

definitions of x, , . . . .x,_ where tl

parameters in positions i Â«,â€ž are variab

parameters and the rest are value parameters

Node i has definitions of Jt, 'Â«â– If '''^

variable f is present node i has a c-use of f

followed by a definition of f; otherwise node i

has a c-use of input followed by a definition of

input.

Node i has a c-usc of x,,

,xâ€ž.

Let k be the entry node of

Edges (i,j) and (i.k) have p-uses of ca

variable in boolean expr.

Let j,k be the entry nodes of and

Edges (i,j) and (i,k) have p-uses of ea

variable in boolean expr.

Let n,

,n be the entry nodes of

Edges (i,n,) {Â»â– ",.) have p-uses of ca

variable in boolean expr.

repetitive â– tatements

while boolean expr do S^â– ,

repeat 5,;

^2:

until boolean expr;

Let k be the entry nod^ of

Eges (i.j) and (i.k) have p-uses of each

variable in boolean expr.

\

\ Let j be the entry node of

; and let k be the exit node of

Eges (k,j) and (k,i) have p-uses of each

variable in boolean expr.

for V := expr J to expr2 do 5,

Let g and h be

respectively of

the

entry and exit nodes

Node i has c-uses of all variables in exprl

followed by a definition of v. Edges (j,g) and

(j.k) have p-uses of v and of all variables in

expr2. Node h has a c-use of v followed by a

definition of v.

HVj^^E i U-^

- 4 -

node succeeding the last statement are added.

Data flow analysis was originally used for compiler optimization, and generally classifies each

variable occurrence as being either a definition, an undefinition or a use [HEC77,SCH73]. In

addition, we distinguish between two substantially different types of uses. The first type directly

affects the computation being performed or allows one to see the result of some earlier definition.

We call such a use a computation use or a c-use. Of course, a c-use may indirectly affect the flow of

control through the subprogram. In contrast, the second type of use directly affects the flow of

control through the subprogram, and thereby may indirectly affect the computations performed. We

call such a use a predicate use or p-use. Figure 1 shows the classification of variable occurrences in

the language's statements.

We are interested in tracing the flow of data between nodes, and thus define a c-use of a

variable x in node i to be a global c-use if the value of x has been assigned in some block other than

block i. Let x be a variable occurring in a subprogram. A path ((,n,,...,nâ€ž,7), m^O, containing no

definitions or uiidefinitions of x m nodes n^,...,n,^ is called a definition clear path with respect to x

[def-clear path wrr x] trorxi code i tc node j and from node i to edge (n,â€ž,7). A node i has a global

definition of a variable x if it ha5 a definition of x and there is a def-clear path wrt x from node i to

some node containing a global c-use or edge containing a p-use of x. Since every p-use is associated

with a potential transfer of control from one node to another, there is no need to distinguish between

p-uses and global p-uses. The subprogram's def-use graph is obtained by associating with each node

i, the sets c-use(i) and def(i) defmed in Figure 2, and with each edge (i,j) the set p-use(i,j). In

addition, the entry node is considered to have a global definition of each parameter, each non-local

variable which occurs in the subprogram and the text variable input which may be implicitly used in

read or readln statements The exit node has an undefinition of each local variable and a c-use of

each variable parameter.

A definition-c-use association is a triple (i,j,x) where i is a node containing a global definition of

X and i â‚¬ dcu(x,i). A defmition-p-use association is a triple (i,(j,k),x) where i is a node containing a

global definition of x and (j,k) â‚¬ dpu(x,i). A simple path is one in which all nodes, except possibly

- 5 -

V

=

the set of variables

N

=

the set of nodes

E

=

the set of edges

def(i)

~

{x â‚¬ V 1 X has a global definition

in block i}

c-use(i)

=

{x â‚¬ V 1 X has a global c-use in block i}

p-use(i,j)

=

{x â‚¬ V 1 X has a p-use in edge (i,j) }

dcu(x,i)

=

{] â‚¬ N 1 X â‚¬ c-use(j) and there

is a def-clear path from i to j}

dpu(x,i)

=

{(j,k) â‚¬ E 1 X â‚¬ p-use(j,k) and there is

a def-clear path from i to fj,k) }

Figure 2

the first and last, are distinct. A loop -free path is one in which all nodes are distinct. A path

(ni,...,n,,n;) is a du-path with respect to a variable x if n , has a global definition of x and either

i) rtj has a c-use of x and (n^,...,n,,n^) is a def-clear simple path with respect to x, or

ii) (i,."*) h^s 2 p-use of X and (n,,. ..,Â«,) is a def-clear loop-free path with respect to x.

An association is a definition-c-use association, a definition-p-use association, or a du-path.

A path IT covers a definition-c-use association (i,j,x) [respectively a defirition-p-use association

(i,(j,k),x)] if it has a definition-clear subpath with respect to x from i to j [respectively, from i to

(j,k)]. IT covers a du-path tt' if it' is a subpath of t:. A set P of paths covers an association if some

element of the set does.

A path through a subprogram's flow graph (which we shall refer to in the sequel as a path

through the subprogram) is a control path if its first node is the entry node and its last node is the

exit node. A path is executable or feasible if there exists some assignment of values to input

variables, non-local variables, and parameters which causes the path to be executed. According to

this definition, the question of whether or not a given path through a subprogram is executable is

independent of the context in which that subprogram is called. However, it may depend on the

effects of any procedures or functions which are called along the path. Note that whether or not a

particular path is executable depends on the actual subprogram, not just on its def-use graph. Since

the sets of paths to which we are applying the criteria arise from the execution of test data we will

always assume that they are sets of executable control paths.

Roughly speaking, the family of data flow testing criteria is based on requiring that the test data

execute definition-clear paths from each node containing a global definition of a variable to specified

nodes containing global c-uses and edges containing p-uses of that variable. For each variable

definition we can demand that " I definition clear paths with respect to that variable from the

[.. -, \ uses

" I of the \c- us

somel I

â– â€¢ [p - us

The criteria are defined precisely in Figure 3.

tses

uses

tses

reachable by some such path be executed.

THE DATA FLOW TESTING CRITERIA

Test T satisfies criterion C for subprogram P if for each node i and each x â‚¬ def(i) the set F oi paths

executed by T covers the following associations;

CRITERION

All-defs

All-p-uses

All-p-uses/some-c-uses

All-c-uses/some-p-uses

All-uses

AU-du-paths

ASSOCIATIONS REQUIRED

Some (i,j,x) s.t. j â‚¬ dcu(x,i)

or some (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

AH (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

Iii addition, if dpu(x,i) = 4> then some (i,j,x) s.t. j â‚¬ dcu(x,i).

All (i,j,x) s.t. j â‚¬ dcu(x,i).

In addition, if dcu(x,i) = then some (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All (i,j,x) s.t. j â‚¬ dcu(x,i)

and all (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All du-paths from i to j with respect to x for each j â‚¬ dcu(x,i)

and all du-paths from i to (j,k) with respect to x for each (j,k) â‚¬ dpu(x,i)

For comparison we also define the criteria all-nodes (respectively all-edges, all-paths) which require

that /'cover every node (respectively every edge, every path) in the flow graph.

Figure 3

Criterion C, includes criterion C; if and only if for every subprogram, any test which satisfies

C, also satisfies C;. Criterion C, strictly includes criterion C, denoted C, => C,, if and only if C,

includes C; and for some subprogram P there is a test which satisfies C, but does not satisfy C,. The

notion of subsumption in [CLA85] is identical to our notion of inclusion.

- 7-

Rapps and Weyuker proved that for the simple language for which DF testing was originally

defined, subject to certain minor syntactic restrictions, the relationship among the criteria is as shown

in Figure 4 [RAP82,RAP85]. Clarke et. al. [CLA85] have shown the relationship of the criteria

defined by Laski and Korel [LAS83] and Ntafos [NTA84] to the DF criteria. We have extended the

theory of DF testing in such a way that these relations are preserved. Doing so required the

inclusion of definitions of all non-local variables in the entry node of the procedure and careful

treatment of implicit uses of the text variable input.

3. The Feasible Data Flow Testing Criteria

Given a subprogram P and a DF criterion C it may be the case that no test data for P exists

which satisfies C. This occurs when none of the paths which cover some association required by C

are executable. In such a case, P cannot be adequately tested according to C. In this section we

introduce a new family of criteria which are derived from the DF C'i'eria and which circumvent this

problem and investigate some of its properties.

We will say that an association is executable if there is some executable path which covers it; it

is unexecutable otherwise. We define subsets fdcu(x,i) C dcu(x,i) and fdpu(x,i) C dpu(x,i)

consisting only of those associations which are executable. For each DF criterion C we define a new

criterion C* by selecting the required associations from fdcu(x,i) and fdpu(x,i) instead of from

dcu(x,i) and dpu(x,i). Precise definitions of these criteria are given in Figure 5. We will refer to the

criteria {(all-du-paths)*, (all-uses)*, (all-p-uses/some-c-uses)*, (all-c-uses/some-p-uses)', (all-p-

uses)*} as feasible dataflow testing criteria, or FDF criteria, for short.

The FDF criteria satisfy the applicability property: For any subprogram P and any FDF criterion

C*, there is some (possibly empty) test T which satisfies C*. However, the question of whether a

particular T satisfies C for subprogram P is undecidable. In going from the family DF to the family

FDF, we have traded the undecidability of the existence question, "is there any test which is C-

adequate for P?" for the undecidability of the recognition problem "is a given test C-adequate for P?"

Observe that for any DF criterion C, C => C*. We now investigate the inclusion relations

ALL-PATHS

ALL-DU-PATHS

ALL-USES

ALL-C-USES/

SOME-P-USES

ALL-PUSES/

SOME-C-USES

ALL-

DEFS

ALL-P-USES

ALL-EDGES

ALL-NODES

Figure 4

8

THE FEASIBLE DATA FLOW TESTING CRITERIA

fdcu(x,i) = {]â– â‚¬ dcu(x,i) | the association (i,j,x) is executable}

fdpu(x,i) = {(j,k) i dpu(x,i) | the association (i,(j,k),x) is executable}

Test T satisfies criterion C for subprogram P if for each node i and each x â‚¬ def(i) the set ^of paths

executed by T covers the following associations:

CRITERION

(all-defs)*

REQUIRED ASSOCIATIONS

if fdcu(x,i) U fdpu(x,i) * ({> then

some (i,j,x) s.t j â‚¬ fdcu(x,i) or some (i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

(all-p-uses)*

all (i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

(all-p-uses/some-c-uses)* all {i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

In addition, if fdpu(x,i) = and fdcu(x,i) t^ (all-du-paths)' and (all-paths)* => (all-edges)* are similar and

will be omitted.

B.2. (all-edges)* => (all-nodes)* :

Let T be a test which satisfies (all-edges)* for subprogram P, and let P be the set of paths

executed by T. Let n be any executable node in P. If n is the entry node, then n has a unique

successor, m, and (n,m) is executable. So " covers (n,m) and hence covers n. If n is not the entry

node, then since n is executable, some branch (i,n) is executable. So P covers (i,n) and hence covers

n.

B.3. (all-uses)* => (all-p-uses/some-c-uses)*, (all-p-uses/some-c-uses)* => (all-p-uses)*, (all-p-

uses/some-c-uses)" => (all-defs)*, (all-uses)* => (all-c-uses/some-p-uses)*, (all-c-uses/some-p-uses)*

=> (all-defs)* :

These inclusions follow immediately from the definitions of the criteria given in Figure 5. For

example, any set P of paths which covers all of the associations required by (all-uses)* will a fortiori

cover all of the associations required by (all-p-uses/some-c-uses)*.

We next show that those relations not in the transitive closure of the diagram in Figure 6 do not

hold.

- 10 -

C.l. (all-du-paths)* =)6- (all-p-uses)'; (all-du-paths)* =/? (all-p-uses/some-c-uses)'; (all-du-paths)' ^

(all-uses)'; (all-du-paths)* z^ (all-c-uses/some-p-uses)'; (all-du-paths)* =3$. (all-defs)*; (all-du-paths)*

si. (all-edges)*; (all-du-paths)* :J> (all-nodes)* :

It suffices to show that (all-du-paths)* =i> (all-p-uses)*, (all-du-paths)* =)6> (all-defs)*, and (all-

du-paths)* ^ (all-nodes)*. The rest follows from the transitivity of =>. Consider the subprogram

shown in Figure 7a. Its du-paths are shown in Figure 7b. Of these, only (1,2), (2,3,4), (4,3,4), and

(4,3,5) are executable Let T = {(X,Y)} where X is any integer and Y < 0. Since T executes P =

{(1,2,3,4,3,4,3,5,6,8)}, T satisfies (all-du-paths)*. However, P does not cover the associations

(2,(5,7),y), (2,7,x), or node 7, all of which are covered by the executable path (1,2,3,4,3,4,3,5,7),

so T does not satisfy (all-p-uses)*, (all-defs)*, or (all-nodes)*.

Intuitively, (all-du-paths)* fails to include these criteria because it is possible for a subprogram

to have certain definition-use associations which can be executed only by paths which traverse some

loop one or more times. In section 4 we shall introduce a modified version of the (all-du-paths)*

criterion which includes (all-uses)' and (all-edges)', and hence all of the other FDF criteria.

C.2 (all-p-uses)' =i> (all-edges)*; (all-p-uses/some-c-uses)' si. (all-edges)'; (all-uses)* ^ (all-

edges)'; (all-p-uses)' =^ (all-nodes)*, (all-p-uses/some-c-uses)* =>S> (all-nodes)*; (all-uses)* =^ (all-

nodes)* :

Consider the subprogram in Figure 8 Notice that since node 3 is unexecutable, y is always

uninitialized at node 4. We will assume that under these circumstances, edges (4,5) and (4,6) are

TECHNICAL REPORT

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

By

Phyllis G. Frankl

Elaine J. Weyuker

Technical Report //208

March 1986

^P

61

r V

**'".':;-â– -

,..â– *â€¢

^4^.1'. Jiilil^'t â– â– *â– :.:, -M.-rai'.liO

0) (-(

D* CO

0) fl>

â€¢ 3

o

o

to

o

i-h

C

a>

X

o

c

ft

0)

KT

(-â–

(T (U

0) 3 ,

o o

â‚¬ ^Ti 3

rti< cnl

a> M O)

m M !-([

n H- ,

M- m 1-3

3 wl

(jQ o I

to]

H- of

3 ooi

n-

3"

' ysi ^ f!&^' â– â– '- ?^i- "St ''"â– #>'>'"â– *â– ' ".â– .:^' jCfl

EW YORK UNIVERSITY

HjiiiM

Courant Institute of Mathematical Sciences

251 MERCER STREET, NEW YORK, N.Y. 10012

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

By

Phyllis G. Frankl

Elaine J. Weyuker

Technical Report //20i

March 1986

DATA FLOW TESTING IN THE PRESENCE OF

UNEXECUTABLE PATHS

Phyllis C . Frankl and Elaine J. Weyuker

Department of Computer Science

Courant Institute of Mathematical Sciences

New York University

251 Mercer Street

New York, New York 10012

Abstract

Most test data adequacy criteria based upon path selection have the unfortunate property that

for some programs with unexecutable paths, no set of test data is adequate. In this paper we define

a new family of adequacy criteria, derived from the data flow testing criteria, which circumvent this

problem by only requiring the test data to exercise those definition-use associations which are

executable. The inclusion relationship among these criteria is explored.

1. Introduction

Several software test data adequacy criteria are based on the idea that one cannot consider a

program to be adequately tested if no test data has caused certain sequences of statements to be

executed. These methods generally associate a subset T of the input domain of a program P, with

the the set ,P of paths through P's flow graph which are executed when the program is run with inputs

from T. The test T, or equivalently the set of paths P, is said to satisfy criterion C for program P

("T is C-adequate for P") if and only if each of the sequences required by C is a subpath of one of

the paths in P.

The most well-known of these criteria are statement testing, branch testing and path testing

which require that the test data cause every node (respectively branch, path) in the program's flow

graph to be executed [HOW78,HUA75]. Unfortunately, statement and branch testing can fail to

expose many common errors and path testing is usually infeasible since programs with loops may

have infinitely many paths [G0075,HOW76]. Several criteria which are based on analysis of the

This research was supported in part by National Science Foundation Grant DCR8501614 and by Office

of Naval Research Contract NO0O14-85-K-0414.

- 2 -

program's control flow and which are stronger than branch testing but weaker than path testing have

been proposed [HOW78,MIL74,WOO80].

Recently, a number of test data adequacy criteria which are based on data flow analysis and

which "bridge the gap between branch testing and path testing have been proposed and studied

[RAP82,RAP85,LAS83,NTA84,CLA85]. Tools based on some of them have been implemented

[FRA85a,FRA85b,GlR85,KOR85]. These criteria are based on the intuition that one should not

feel confident that a variable has been assigned the correct value at some point in the program if no

test data causes the execution of a path from the assignment to a point where the variable's value is

subsequently used.

All of these criteria suifer from the weakness that for programs with unexecutable paths, it may

be impossible for any test set to satisfy the given adequacy criterion. For example, consider a

program having a for loop in which the upper bound is always greater than or equal to the lower

bound. Such a program has unexecutable paths and therefore cannot be adequately tested using the

path testing criterion. Our experience has shown that for many programs, unexecutable paths make

it impossible for any test to satisiy a given data-flow testing criterion [FRA85a,FRA85b].

This is clearly an undesirable situation. One property which one would expect a "good"

adequacy criterion C to have is the applicability property: for every program P there exists some test

which is C-adequate for P [WEY85]. Not only does the applicability property fail for these criteria,

but for each of them it is undecidable for a given program whether test data exists which adequately

tests the program.

In this paper we define a new family of adequacy criteria, derived from the data-flow testing

criteria proposed in [RAP82,RAP85]. Roughly speaking, for each of these criteria, a test is

adequate if and only if it comes "as close as possible" to satisfying the corresponding data flow

testing criterion. These criteria will be defined precisely and the relationships between them will be

explored in section 3. In section 2 we summarize the theory of data-flow testing, extending it to

apply to programs written in Pascal. In section 4 a new data flow testing criterion is introduced and

its properties are examined.

-3 -

2. Data Flow Testing

A family of test data selection criteria, each based on the program's data flow characteristics,

was defined in [RAP82] and [RAP85] for a very simple universal programming language consisting

of assignment statements, conditional and unconditional transfer statements, and input and output

statements. These criteria, which we call data flow testing, or DF testing for short, require that

certain paths from a variable's definition to its subsequent uses be selected. A tool, ASSET, which

performs DF testing on programs written in such a language is described in [FRA85a]. We have

extended DF testing to apply to Pascal programs and enhanced ASSET accordingly. We now

summarize the extended theory of data flow testing.

We apply DF testing to an individual subprogram, i.e., a main program, a procedure, or a

function. While treating each array element as an individual variable would in many cases lead to

the selection of better test data [FRA85b] doing so is not practical. We therefore treat each array as

a single entity. Similarly, occurrences of pointer variables are analyzed purely syntacticly; no anempt

is made to identify the object to which the pointer points. Each field of a record is treated as an

individual variable. Any unqualified occurrence of a record is treated as an occurrence of each field

of the record. As a technical convenience we assume that the subprogram does not have goto

statements, with statements, variant records, functions having variable parameters, or procedure or

function parameters. We also assume that in every conditional statement at least one variable occurs

in the boolean expression which determines the flow of control.

A subprogram can be uniquely decomposed into a set of disjoint blocks of statements. A block

is a maximal sequence of simple statements havmg the properties that it can only be entered through

the first statement and that whenever the first statement is executed, the remaining statements are

executed in the given order. The subprogram to be tested is represented by a flow graph in which

the nodes correspond to the blocks of the subprogram, and edges indicate possible flow of control

between blocks. Figure 1 shows the subgraphs corresponding to statements in the language. The

subprogram's flow graph is obtained by merging the exit node of each statement with the entry node

of the following statement. An entry node preceding the first statement of the procedure and an exit

PlG^i^l^E^ i

(f)) denotes the subgraph corresponding to S,.

â– ssignmcnt stttcment

V := expr.

procedure call

P(jt, ^X

input/output statements

rcadCxp ...xj;

read(f,j:,,.-.,JrJ;

rcadln(X|, ..,xâ€ž);

rcadln(f,ar|,...,xâ€ž);

write(x,,...,Xn);

write(f,J:, ...xj;

writeln(x ,xâ€ž)\

wrÂ»teln(f,^i,.- ,Jfâ€ž);

conditional statements

if boolean expr then S,

if boolean expr then 5,

else S^;

case boolean expr of

/ I -s,-

end;

] ifJode-naTi c-use of e"iclÂ»"^ariable in ex,

followed by a definition of v.

node j has c-uscs of j, xâ€ž followed I

definitions of x, , . . . .x,_ where tl

parameters in positions i Â«,â€ž are variab

parameters and the rest are value parameters

Node i has definitions of Jt, 'Â«â– If '''^

variable f is present node i has a c-use of f

followed by a definition of f; otherwise node i

has a c-use of input followed by a definition of

input.

Node i has a c-usc of x,,

,xâ€ž.

Let k be the entry node of

Edges (i,j) and (i.k) have p-uses of ca

variable in boolean expr.

Let j,k be the entry nodes of and

Edges (i,j) and (i,k) have p-uses of ea

variable in boolean expr.

Let n,

,n be the entry nodes of

Edges (i,n,) {Â»â– ",.) have p-uses of ca

variable in boolean expr.

repetitive â– tatements

while boolean expr do S^â– ,

repeat 5,;

^2:

until boolean expr;

Let k be the entry nod^ of

Eges (i.j) and (i.k) have p-uses of each

variable in boolean expr.

\

\ Let j be the entry node of

; and let k be the exit node of

Eges (k,j) and (k,i) have p-uses of each

variable in boolean expr.

for V := expr J to expr2 do 5,

Let g and h be

respectively of

the

entry and exit nodes

Node i has c-uses of all variables in exprl

followed by a definition of v. Edges (j,g) and

(j.k) have p-uses of v and of all variables in

expr2. Node h has a c-use of v followed by a

definition of v.

HVj^^E i U-^

- 4 -

node succeeding the last statement are added.

Data flow analysis was originally used for compiler optimization, and generally classifies each

variable occurrence as being either a definition, an undefinition or a use [HEC77,SCH73]. In

addition, we distinguish between two substantially different types of uses. The first type directly

affects the computation being performed or allows one to see the result of some earlier definition.

We call such a use a computation use or a c-use. Of course, a c-use may indirectly affect the flow of

control through the subprogram. In contrast, the second type of use directly affects the flow of

control through the subprogram, and thereby may indirectly affect the computations performed. We

call such a use a predicate use or p-use. Figure 1 shows the classification of variable occurrences in

the language's statements.

We are interested in tracing the flow of data between nodes, and thus define a c-use of a

variable x in node i to be a global c-use if the value of x has been assigned in some block other than

block i. Let x be a variable occurring in a subprogram. A path ((,n,,...,nâ€ž,7), m^O, containing no

definitions or uiidefinitions of x m nodes n^,...,n,^ is called a definition clear path with respect to x

[def-clear path wrr x] trorxi code i tc node j and from node i to edge (n,â€ž,7). A node i has a global

definition of a variable x if it ha5 a definition of x and there is a def-clear path wrt x from node i to

some node containing a global c-use or edge containing a p-use of x. Since every p-use is associated

with a potential transfer of control from one node to another, there is no need to distinguish between

p-uses and global p-uses. The subprogram's def-use graph is obtained by associating with each node

i, the sets c-use(i) and def(i) defmed in Figure 2, and with each edge (i,j) the set p-use(i,j). In

addition, the entry node is considered to have a global definition of each parameter, each non-local

variable which occurs in the subprogram and the text variable input which may be implicitly used in

read or readln statements The exit node has an undefinition of each local variable and a c-use of

each variable parameter.

A definition-c-use association is a triple (i,j,x) where i is a node containing a global definition of

X and i â‚¬ dcu(x,i). A defmition-p-use association is a triple (i,(j,k),x) where i is a node containing a

global definition of x and (j,k) â‚¬ dpu(x,i). A simple path is one in which all nodes, except possibly

- 5 -

V

=

the set of variables

N

=

the set of nodes

E

=

the set of edges

def(i)

~

{x â‚¬ V 1 X has a global definition

in block i}

c-use(i)

=

{x â‚¬ V 1 X has a global c-use in block i}

p-use(i,j)

=

{x â‚¬ V 1 X has a p-use in edge (i,j) }

dcu(x,i)

=

{] â‚¬ N 1 X â‚¬ c-use(j) and there

is a def-clear path from i to j}

dpu(x,i)

=

{(j,k) â‚¬ E 1 X â‚¬ p-use(j,k) and there is

a def-clear path from i to fj,k) }

Figure 2

the first and last, are distinct. A loop -free path is one in which all nodes are distinct. A path

(ni,...,n,,n;) is a du-path with respect to a variable x if n , has a global definition of x and either

i) rtj has a c-use of x and (n^,...,n,,n^) is a def-clear simple path with respect to x, or

ii) (i,."*) h^s 2 p-use of X and (n,,. ..,Â«,) is a def-clear loop-free path with respect to x.

An association is a definition-c-use association, a definition-p-use association, or a du-path.

A path IT covers a definition-c-use association (i,j,x) [respectively a defirition-p-use association

(i,(j,k),x)] if it has a definition-clear subpath with respect to x from i to j [respectively, from i to

(j,k)]. IT covers a du-path tt' if it' is a subpath of t:. A set P of paths covers an association if some

element of the set does.

A path through a subprogram's flow graph (which we shall refer to in the sequel as a path

through the subprogram) is a control path if its first node is the entry node and its last node is the

exit node. A path is executable or feasible if there exists some assignment of values to input

variables, non-local variables, and parameters which causes the path to be executed. According to

this definition, the question of whether or not a given path through a subprogram is executable is

independent of the context in which that subprogram is called. However, it may depend on the

effects of any procedures or functions which are called along the path. Note that whether or not a

particular path is executable depends on the actual subprogram, not just on its def-use graph. Since

the sets of paths to which we are applying the criteria arise from the execution of test data we will

always assume that they are sets of executable control paths.

Roughly speaking, the family of data flow testing criteria is based on requiring that the test data

execute definition-clear paths from each node containing a global definition of a variable to specified

nodes containing global c-uses and edges containing p-uses of that variable. For each variable

definition we can demand that " I definition clear paths with respect to that variable from the

[.. -, \ uses

" I of the \c- us

somel I

â– â€¢ [p - us

The criteria are defined precisely in Figure 3.

tses

uses

tses

reachable by some such path be executed.

THE DATA FLOW TESTING CRITERIA

Test T satisfies criterion C for subprogram P if for each node i and each x â‚¬ def(i) the set F oi paths

executed by T covers the following associations;

CRITERION

All-defs

All-p-uses

All-p-uses/some-c-uses

All-c-uses/some-p-uses

All-uses

AU-du-paths

ASSOCIATIONS REQUIRED

Some (i,j,x) s.t. j â‚¬ dcu(x,i)

or some (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

AH (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

Iii addition, if dpu(x,i) = 4> then some (i,j,x) s.t. j â‚¬ dcu(x,i).

All (i,j,x) s.t. j â‚¬ dcu(x,i).

In addition, if dcu(x,i) = then some (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All (i,j,x) s.t. j â‚¬ dcu(x,i)

and all (i,(j,k),x) s.t. (j,k) â‚¬ dpu(x,i).

All du-paths from i to j with respect to x for each j â‚¬ dcu(x,i)

and all du-paths from i to (j,k) with respect to x for each (j,k) â‚¬ dpu(x,i)

For comparison we also define the criteria all-nodes (respectively all-edges, all-paths) which require

that /'cover every node (respectively every edge, every path) in the flow graph.

Figure 3

Criterion C, includes criterion C; if and only if for every subprogram, any test which satisfies

C, also satisfies C;. Criterion C, strictly includes criterion C, denoted C, => C,, if and only if C,

includes C; and for some subprogram P there is a test which satisfies C, but does not satisfy C,. The

notion of subsumption in [CLA85] is identical to our notion of inclusion.

- 7-

Rapps and Weyuker proved that for the simple language for which DF testing was originally

defined, subject to certain minor syntactic restrictions, the relationship among the criteria is as shown

in Figure 4 [RAP82,RAP85]. Clarke et. al. [CLA85] have shown the relationship of the criteria

defined by Laski and Korel [LAS83] and Ntafos [NTA84] to the DF criteria. We have extended the

theory of DF testing in such a way that these relations are preserved. Doing so required the

inclusion of definitions of all non-local variables in the entry node of the procedure and careful

treatment of implicit uses of the text variable input.

3. The Feasible Data Flow Testing Criteria

Given a subprogram P and a DF criterion C it may be the case that no test data for P exists

which satisfies C. This occurs when none of the paths which cover some association required by C

are executable. In such a case, P cannot be adequately tested according to C. In this section we

introduce a new family of criteria which are derived from the DF C'i'eria and which circumvent this

problem and investigate some of its properties.

We will say that an association is executable if there is some executable path which covers it; it

is unexecutable otherwise. We define subsets fdcu(x,i) C dcu(x,i) and fdpu(x,i) C dpu(x,i)

consisting only of those associations which are executable. For each DF criterion C we define a new

criterion C* by selecting the required associations from fdcu(x,i) and fdpu(x,i) instead of from

dcu(x,i) and dpu(x,i). Precise definitions of these criteria are given in Figure 5. We will refer to the

criteria {(all-du-paths)*, (all-uses)*, (all-p-uses/some-c-uses)*, (all-c-uses/some-p-uses)', (all-p-

uses)*} as feasible dataflow testing criteria, or FDF criteria, for short.

The FDF criteria satisfy the applicability property: For any subprogram P and any FDF criterion

C*, there is some (possibly empty) test T which satisfies C*. However, the question of whether a

particular T satisfies C for subprogram P is undecidable. In going from the family DF to the family

FDF, we have traded the undecidability of the existence question, "is there any test which is C-

adequate for P?" for the undecidability of the recognition problem "is a given test C-adequate for P?"

Observe that for any DF criterion C, C => C*. We now investigate the inclusion relations

ALL-PATHS

ALL-DU-PATHS

ALL-USES

ALL-C-USES/

SOME-P-USES

ALL-PUSES/

SOME-C-USES

ALL-

DEFS

ALL-P-USES

ALL-EDGES

ALL-NODES

Figure 4

8

THE FEASIBLE DATA FLOW TESTING CRITERIA

fdcu(x,i) = {]â– â‚¬ dcu(x,i) | the association (i,j,x) is executable}

fdpu(x,i) = {(j,k) i dpu(x,i) | the association (i,(j,k),x) is executable}

Test T satisfies criterion C for subprogram P if for each node i and each x â‚¬ def(i) the set ^of paths

executed by T covers the following associations:

CRITERION

(all-defs)*

REQUIRED ASSOCIATIONS

if fdcu(x,i) U fdpu(x,i) * ({> then

some (i,j,x) s.t j â‚¬ fdcu(x,i) or some (i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

(all-p-uses)*

all (i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

(all-p-uses/some-c-uses)* all {i,(j,k),x) s.t. (j,k) â‚¬ fdpu(x,i).

In addition, if fdpu(x,i) = and fdcu(x,i) t^ (all-du-paths)' and (all-paths)* => (all-edges)* are similar and

will be omitted.

B.2. (all-edges)* => (all-nodes)* :

Let T be a test which satisfies (all-edges)* for subprogram P, and let P be the set of paths

executed by T. Let n be any executable node in P. If n is the entry node, then n has a unique

successor, m, and (n,m) is executable. So " covers (n,m) and hence covers n. If n is not the entry

node, then since n is executable, some branch (i,n) is executable. So P covers (i,n) and hence covers

n.

B.3. (all-uses)* => (all-p-uses/some-c-uses)*, (all-p-uses/some-c-uses)* => (all-p-uses)*, (all-p-

uses/some-c-uses)" => (all-defs)*, (all-uses)* => (all-c-uses/some-p-uses)*, (all-c-uses/some-p-uses)*

=> (all-defs)* :

These inclusions follow immediately from the definitions of the criteria given in Figure 5. For

example, any set P of paths which covers all of the associations required by (all-uses)* will a fortiori

cover all of the associations required by (all-p-uses/some-c-uses)*.

We next show that those relations not in the transitive closure of the diagram in Figure 6 do not

hold.

- 10 -

C.l. (all-du-paths)* =)6- (all-p-uses)'; (all-du-paths)* =/? (all-p-uses/some-c-uses)'; (all-du-paths)' ^

(all-uses)'; (all-du-paths)* z^ (all-c-uses/some-p-uses)'; (all-du-paths)* =3$. (all-defs)*; (all-du-paths)*

si. (all-edges)*; (all-du-paths)* :J> (all-nodes)* :

It suffices to show that (all-du-paths)* =i> (all-p-uses)*, (all-du-paths)* =)6> (all-defs)*, and (all-

du-paths)* ^ (all-nodes)*. The rest follows from the transitivity of =>. Consider the subprogram

shown in Figure 7a. Its du-paths are shown in Figure 7b. Of these, only (1,2), (2,3,4), (4,3,4), and

(4,3,5) are executable Let T = {(X,Y)} where X is any integer and Y < 0. Since T executes P =

{(1,2,3,4,3,4,3,5,6,8)}, T satisfies (all-du-paths)*. However, P does not cover the associations

(2,(5,7),y), (2,7,x), or node 7, all of which are covered by the executable path (1,2,3,4,3,4,3,5,7),

so T does not satisfy (all-p-uses)*, (all-defs)*, or (all-nodes)*.

Intuitively, (all-du-paths)* fails to include these criteria because it is possible for a subprogram

to have certain definition-use associations which can be executed only by paths which traverse some

loop one or more times. In section 4 we shall introduce a modified version of the (all-du-paths)*

criterion which includes (all-uses)' and (all-edges)', and hence all of the other FDF criteria.

C.2 (all-p-uses)' =i> (all-edges)*; (all-p-uses/some-c-uses)' si. (all-edges)'; (all-uses)* ^ (all-

edges)'; (all-p-uses)' =^ (all-nodes)*, (all-p-uses/some-c-uses)* =>S> (all-nodes)*; (all-uses)* =^ (all-

nodes)* :

Consider the subprogram in Figure 8 Notice that since node 3 is unexecutable, y is always

uninitialized at node 4. We will assume that under these circumstances, edges (4,5) and (4,6) are

1 2

Online Library → Phyllis G Frankl → Data flow testing in the presence of unexecutable paths → online text (page 1 of 2)