Computer Science Department

TECHNICAL REPORT

?^v,

');
}
elseif (getClientWidth() > 430)
{
document.write('');
}
else
{
document.write('');
}
//-->

^>*ia>Rl^

^'V'l?:.

More Efficient Bottom-Up

Multi-Pattern Matching in Trees

j-f

o

4-1

C

C i (0 o

c

o

4-)

e

c

(C

Q-,

I •

-^^ w

Jj QJ

r-l t] .

Hoffmann and O'Donnell use an equivalent recursive definition of match sets (but restricted

to subjects without variable occurrences) to obtain an efficient bottom-up algorithm. The recursive

rules shown below add a new rule for MS{v) to Hoffmann and O'Donnell's rules so that match sets

can be defined for arbitrary patterns.

MS(v)= {v}

MS(c) = ( V } , when constant c i PF

{ V, c } , when constant c e PF

(1) MS(f(t,, .... t,)) = [f{q^ q,,) e PF I q, e MSit,),i = 1 k] u (v)

Surprisingly, this new rule is merely a formalism, since it gives rise to the exact same collection of

match sets as derived by Hoffmann and O'Donnell. This is true, because the match set MS(p) for

an arbitrary pattern p is identical to the match set MS{t) for any pattern t formed from p by replac-

ing occurrences of v in /? by occurrences of arbitrary constants that do not belong to PF.

After determining match sets for constants and variable occurrences in subject t, Hoffmann

and O'Donnell's algorithm solves the Bottom-Up Subproblem by identifying the match set for

each subpattem /(r 1 r*) of f based on the match sets for r,, i = \, ..., k. This task, which we call

the Bottom-Up Step, computes expression (1) by an Oik) time lookup in a ^-dimensional array stor-

ing transition mapT/, where X/{A/5 (f i ), ...,M5(fi)) = A/5(/(fi t^ )).

For consistency, throughout this paper we consider an instance of the Multi-Pattern Matching

Problem with paaem set P, pattern forest PF, and subject t. We also use the following parameters:

n = size of t

r = the set of match sets for P

l=\PF\

o= \MPTMp(t)\

f^max = maximum arity of any function symbol appearing in PF

In order to compute Step (1) and print the set MS(f(t\, .... t/^)) nP of patterns that match

f(t\, ..., tk) in time Oik + \ MS if it \ t^)) nP \), Hoffmann and O'Donnell preprocess the pat-

terns in P to

i. encode each pattern in PF as a distinct integer from 1 to /, and represent patterns as

trees in the obvious way (implemented in compressed form as dags);

ii. compute all match sets, and encode each such set as a distinct integer from 1 to I Fl ;

iii. compute the subset of patterns in P belonging to the /''" match set for / = 1, ..., I Fl ;

iv. compute a transition map Xy for every ^-ary function symbol / occurring in P so that

Xf(MSiti) MSitt)) = MSifUi, .... ti^)); x^ = [v], and x,. = {v, c) if c is any constant

appearing in PF, transition maps Xy are implemented as /: -dimensional arrays accessed

using integer encodings of match sets.

After preprocessing the patterns in P, Hoffmann and O'Donnell's algorithm solves the

Multi-Pattern Matching Problem by repeatedly solving Step (1) from innermost to outermost sub-

pattern of t. Their worst case time is (9(n -t- o) after preprocessing P. The array storing the transi-

tion map Xy for each *-ary function symbol / appearing in PF uses n(IFI*) space, where the

-7-

number in of match sets can be t^(2'), which is expensive in practice. Their rough bound on

preprocessing time is