Copyright
Gad M Landau.

Efficient parallel and serial approximate string matching online

. (page 1 of 2)
Online LibraryGad M LandauEfficient parallel and serial approximate string matching → online text (page 1 of 2)
Font size
QR-code for this ebook


Efficient Parallel And Serial Approximate


String Matching




by






Gad M. Landau^- ^






Uzi Vishkin^^ ^






Ultracomputer Note #101




Computer


Science Department Technica
February, 1986


Report #221




Ultracomputer Research Laboratory



I



c

t-l

c w

(0
(U

.-. e



o



i^ o
(tj w
1-1 TD Oj a<

C/l O 4J «J CTl

(l< c c

£ - 4)

O 3
CJ (0 o

'O



(Q A

o



New York University

Courant Institute of Mathematical Sciences

Division of Computer Science

251 Mercer Street, New York, NY 10012



Efficient Parallel And Serial Approximate String Matching

by

Gad M . Landau^' ■^

Uzi Vishkin^' ^

Ultracomputer Note #101

Computer Science Department Technical Report #221

February, 1986



^ Department of Computer Science, School of Mathematical Sciences, Tel Aviv University

^ Department of Computer Science, Courant Institute of Mathematical Sciences, New York
University and (present address) Department of Computer Science, School of Mathematical
Sciences, Tel Aviv University. The research of this author was supported by NSF grants
NSF-DCR-8318874 and NSF-DCR-8413359 and ONR grant N00014-85-K-0046.

^ The research of both authors was supported by the Applied Mathematical Sciences
subprogram of the Office of Energy Research, U. S. Department of Energy under contract
numoer DE-AC02-"6ER03077.



ABSTRACT

Consider the string matciiing problem, where differences between characters of the
pattern and characters of the text are allowed. Each difference is due to either a
mismatch between a character of the text and a character of the pattern or a
superfluous character m the text or a superfluous character in the pattern. Given a
text of length n. a pattern of length m and an mteger k, we present parallel and
serial algorithms for finding all occurrences of the pattern m the text with at most k
differences. The first part of the parallel algorithm consists of analysis of the
pattern and takes O(Iogm) time using m" processors. The rest of the algorithm
consists of handling the text. The text handling part applies the following new-
approach. This part starts by obtaining a concise characterization of the text which
is based solely on substrings of the pattern m OUogm) time using n/logm
processors. Then the desired output is derived from this characterization together
with the tables built in the first part in 0(k) time using n processors.
The serial algorithm follows also this new approach for handling the text. It runs m
0(kn) time for alphabet whose size is fixed. For general input the algorithm
requires 0(«(/: - logm)) time. In both cases the space requirement is 0(n).

1. Introdsiction

The problem. Input. Two arrays: A = ai,...,a„ - the pattern, T = ti,....t„ - the text
and an integer k (^1).

In the known problem of pattern matching in strings (e.g., as discussed in [KMP-77]) we
are interested in finding all occurrences of the pattern in the text. In the present paper we
are interested in designing an algorithm that finds all such occurrences with at most k
differences.

Example. Let the text be ahcdefghi , the pattern hxdyegh and k = 7). Let us see whether
there is an occurrence with s k differences that ends at the eighth location of the text. For
this we propose the following correspondence between bcdefghi and bxdyegh. 1. b (of the
text) corresponds to ii (of the pattern) .2. c to .v. 3.


1

Online LibraryGad M LandauEfficient parallel and serial approximate string matching → online text (page 1 of 2)