Online Library → Gad M Landau → Efficient parallel and serial approximate string matching → online text (page 1 of 2)

Font size

Efficient Parallel And Serial Approximate

String Matching

by

Gad M. Landau^- ^

Uzi Vishkin^^ ^

Ultracomputer Note #101

Computer

Science Department Technica

February, 1986

Report #221

Ultracomputer Research Laboratory

I

c

t-l

c w

(0

(U

.-. e

o

i^ o

(tj w

1-1 TD Oj a<

C/l O 4J Â«J CTl

(l< c c

Â£ - 4)

O 3

CJ (0 o

'O

(Q A

o

New York University

Courant Institute of Mathematical Sciences

Division of Computer Science

251 Mercer Street, New York, NY 10012

Efficient Parallel And Serial Approximate String Matching

by

Gad M . Landau^' â– ^

Uzi Vishkin^' ^

Ultracomputer Note #101

Computer Science Department Technical Report #221

February, 1986

^ Department of Computer Science, School of Mathematical Sciences, Tel Aviv University

^ Department of Computer Science, Courant Institute of Mathematical Sciences, New York

University and (present address) Department of Computer Science, School of Mathematical

Sciences, Tel Aviv University. The research of this author was supported by NSF grants

NSF-DCR-8318874 and NSF-DCR-8413359 and ONR grant N00014-85-K-0046.

^ The research of both authors was supported by the Applied Mathematical Sciences

subprogram of the Office of Energy Research, U. S. Department of Energy under contract

numoer DE-AC02-"6ER03077.

ABSTRACT

Consider the string matciiing problem, where differences between characters of the

pattern and characters of the text are allowed. Each difference is due to either a

mismatch between a character of the text and a character of the pattern or a

superfluous character m the text or a superfluous character in the pattern. Given a

text of length n. a pattern of length m and an mteger k, we present parallel and

serial algorithms for finding all occurrences of the pattern m the text with at most k

differences. The first part of the parallel algorithm consists of analysis of the

pattern and takes O(Iogm) time using m" processors. The rest of the algorithm

consists of handling the text. The text handling part applies the following new-

approach. This part starts by obtaining a concise characterization of the text which

is based solely on substrings of the pattern m OUogm) time using n/logm

processors. Then the desired output is derived from this characterization together

with the tables built in the first part in 0(k) time using n processors.

The serial algorithm follows also this new approach for handling the text. It runs m

0(kn) time for alphabet whose size is fixed. For general input the algorithm

requires 0(Â«(/: - logm)) time. In both cases the space requirement is 0(n).

1. Introdsiction

The problem. Input. Two arrays: A = ai,...,aâ€ž - the pattern, T = ti,....tâ€ž - the text

and an integer k (^1).

In the known problem of pattern matching in strings (e.g., as discussed in [KMP-77]) we

are interested in finding all occurrences of the pattern in the text. In the present paper we

are interested in designing an algorithm that finds all such occurrences with at most k

differences.

Example. Let the text be ahcdefghi , the pattern hxdyegh and k = 7). Let us see whether

there is an occurrence with s k differences that ends at the eighth location of the text. For

this we propose the following correspondence between bcdefghi and bxdyegh. 1. b (of the

text) corresponds to ii (of the pattern) .2. c to .v. 3.

String Matching

by

Gad M. Landau^- ^

Uzi Vishkin^^ ^

Ultracomputer Note #101

Computer

Science Department Technica

February, 1986

Report #221

Ultracomputer Research Laboratory

I

c

t-l

c w

(0

(U

.-. e

o

i^ o

(tj w

1-1 TD Oj a<

C/l O 4J Â«J CTl

(l< c c

Â£ - 4)

O 3

CJ (0 o

'O

(Q A

o

New York University

Courant Institute of Mathematical Sciences

Division of Computer Science

251 Mercer Street, New York, NY 10012

Efficient Parallel And Serial Approximate String Matching

by

Gad M . Landau^' â– ^

Uzi Vishkin^' ^

Ultracomputer Note #101

Computer Science Department Technical Report #221

February, 1986

^ Department of Computer Science, School of Mathematical Sciences, Tel Aviv University

^ Department of Computer Science, Courant Institute of Mathematical Sciences, New York

University and (present address) Department of Computer Science, School of Mathematical

Sciences, Tel Aviv University. The research of this author was supported by NSF grants

NSF-DCR-8318874 and NSF-DCR-8413359 and ONR grant N00014-85-K-0046.

^ The research of both authors was supported by the Applied Mathematical Sciences

subprogram of the Office of Energy Research, U. S. Department of Energy under contract

numoer DE-AC02-"6ER03077.

ABSTRACT

Consider the string matciiing problem, where differences between characters of the

pattern and characters of the text are allowed. Each difference is due to either a

mismatch between a character of the text and a character of the pattern or a

superfluous character m the text or a superfluous character in the pattern. Given a

text of length n. a pattern of length m and an mteger k, we present parallel and

serial algorithms for finding all occurrences of the pattern m the text with at most k

differences. The first part of the parallel algorithm consists of analysis of the

pattern and takes O(Iogm) time using m" processors. The rest of the algorithm

consists of handling the text. The text handling part applies the following new-

approach. This part starts by obtaining a concise characterization of the text which

is based solely on substrings of the pattern m OUogm) time using n/logm

processors. Then the desired output is derived from this characterization together

with the tables built in the first part in 0(k) time using n processors.

The serial algorithm follows also this new approach for handling the text. It runs m

0(kn) time for alphabet whose size is fixed. For general input the algorithm

requires 0(Â«(/: - logm)) time. In both cases the space requirement is 0(n).

1. Introdsiction

The problem. Input. Two arrays: A = ai,...,aâ€ž - the pattern, T = ti,....tâ€ž - the text

and an integer k (^1).

In the known problem of pattern matching in strings (e.g., as discussed in [KMP-77]) we

are interested in finding all occurrences of the pattern in the text. In the present paper we

are interested in designing an algorithm that finds all such occurrences with at most k

differences.

Example. Let the text be ahcdefghi , the pattern hxdyegh and k = 7). Let us see whether

there is an occurrence with s k differences that ends at the eighth location of the text. For

this we propose the following correspondence between bcdefghi and bxdyegh. 1. b (of the

text) corresponds to ii (of the pattern) .2. c to .v. 3.

1 2

Online Library → Gad M Landau → Efficient parallel and serial approximate string matching → online text (page 1 of 2)