U Vishkin.

Lucid-boxes vs. black-boxes online

. (page 2 of 3)
Online LibraryU VishkinLucid-boxes vs. black-boxes → online text (page 2 of 3)
Font size
QR-code for this ebook

communication load on many lines, etc..

A natural question to be asked is about polynomial time lucid-box
reducibilities; and, in particular, do they affect the theory of
NP-completeness? Unlike the design of efficient algorithms, where new
opportunities are opened, we show in the remainder of this section that
the answer to this question is essentially negative. The argument, as
we shall see, is simple. It is based on the following observation.
Given input variables and a program that operates on them it is an
arbitrary (semantical) decision which program variables are declared to
be outputs.

Extending the notion of lucid-box reducibility to Turing machines
and the related definitions of NP-completeness is straightforward and
therefore omitted. In the introduction we presented the definition of


[AHU] for NP-completeness. Obviously, every problem which is
NP-coraplete using Cook's reducibility , is NP-complete using [AHU]'s
reducibility. [Ev] implies that it is an open question if the other
direction holds as well. The following sentence relates to the
definition of NP-completeness of [AHU] , given above. Note that both
the program for recognizing Lq and its execution are transparent to us
in a lucid-box reducibility from the problem of recognizing L to the
problem of recognizing Lq . This seems to imply that our definition of
NP-completeness is equivalent to [AHU], since nothing can stop us from
using in any "effective" way the program for recognizing Lq.

We would like to elaborate a little on terminology which is being
used. [GJ] defines a search problem tt to consist of a set D^ of finite
objects called instances and, for each instance, I e D^ , a set S^(I)
of finite objects called solutions for I. An algorithm is said to solve
a search problem it if, given as input any instance I e D^ > it returns
the answer "no" whenever S^(I) is empty and otherwise returns some
solution s belonging to S^(I). Define, alternatively, a proceduFal
solution to ir to consist of an algorithm that solves tt including all
its intermediate computations. Assume that instead of algorithms that
solve search or decision problems we were interested in procedural
solutions of these problems. By definition, any call to a procedural
solution for some problem results in a listing of all its intermediate
computations throughout the T time units during which the procedural
solution ran (for some T). Right after the example in the
introduction, we remarked that a coroutine that runs in time 0(T) can
be simulated by a subroutine that runs in time 0(T ) by restarting it
instead of resuming its operation from the place it was stopped.

Therefore, if in a reduction of the procedural solutions of one
problem to the procedural solutions of another problem we used only
executions of solutions to the second problem we can simulate it by a
polynomial time Turing reduction. However, the (bizzare) possibility
of using the program for the second problem in other ways than
executing it still leaves the question of [Ev] "formally" open.

Remark . It might be interesting to define an "algebra of procedures"
as follows. Its elements (the procedures) will be pairs which consists


of a program and input and program variables (but must not have output
variables). Lucid-box compositions will serve as the operation of this
algebra since they allow for producing a procedure out of existing
ones. This will be analogous to an algebra of algorithms that can be
defined using black-box compositions. We do not elaborate on this idea
here and leave it for future research.

4 . On_ Structured Programming .

A growing number of programs, documentation of programs and
presentations of algorithms in the literature use principles of
structured programming for modularity and clarity of exposition.

A methodologically useful way to present algorithms in this spirit
is to start with a high-level description of the algorithm which
includes a milestones in their performance. In a hierarchical fashion
this overview is filled with details, until a complete low-level
accurate specification is obtained. For instance, take the
biconnectivity algorithm using depth-first-search (DFS) in [AHU] .
After presenting the DFS method for searching a graph (to which we
refer as a high-level description) this book presents again the DFS now
intermixed with the bookkeeping required for the biconnectivity
algorithm. An alternative approach that we would like to point out
looks at the DFS as a navigator that tells us where to go next, while
there is a bookkeeper that should be told where we are, in order to be
able to do the necessary bookkeeping. We suggest the following
description. There is a 'general manager' that coordinates between the
navigator ('president of the company') and the bookkeeper. The general
manager asks the president where to go, and then transmits the response
to the bookkeeper. The bookkeeper writes down whatever is required and
reports to the general manager when he is done. Then the general
manager consults the president again, and so on.

The modularity of the presentation is increased since now the
high-level description is an independent piece of software. Besides
the important advantage of clarity we would like to point out the
following possible advantage: In many cases a first version of an
algorithm is improved later by polishing its low-level implementation
(bookkeeping) only. It is desirable in such cases to utilize the


prevlous fault-free high-level specifications. If it is written as an
Independent unit it can be used as is.

[K] mentions the methodological importance of understanding
several coroutines as (symmetric) equal partners, unlike a master-slave
relation between a program and a subroutine it calls. The above
hierarchical description may well fit this understanding by naming a
new general manager who employs all coroutines involved. So no
coroutine dominates another.

Exa mples

Example I. Choice of a model of parallel computation.

The paper [V2] deals with the problem of choosing a theoretical
abstract model of parallel computation to be simulated by synchronous
distributed machines. The principle of choosing the most permissible
model of parallel computation as long as the cost of computational
resources does not increase is applied.

We sketch the main ideas in this paper emphasizing the ones that
relate to lucid-box compositions.

Our machine is assumed to be represented by a model of synchronous
distributed computations (SDC). It employs a sequence of RAM's (see
[AHU]) PpP2,...,Pj that operate synchronously in parallel. Each
processor can communicate directly with no more than c other processors
(where c is some "small" constant) through communication lines.
Communication registers which are associated with the lines are used
for the communication.

The concurrent-read exclusive-write parallel RAM (CREW PRAM) is a
synchronous model of parallel computation in which all p processors
Pi,...,P have access to a shared memory. Simultaneous access to the
same common memory location is allowed for read (but not write)
purposes. The Fetch-and-Add (F&A) PRAM model of synchronous parallel
computation allows every operation which is permitted by the CREW PRAM.
In addition, the following is allowed. Let A be a common memory
address and let ei be some address in the local memory of processor
P, . We define the F&A instruction as follows. If processor P,
performs F&A(A,e^) and no other processor performs at the same time an
instruction that relates to address A then the content of A is


transraitted Co processor P. and stored In one of its local registers
and address A is assigned with A+e, . Suppose that several processors
perform simultaneously F&A instructions that relate to A. The result is
defined to be as if these instructions were performed serially in some

Generally speaking the main result of the paper is that every
simulation of algorithms given in the CREW PRAM into the SDC has a
counterpart simulation for algorithms given in the F&A PRAM into the
SDC which requires the same SDC-time and SDC-local memories up to a
constant factor. This supports the choice of the F&A PRAM as an
abstract model of parallel computation as was done for the
NYU-Ultracomputer [GGKMRS].

We sketch the main ideas that relate to lucid-box compositions.
Let us take a pulse of the F&A PRAM. Assume, that in this pulse
processors i^ ,i2 , . . . ,ii. want to execute F&A(Ai,ei), F&ACAo .eo) , . . • ,
F&A(Ai^,e. ), respectively, where A. is a common memory address and e. is
some address in the local memory of processor i. , for I < j < k. All
other processors of the F&A PRAM are assumed to remain idle during the
present pulse. Finally, we arrived at the point where the present
example relates to lucid-box compositions. We 'replace' the given
pulse by the following: every F&A(A-,e-) instruction for processor ij
is replaced by the insruction "read common address A- into local
address e^ of processor i-:", for 1 < j < k. We apply the simulation of
the CREW PRAM into the SDC in order to simulate this 'reading' pulse.
This simulation is done in auxiliary memory locations of the SDC and
all cases where the contents of a (memory or communication) location of
the SDC is copied into another are recorded in the local memory of the
processor that does it. Let A- (resp. e- be a memory location of the
SDC which simulates memory location A. (resp. e-). We assume that
there is exactly one such A. (resp. ^-i)* "^^^ correctness of the
simulation implies that the contents of location A. is propagated from
A. to ez between the time the simulation of the present reading pulse
begins until the time it ends. Say that it takes T cycles of the SDC
machine. We assume that this propagation occurs by repeatedly copying
the contents of A^ from one SDC memory location to another until it is
finally copied into e-. (No splitting of the bits of this contents or


encodlng of contents of several locations into one are allowed.) This
is called the copying assumption .

Let A be the set of all memory locations (in all local memories)
of the SDC. The following (layered) directed graph G(V,E) is
introduced as a tool for specifying the synchronization of the
remaining steps; edges between two successive layers represent
simultaneous events that take place between two successive ticks of our
'clock'. The set of vertices V of G is V = {(a,t) | a e A, < t < T} .
Layer t, L^, , of G is L^ = {(a,t) | a e A} for t, < t < T. Let
E^ = {[(a,t-l), (b,t)J I A (SDC) processor P^^ , for some 1 < i < s,

copied the contents of address a into
address b at time unit t, 1 < t < t}
E2 = {((a,t-l), (a,t)J I No processor wrote into address a at

time unit t, 1 < t < t}
The set of edges E is the union of E-^ and E2 • Note, that all edges of
E, were actually recorded at the local memories of corresponding

It is not difficult to observe that binary trees with A- as roots
and e- as leaves are subgraphs of G. The simulation of the F&A
instructions proceeds by identifying these trees and implementing a
synchronous partial-sums computation on them in order to satisfy the
F&A instructions. The following facts follow readily: the time
required for the simulation is 0(T); and, if processors Pi.P2»'**^d
employed local memories of sizes ra, ,m2,...,raj , respectively, for the
simulation of the reading pulse then they employ local memories of size
0(m^+T), 0(m2+T), ..., 0(m^+T) , respectively, for the simulation of the
pulse that involved the F&A instructions.

The refined corespondence between many performance parameters of
an existing procedure and corresponding parameters of a new procedure
seems to be relevant for many computational environments : For instance,
an algorithm for a synchronous parallel shared memory model of
computation may be evaluated by its number of steps, size of each of
its local and common memories, number of accesses to the shared memory
and their frequency and many more parameters. Another example is an
algorithm for synchronous distributed machines. They may be evaluated


by communication load on each line, sizes of local memories and number
of steps.

The composition of the simulation of the pulse that involves F&A
instructions using execution of the simulation of the reading pulse
gave us a many parameter correspondence between the two simulations.
The notion of lucid-box compositions allows for a closer adherence to
the existing procedure and enables us to use Information which one
cannot expect to find in typical predeclared outputs. Therefore, we
claim that lucid-box compositions are tailored for multi-parameter
optimization of algorithms .

Example II. A Parallel-Design Distributed-Implementation (PDDI)
General-Purpose Computer.

The paper [VI] introduces a scheme of an efficient general-purpose
parallel computer. Its design space (i.e. the model for which
parallel programs are written), is the F&A PRAM. The implementation
space is presented as a scheme of an SDC in which each processor may
communicate with < 4 others. We sketch below how lucid-box
compositions may help in the construction of an efficient translation
of the design space into the implementation space.

The F&A PRAM employs p processors l,2,...,p and m common memory
addresses l,2,...,m.

The SDC consists of a sorting network followed by a merging
network as in Figure 1. Any sorting and merging network can be used .
The SDC employs d (< p) strong processors (called super-processors);
each of them is responsible to simulate the behavior of about p/d F&A
PRAM processors. It also employs 'comparator processors' that behave
similarly to comparator modules of comparison networks and m 'memory
processors', each of them is responsible for simulating the behavior of
one common memory location. We present the solution only for a pulse
of the F&A PRAM of the following form. Assume that processors
i^,i2,. . . ,i(^ want to execute F&A(Aj,ej), F&A(A2,e2), ..., F&A(A,^,e^^)
where A^ is a common memory address and e^ is an address of the local
memory of processor i^ , 1 < j < k, respectively. All other processors
of the F&A PRAM remain idle during the present pulse.


The translation Into the SDC proceeds in cycles. At the j-th

cycle super-processor i simulates the behavior of processor i+(j-l)d, 1

< 1< d, l


Online LibraryU VishkinLucid-boxes vs. black-boxes → online text (page 2 of 3)