B. D Lubachevsky.

Process-level, time driven, simulation of a computer network on a parallel shared-memory processor online

. (page 1 of 3)
Online LibraryB. D LubachevskyProcess-level, time driven, simulation of a computer network on a parallel shared-memory processor → online text (page 1 of 3)
Font size
QR-code for this ebook


J



Process-Level, Time-Driven, Simulation of a Computer Network
on a Parallel Shared-Memory Processor

B.D. Lubachevsky andK.G. Ramakrishn^

Couraot Institute of Mathematical Sciences
New York University

251 Mercer Street
New York, NY 10012

Ultracomputer Note #88
October, 1983

ABSTRACT

This paper reports on a parallel, time-driven simulation of a computer
network designed to run on a shared-memory MIMD multiprocessor.
The NYU Ultracomputer is chosen as the model of a shared-memory
MIMD machine. Various results on efficiency and speedup are
reported. It may be inferred from the results presented in the report,
that a time-driven simulation, though inefficient in the uniprocessor
context, can prove to be a useful tool in the parallel context. A major
part of this paper is devoted to algorithms for coordination and
synchronization of processors in the ultracomputer. A dynamic task
manager for scheduling tasks is also discussed. Actual FORTRAN
codes are presented for the synchronization and scheduling of
processors. A brief overview of a new table-driven approach for
simulating computer networks at process-level, is also presented. This
method has the potential for enormous savings in development time of
simulation programs. After a discussion of results on speedup and
efficiency of our parallel simulator, a simple theoretical model is
developed to predict the efficiency for larger computer networks.
This method takes into account the synchronization overhead and
concurrency of events.

1. Introduction

Although serial event driven (ED) simulation has proven to be both a
general and a practical method [3, 9], its parallel counterpart has not yet
yielded efficient tools for large enough class of problems. The serial time
driven (TD) simulation is indisputably poorer than serial ED simulation. The
imminent availability of multiprocessors, resurrects, we believe, both a



'This work was supported by DOE grant number AC0276ER03077.



theoretical and practical interest in TD simulation of a large class of physical
systems that possess an inherent synchronism. However, not too many
reports on the parallel TD simulation has been recently seen.

In this paper, we develop a simulator of a computer network consisting
of tens of heterogeneous processors servicing hundreds of on-line
inquiry/response terminals. There are two essential assumptions about the
network which make its parallel TD simulation feasible: (I) inherent
synchronism of the whole network, (11) high density of concurrent events
throughout the network. Item (I) is induced by the time slice mechanism
which is common to all processors in the network and tightly coupled
communication interface. The process level of our simulation, made possible
by our table-driven mechanism, induces item (11). Being a paradigm of a
class of computer networks, the analysis of this example, we believe, wiU
help develop practical simulation tools.

The parallel host computer we chose for the simulator is a shared
memory asynchronous processor, enhanced with the basic primitive
Fetch&Add. The MIMD shared memory architecture presents the following
main advantages in comparison to message passing architecture: efficiency of
the solutions for a large class of problems (not only for those networks, for
example, whose geometry agree with the geometry of the message passing
machine), more flexible program structure allowing dynamic network
reallocation during the execution, ease of handling global statistics, and better
manageability of programming.

Since the general availability of the parallel hardware is imminent [5,7,
12], a simulator of a promising multiprocessor design, the Ultracomputer,
was chosen. This simulator truthfully emulates the environment of the
Ultracomputer at the level of machine instructions. The simulator also
enables one to write parallel programs in FORTRAN. Hence, the process-
level, time-driven simulator was written in FORTRAN and various
performance measures of the program were predicted by running the
program on the simulator of the Ultracomputer. We believe these measures
to qualitatively reflect the performance of our parallel program on the
Ultracomputer.

In this "almost" real environment of the simulation a "potpourri" of
various computer science and programming "ingredients" was unavoidable,
each of which might possibly deserve a separate paper. In fact we discuss a
model of parallel processor in sec. 2, give an overview of the discrete event
simulation methodology in sec. 3, describe in detail a particular computer
system, as well as the table-driven approach to simulating computer networks
at process level in sec. 4. We also cannot avoid detailed discussion of various
data structures in sec. 4 and 5 and present parallel programming codes. We
analyze the experimental results of the parallel simulation in sec. 7 and
comment on a way of relaxing assumption (11) in sec. 8.



Page 2



To avoid getting bogged down in the quagmire of our potpourri, we
hope that the following description might be of help. There are three layers
in our simulation (Fig. 1). The computer network, called target is simulated
by a parallel shared memory MIMD processor, called host, whose simulator
in turn runs on a serial processor, called host-simulator. We hope it is clearly
vmderstood that this three-layers simulation is not an efficient way to simulate
the original target network.

2. The model of the parallel processor

We first describe the ideal model, dubbed a "paracomputer" [12]. The
paracomputer consists of A^ identical PEs sharing a common memory. The
individual PEs may also have attached local memory, which we refer to as
their "private" memories; the memory shared by and common to all
processors is called "shared memory," and variables stored there are called
"shared variables."

This original definition [12] implies a very large number A^ (thousands
and more). We will consider moderate values of N as well. We also relax
the original requirement that the PEs be identical. In our model, though the
PEs contain the same instruction set, they may have different execution
speeds. In fact, none of the software constructs described below makes
expUcit use of the fact that the instruction repertoire is identical for all PEs.

Paracomputer presumes no contention for accessing shared memory. In
the case of identical PEs, this assumption may be also expressed as follows.
The PEs can simultaneously read any shared cell in one cycle. Moreover,
simultaneous writes, including the Fetch&Add operations described below,
are likewise effected in a single cycle. A memory cell to which such writes
are directed wiU contain some one of the quantities written into it. Note that
simultaneous memory updates are not serialized; in fact they are accomplished
in one cycle.

Paracomputers must be regarded as idealized computational models since
physical fan-in limitations prevent their realization. One realizable
approximation to a paracomputer is an "Ultracomputer" [5] in which each PE
can directly access its private memory and can access the shared memory via
a (multicycle) interconnection network. In this more realistic architecture a
shared memory access may require several PE cycles. For moderate values
of A'' simpler solutions of the problem of connecting the PEs with the shared
memory are feasible, such as the crossbar switch.

The code of a parallel program looks like the one for an ordinary serial
program. Each PE "sees" the shared memory and executes the program
code. In this paper, the only case considered is one in which the code is
identical for all the PEs. Fetch&Add is the primitive to coordinate PEs.

The format of the F&A operation is F&A (V,e), where V is a variable
and e is an expression. This indivisible operation yields the old contents of V

Page 3



as its value and replaces the contents of V by V+ e .

The following principle of serialization of operations F&A takes place:
Assume that several (possibly very many) F


1 3

Online LibraryB. D LubachevskyProcess-level, time driven, simulation of a computer network on a parallel shared-memory processor → online text (page 1 of 3)