A Taxonomic Table of Parallel Computers, Based on 55 Designs

J. Schwartz

Courant Institute, N.Y.U.

251 Mercer Street

New York, N.Y. 10012

Ultracomputer Note #69

November 1983

1. Coarse grained Designs

1.1. Subclassifications

A. Procedurally Oriented, Omega Network or Cube Design

A.1. Packet Switching
A.2. Circuit Switching

B. Dataflow

C. Tree Structured Machines

D. Nearest Neighbor Machines

E. Crossbar Designs

F. Ring Structured Machines

G. Bus Structured Machines

H. Miscellaneous and Eclectic Designs

1.2. Detailed Descriptions

A. Procedurally Oriented, Omega Network or Cube Design
A.1. Packet Switching

(1) Gottlieb/Grishman/Kruskal... (NYU) Ultracomputer 1983
Combining Network, Fetch-and-add coordination

(2) Kuck/Gajski/Lawrie... (U.Ill., Urbana) CEDAR 1982

Local clusters of 8 processors with crossbar interconnect treated as
smallest assignable execution unit.

(3) Lindstrom/Barnes (Burroughs Corp) FMP

Packet communication handled by short circuit-switch phases;
supplemented by global-or net; no combining of requests. In other respects
resembles NYU Ultracomputer.

(4) Rettberg/Kraley (BBN Corp.) Butterfly 1979

MC68000-based MIMD system with memory and processor association.
Network supports block transfers and interrupt requests. Seems to be
logically circuit rather than packet switched and involve no buffering or
synchronization on switches. Memory not interleaved, so ultracomputer
type coordination is impossible. Fetch-and-add provided at software
level.

(5) Smith (Denelcor) HEP 1980

Powerful individual processors timesliced to match memory latency.

(6) Seitz/Locanthi/Fox... (Caltech) Homogeneous Machine 1972
Hypercube interconnect explicitly visible to programmer; message
transmission along single edge is basic interprocessor communication
step.

(7) Sullivan/Bashkow (Sullivan Associates) CHOPP 1978

Hypercube interconnect used in earlier version, may go to omega net
(proprietary).

(8) Briggs/Fu/Hwang... (Rice U. and Purdue) PUMPS 1982

Vaguely fleshed-out proposal for parallel processor with supplementary
special-purpose chips, including some for image processing.
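
Note on entry (1): the fetch-and-add coordination and request combining named there can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not a description of the Ultracomputer hardware: the names Memory, fetch_and_add and combine are hypothetical, and a combining switch is modelled as an ordinary function handling exactly two requests to the same address.

```python
# Illustrative sketch only; all names here are hypothetical.

class Memory:
    """A small shared memory whose cells support fetch-and-add."""

    def __init__(self, size):
        self.cells = [0] * size

    def fetch_and_add(self, addr, inc):
        # Return the old value and add inc to the cell as one indivisible step.
        old = self.cells[addr]
        self.cells[addr] = old + inc
        return old


def combine(memory, addr, inc_a, inc_b):
    """Model a switch that combines two requests aimed at the same address.

    A single request carrying inc_a + inc_b is sent on to memory. The switch
    keeps inc_a so that it can build both replies from the one value returned.
    """
    old = memory.fetch_and_add(addr, inc_a + inc_b)
    return old, old + inc_a  # reply for requester A, reply for requester B


mem = Memory(4)
reply_a, reply_b = combine(mem, 0, 1, 1)
print(reply_a, reply_b, mem.cells[0])
```

Both requesters receive distinct values, as if their operations had been performed one after the other, although memory sees only a single request.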

A.2. Circuit Switching

(1) Browne/Lipovski/Tripathi... (U.Texas, Austin) TRAC 1980

Circuit switching design with some packet switching capabilities. The
circuit switching is used to achieve "reconfigurability", i.e., assignment
of memory and processing power to tasks, or even fine grained
computational resources such as byte-wide adders, which can be linked
together into longer adders.

B. Dataflow

(1) Dennis/Misunas (MIT) Static Dataflow 1975-80

(2) Arvind (MIT) Tagged Token Dataflow 1980

Bus-structured system proposed in initial Irvine variant. As compared to
basic Dennis ideas, additional "tags" allow dynamic loop unrolling while
maintaining proper dependency relationships between operations.
Attempts to keep computation within a single processing element.
Hardware support for "I structures" (aggregates).

(3) Davis/Dongrawski (U.Utah) DDM1 1975

Tree-structured machine dependency on locality of virtual dataflow
processing; more layout to attain efficiency.

(4) Hogenauer/Newbold/Inn (TRW Corp) DD DDSP 1982
32-processor configuration organized into "subgroups" of 2, "clusters"
of 8 on several levels of buses. Executes binary operations in dataflow
fashion using common associative "matching store" to dispatch
completed sets of operands. Supports up to 2000 virtual dataflow nodes.

(5) Sowa/Murata (U. Illinois, Chicago) Dataflow

"Associative multiported memory" used to achieve ready-packet
dispatching. (Note: This suggestion is not usable for large dataflow
systems.)

(6) Sauber/Cornish (Texas Instruments) TI Dataflow Machine 1980
Never seems to have got past testbed status.

(7) Gurd/Watson/Glauert (Univ. Manchester) Manchester Dataflow 1981
Ring-structured dataflow with 20 processors.

(8) Kishi/Yasuhara/Kawamura (OKI) DDMP

Supports loop unrolling similar to Tagged-Token architecture.

(9) Takahashi/Amamiya (NTT) Dataflow Processing Array
Dataflow processors arranged in 2D grid.
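
Note on entries (2) and (4): the role of tags in operand matching can be sketched as follows. This is a simplified illustration, not the mechanism of any machine listed above; the class MatchingStore, its method arrive, and the use of the loop iteration number as the tag are all assumptions made for the example. A two-input node fires only when both operands carrying the same tag have arrived, which is what keeps tokens from different unrolled iterations apart.

```python
# Illustrative sketch only; names and token layout are hypothetical.

class MatchingStore:
    """Holds the first operand of a two-input node until its partner arrives."""

    def __init__(self):
        self.waiting = {}  # (node, tag) -> (port, value)

    def arrive(self, node, tag, port, value):
        """Accept one token; return (left, right) operands if the node can fire.

        Assumes the two tokens for a given (node, tag) arrive on different
        ports, port 0 for the left operand and port 1 for the right.
        """
        key = (node, tag)
        if key not in self.waiting:
            self.waiting[key] = (port, value)
            return None
        other_port, other_value = self.waiting.pop(key)
        operands = {port: value, other_port: other_value}
        return operands[0], operands[1]


store = MatchingStore()
# Tokens of iterations 0 and 1 of the same "sub" node arrive interleaved.
fired = [
    store.arrive("sub", 0, 0, 10),
    store.arrive("sub", 1, 0, 20),
    store.arrive("sub", 1, 1, 5),
    store.arrive("sub", 0, 1, 3),
]
print(fired)
```

Without the tag in the key, the operand 20 of iteration 1 would be paired with whatever token of iteration 0 arrived next, and the dependency relationships would be lost.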

C. Tree Structured Machines

(1) Stolfo (Columbia U.) DADO

MIMD/SIMD tree machine, in which any binary subtree is able to execute
in SIMD mode.

(2) Goodman/Despain (U.C. Berkeley) XTREE 1978

Tree machine with additional perfect shuffle connections between the
leaves.

(3) Seguin/Goodman (U.C. Berkeley) Hypertree

Tree machine with additional cube connections between nodes of each
level.

(4) Keller/Lindstrom/Patil (U.Utah) AMPS 1979

Tree with processing elements at leaves only and intermediate nodes
specialized for communication.

(5) Song (CMU) Tree Machine 1980

Tree machine with memory storage at nodes; retrieval requests are
broadcast in one tree of processors and results are combined in an
inverted tree sharing the same nodes.

(6) Shin/Lee/Sasidar (RPI) HM2P 1982

Tree-like machine based on "clusters" which share a common bus, and
shared on-bus memory; these "clusters" are then organized into a tree.

(7) Mago (U. North Carolina)

Reduction-oriented tree machine. Binary tree machine with additional
interconnects between siblings.
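
Note on entry (2): a perfect shuffle connection among 2**k leaves is conventionally defined as a left rotation of the k-bit leaf index. The sketch below shows only that standard permutation; the function name shuffle is hypothetical, and nothing here is taken from the XTREE design itself.

```python
# Illustrative sketch only; the function name is hypothetical.

def shuffle(i, k):
    """Return the perfect-shuffle partner of leaf i among 2**k leaves.

    The k-bit index is rotated left by one position: the top bit moves
    to the bottom and every other bit moves up by one.
    """
    top = (i >> (k - 1)) & 1
    return ((i << 1) & ((1 << k) - 1)) | top


k = 3
partners = [shuffle(i, k) for i in range(2 ** k)]
print(partners)
```

Applying the shuffle k times returns every leaf to its starting position, since k single-bit rotations restore a k-bit index.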

D. Nearest Neighbor Machines

(1) Slotnick (U. Illinois) ILLIAC IV 1970

The classic nearest neighbor rectangular array.

(2) Halstead/Ward (MIT) MUNET 1980

Nearest neighbor array, not necessarily rectangular, with memories
intermediate between processors.

(3) Jordan/Storaasli/Pratt (NASA-Langley) Finite Element Machine
Nearest neighbors including diagonal interconnect with supplementary
busses; circuit-switching reconfigurability concept in which finite-
element mesh is mapped directly to configured processor array.

(4) Kung/Arun/Gal-Ezer... (U.S.C.) Wavefront Array Processor 1982
Nearest neighbor SIMD array with systolic usage concept.

(5) Hoshino/Kawai/Shirakawa (Tsukuba U.) PACS 1983
Ordinary, 32 processor nearest neighbor array.

(6) Brooks/Fox/Gupta... (Caltech) Cosmic Cube - NNCP 1981
4x4 nearest neighbor MIMD array, based on 8086/87.

E. Crossbar Designs

(1) de Witt (U. Wisconsin) Database Machine

Low degree of parallelism, intended for database searches.

(2) (LLL) S-1 1978

High-speed individual processors, low degree of parallelism.

(3) Buehrer/Brandietz/Benz... (E.T.H., Zurich) EMPRESS 1982
17-processor crossbar design, with one processor specializing as
"supervisor."

(4) Villemin (Comp. Sci. Dept., CNAM, Paris) SERFRE 1982

Vaguely described proposal involving hierarchy of crossbars, to realize
multi-descendant tree with crossbar communication between groups of
siblings.

(5) Trujillo (LASL) Multimicroprocessor 1981

20 microprocessors communicating on a crossbar switch.

F. Ring Structured Machines

(1) Minker/Rieger/Bare... (U. Maryland) ZMOB 1980

256 microprocessors in ring configuration; messages received by
interrupt; send to specific processor, to all, etc. supported. Parallel
PROLOG application being developed.

G. Bus Structured Machines

(1) Taylor (ELXSI Corp.) ELXSI 1981

Up to 16 4-MIP ECL processors organized around very high-speed bus.

(2) Cordonnier/Mossu (Univ. Lille) MAP 1981
16 microprocessor shared bus configuration.

(3) Davidson (U. Ill) AMP-1 1980

Small shared bus multiprocessor system.



(4) Manner (Univ. Heidelberg) Polyp 1982
Bus structured multimicroprocessor system.

(5) Dimopoulous (Concordia Univ., Canada) Homogeneous Multiprocessor 1983

A multiple-bus multimicroprocessor configuration (a small, apparently
paper, proposal).

(6) Guzman (U. of Mexico) Parallel Heterarchical Machine 1980

Bus structured multi-LISP machine configuration, intended for execution
of parallel LISP.

Many designs of this general class have begun to appear lately, e.g., FLEX32, a
2D lattice of VME buses.

H. Miscellaneous and Eclectic Designs

(1) Siegel/Kemmerer/Washburn (Purdue) PASM 1980

A "partitionable" SIMD shuffle-based machine (other high performance
networks are also being considered), in which the communication net can
be decomposed into portions of size 2**m operating under control of
many standard processors. Note that the remaining shuffle connections
provide for communication between supervisors. Global "or" among
controllers also provided.

(2) Arden/Ginosar (Princeton) MP/C

Processors and memories arranged linearly, with switches that allow the
line to be broken arbitrarily into subsegments. Only one processor
active in each subset at a given time.

(3) Bronson/Siegel (Purdue U.) Parallel Speech Processor 1982
Proposal for specialized collection of parallel machines arranged in series;
the "acoustic processor" substage is to consist of 512 cube-connected
MC68000s.

(4) Postel (Intermetrics Corp.) Hybrid Dataflow System 1982

Nearest neighbor interconnect with superimposed tree, and all nodes at a
given level circularly interconnected.

(5) Maples/Weaver/Logan (LBL, Berkeley) MIDAS 1983
Circuit-switched attachment of memory modules to processors with
several separate "clusters"; a common memory is also available to all the
processors within a cluster. About 10 processors form a cluster; the
clusters are then organized into a tree.

(6) Wah/Ma (Purdue) MANIP 1980

Vaguely fleshed-out proposal intended for parallel branch-and-bound
applications. Crossbar interconnected within clusters, linear cyclic
connection between clusters.

(7) Treleaven/Mole (Newcastle) Multiprocessor Reduction Machine

A linear array of processors sharing memory used for processing
reduction languages.

2. Finegrained Designs

2.1. Subclassifications

A. Bitwise cube and shuffle

B. Nearest neighbor bitwise processing

C. Tree Machines

D. Circuit Switching Reconfigurable

E. Systolic special purpose chips

2.2. Detailed Descriptions

A. Bitwise Cube and Shuffle

(1) Hillis (Thinking Machines Corp.) Connection Machine 1983
Single-bit processors, 16/chip; hypercube interconnect with 50-bit
message packets passed in overlapped manner using all cube edges
simultaneously.

(2) Sussman (MIT) Connection Machine 1981
Earlier version of Hillis machine.

(3) Wagner (Duke U.) Boolean Vector Machine 1982

1-bit processors each storing a vector of m bits, with bitwise operations
and single-bit shuffle-neighbor transport operations. (Note: less
specialized hardware than the Connection Machine may penalize the most
useful macro operations, especially message passing, by requiring
generalized treatment; will be better on, say, the SIMD operations.) Very
small memory (128 bits/PE) proposed.
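
Note on entries (1) and (2): in a hypercube interconnect of dimension d, two nodes are joined by an edge exactly when their d-bit addresses differ in one bit. The sketch below illustrates that property together with one common way of routing a message, correcting the differing address bits one dimension at a time. The function names are hypothetical, and the routing order is an assumption chosen for the example; the source does not state which routing scheme the Connection Machine uses.

```python
# Illustrative sketch only; names and routing order are assumptions.

def neighbors(node, dim):
    """Nodes joined to `node` by a cube edge: flip each address bit in turn."""
    return [node ^ (1 << k) for k in range(dim)]


def route(src, dst, dim):
    """Path from src to dst that corrects differing bits from bit 0 upward.

    Each step crosses a single cube edge, so the path length equals the
    number of bit positions in which src and dst differ.
    """
    path = [src]
    cur = src
    for k in range(dim):
        if (cur ^ dst) & (1 << k):
            cur ^= 1 << k
            path.append(cur)
    return path


print(neighbors(0b000, 3))
print(route(0b000, 0b101, 3))
```

In a 3-cube, node 0 has neighbors 1, 2 and 4, and a message from node 0 to node 5 crosses two edges, passing through node 1.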

B. Nearest Neighbor Bitwise Processing

(1) Batcher (Goodyear Aerospace) STARAN 1972

32 X 4096 bit-array of 1-bit processors, SIMD machine; array accessible
either by row or by column.

(2) Surprise (Goodyear Aerospace) ASPRO 1981

Design intermediate between STARAN and MPP; 2048 single-bit
processors, memory array 2048x4096, accessible either by rows or
columns.

(3) Batcher (NASA-Goddard) MPP 1980

128x128 bitwise processor SIMD array with global-or capability.

(4) (ICL Corp.) DAP 1979

64x64 bitwise processor SIMD array with global-or capability.
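
Note on entries (1) and (2): access to a bit array "either by row or by column" can be illustrated on a toy example. The sketch below is an assumption-laden illustration, not a model of STARAN or ASPRO hardware: the array is 4 words of 8 bits, the function names are hypothetical, and the search routine merely shows why column (bit-slice) access suits a set of 1-bit processors working in SIMD fashion.

```python
# Illustrative sketch only; a toy 4 x 8 array with hypothetical names.

words = [
    [0, 1, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
]


def read_row(array, r):
    """Row access: all the bits of one word."""
    return list(array[r])


def read_column(array, c):
    """Column (bit-slice) access: bit c of every word, one bit per processor."""
    return [row[c] for row in array]


def search_equal(array, pattern):
    """Mark the words equal to pattern, working one bit slice at a time."""
    match = [1] * len(array)
    for c, wanted in enumerate(pattern):
        column = read_column(array, c)
        # Every processor updates its own match bit in the same step.
        match = [m & (1 if b == wanted else 0) for m, b in zip(match, column)]
    return match


print(read_column(words, 2))
print(search_equal(words, read_row(words, 2)))
```

The search takes one step per bit position regardless of how many words are stored, since all words are examined in parallel within each slice.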



C. Tree Machines

(1) Mead/Browning (Caltech) Tree Machine

4 bit wide, 512 nibble memory processor VLSI tree proposal.

(2) Shaw (Columbia U.) NONVON

Tree structured from finegrained processors but with shuffle connections
and more substantial processors near the root and with disk controllers at
intermediate levels; SIMD, except that each disk controller can operate
autonomously, for SIMD/MIMD effect.

D. Circuit Switching Reconfigurable

(1) Snyder (Purdue) Blue Chip 1981

Nearest neighbor with diagonal interconnect between "switching" and
"processing" nodes; wafer-scale integration; can put arbitrarily many
switching rows and columns between processors.

E. Systolic special purpose chips

(1) Kung... (CMU) Systolic processors 1979

An entire family of special purpose systolic chips for various numerical,
signal processing, pattern matching, associative, and buffering functions.

3. Micro-Overlapped Serial Processor Designs

(1) Fisher (Yale) ELI

Multiple fast functional units, dispatchable up to 16 instructions at a
time. Reliance is on 500 bit long horizontal microinstructions and
automatic compilation of effective horizontal microcode.

(2) Kuck/Stokes (Burroughs Corp.) BSP 1975

A commercial multi-functional unit, horizontally microcoded design.

(3) (CDC Corp.) AFP 1980

A commercial multi-functional unit, horizontally microcoded design.

(4) Requa/McGraw (LLL) Piecewise Dataflow 1983

48 bit-wide horizontal microcode with vector, several scalar, and
fetch/store unit. Hardware FIFOs used to control dispatching of ready
instructions.


