John J Donovan.

A hierarchical approach to information system design online

. (page 1 of 3)
Online LibraryJohn J DonovanA hierarchical approach to information system design → online text (page 1 of 3)
Font size
QR-code for this ebook


LIBRARY

OF THE

MASSACHUSETTS INSTITUTE
OF TECHNOLOGY



n



ewey



HD28
,M414



iV


^


[W


n


e


11


^35feii



:'n3o. iivSl. ICCi!.





Center for Information Systems Research

Massachusetts Institute of Technology

Alfred P. Sloan School of Management

50 Memorial Drive

Cambridge. Massachusetts, 02139







MASSACHUSETTS INSTITUTE OF TECHNOLOGY

Alfred P. Sloan School of Management

Center for Information Systems Research

'A HIERARCHICAL APPROACH
TO INFORMATION SYSTEM DESIGN

by
John Donovan and Henry Jacoby*

REPORT CISR-5

SLOAN WP 762-75

January 21, 1975



* We would like to acknowledge the contributions throught this book of
Grant Smith and Lou Gutentag, Smith most especially for his conceptual
contributions to the DML and DDL, and Gutentag for his application of
this work to NEEMIS.



.M4(4
ho.l(b2-75



1

-J



A HIERARCHICAL APPROACH TO INFORMATION SYSTEM DESIGN

by
John J. Donovan and Henry D. Jacoby



OUTLINE
ABSTRACT
PART I

1 . Structure of Paper

2. Characterization of Data Problems

3. Characteristics of Management Information Systems Problenis

4. Hierarchical Approach
PART II

1. Level 1: Machine Instruction

2. Level 2: The Operating System

3. Level 3: File Systems (Table Facility)

4. Level 4: Relational Operator Level

4.1 Relational History

4.2 Relational Model

4.3 Definition of Operators

4.3.1 Set Operators as Applied to Relations

4.3.2 Relational Operators
4.3.2.1 Example Using Operators

4.3.3 Implementation Operators

4.3.4 New Operators



07.2323'>



A HIERARCHICAL APPROACH TO INFORMATION SYSTEM DESIGN

BY
John Donovan and Henry Jacoby*

ABSTRACT

This paper presents an approach to the development of management
information systems that is particularly applicable to systems with the
following characteristics:

- several classes of users, each of which has a different
degree of sophistication

- complex and changing security requirements

- data exhibits complex and changing inter-relationships

- needs of the information system changing

- must be built quickly and inexpensively

- complex data validation requirements

The approach is hierarchical in that the functional tasks of the system

are grouped and ordered such that each group depends only on functions of

the group beneath it. Each group is called a level. We maintain that not

facil itate
only does this approach / the implementation of management information

systems to fulfill the above needs but provides a sound theoretical basis

for investigating properties of completeness, integrity, correctness, and

performance. Some of these theoretical approaches are also presented. We

practical way of
also maintain that such hierarchical approaches, in general, provide a /

decomposing complex system designs into manageable implementation schemes.

We feel that some of the primitives of the levels described here will

eventually be placed into the hardware architecture of machines. We have

applied this approach to the development of information systems for public

policy decisions regarding energy for the states. New England Energy

Management Information System (NEEMIS).



5. Level 5: Security Validation and Performance

6. Level 6: DDL/DML

6.1 An Example of Using DDL

6.2 Syntax Specification of DDL

6.3 Implementation of the DDL

6.4 The Data Manipulation Language (DML)

6.5 Example of Use of DML in Accessing NEEMIS

6.6 Complete Syntax Specification of DML

6.7 Implementation of DML

7. Performance

8. Further Issues and Investigations in Hierarchies

9. Further Investigations in Relations

10. Comparison of Other Views of Data

1 1 . Summary



-2-
PART 1

1 . Structure of Paper

The paper is divided into two parts. The first exposes the key
problems in representing data in a computer system and gives an overview of
the hierarchical approach. The second gives detailed discussion of each
level of a hierarchy for information systems.

The levels we present are: bare machine, operating systems, store
and retrieval of relations, relational operators, security and validation,
a Data Definition Language (DDL) and a Data Manipulation Language (DML),
report generator and user interface, modeling facility. We formalize some
levels, give the present state of research at MIT of each level, and present
future theorems and potential fruitful research directions of each level.

The store and retrieval relations are further divided into multiple sub-
levels.

2. Characterization of Data Problems

The basis of an information system is data. Let us address ourselves
to the problems of data complexity and complications. Figure 1 depicts an
example of two data series typical of those that economists are
accustomed to handling. One could perform all sorts of statistical
operations on the inventory series - regressions, averages, standard
deviations, etc.

However, for policymaking it is important to store more complex
information, such as the relations between data. For example. Figure 2
depicts four data items: the names of terminals, their address, region
number, and inventory value of different types of fuel. Let us further



complicate Figure 2 by representing data in relation to the owners, sup-
pliers, and all terminals in region 8. Figure 3 depicts these data items
and their relationships. Now visualize what such a diagram would look
like for all terminals in the U.S. and all possible relationships -
a mess! The basic problems are:

- How can such information be represented logically?

- How should an implementor view such data?

- What operators should exist to manipulate such data?

- What mechanisms should be avail c^ble to validate it?

- How can its protection be ensured?

- All of the above within the constraints of:

- good performance (low operating cost of system)

- recognition of the fact that all the relation-
ships might change; the types of data series
available might change



EXXON LYNN TER



LYNN MASS 8 50,000 20,004



GULF NO. 58



/ n \ «

IPSWICH MASS 8 24,300



SHELL BELMONT TER



/ M \„

BELMONT MASS 4 34



SUN OIL LYNN TER



LYNN MASS 8 33,964



#3



Figure 2
Terminal Data



NORTHEAST



LITTLE



LEX



3738



EXXON LYNN TER



LYNN MASS 8 \ 50,000 20,004



GULF NO. 58



~7 — I \ \ «

IPSWICH MASS 8 34,200



REGION 8





JACOBY



""? r

BELMONT 6609



EXXON



SHELL BELMONT TER



/ I \\

BELMONT MASS 4 34



#1




i SUN OIL LYNN TER



/ I \ \

LYNN MASS 8 33,964



#3



Figure 3
Complex Data



-7-



3. Characteristics of Management Information Systems Problems

What is so complicated about an information system? The following
lists some of those problems.

- Representation of data

- Storage

- Retrieval

- Manipulation of Data

- Use as Base for Models for Public Policy

- Satisfy different degree of sophistication of users

All of the above problems msut be addressed within the constraints of:

- low cost

- good performance

- users will want changes as the evolution of the user and
needs of a system change

- levels of users

4. Hierarchical Approach

The basic idea behind this approach is to divide a problem into groups
of functions (levels). Each group is ordered such that it depends only on
the group below it. Let us take a simple example to develop an intuitive
feel for this approach.

Suppose a carpenter wishes to build a house. One view he may have of
his basic components would be as is depicted in Figure 4. That is, he
views all his basic components as wood, nails, putty, glass, etc.



-8-



A hierarchical approach is depicted in Figure 5 where the carpenter
views his components in levels. The second level windows are composed of
components only from the levels below. Doors may be composed of windows
and any of the components of the inner level. This carpenter has simpli-
fied the construction task and also may have simplified the "debugging"
task. Namely, if a door is not operating correctly and if the levels below
it are debugged, then the fault must be in the door. (Note: recursion
is not allowed. Windows cannot have door assemblers in them).

Data Hierarchical Model



The motivation here is to choose a logical representation of data
that is divorced from the physical implementation.
Historical :

File blocking is an example of an early technique for giving the
user of a computer a different view of data than what appears in the physical
implementation. (An example of blocking is having several logical records
placed into one physical record on a tape.)

The paramount advantage of giving a user a logical representation
that is independent and separate from the physical implementation is that
the physical representation may be changed without affecting the applica-
tion prograimer. This separation is an example of a two-level hierarchy.

of
One of the first explanations/this hierarchical concept and its application



to file systems appears in [Madnick, 1970], in SYSTEMS PROGRAMMING
[Donovan, 1972] and later in OPERATING SYSTEMS [Madnick & Donovan, 1974].
The most notable early expositions of this hierarchical concept appears in
"T.H.E. Multiprogramming System", [Dijkstra, 1968].

Extending this two-level heirarchy we find the hierarchy depicted in
Figure 6 is applicable to the design, implementation, and study of information
systems.

A user viewing the bare machine sees instructions like "load, add,

store, multiply". A user viewing level 2 sees instructions like "Get more

memory", "Give me a device". A user at level 3 may store and retrieve

tables. A user at the operator level has three operators on tables. For

example, an operator would be "Find all common entries between tables".

Auser at the data security level can only access tables under prescribed

rules. A user at the DML and DDL level can use a cryptic English for

query information stored in tables or use the DDL to define new tables.

A user at level 8 may activate packages. A user at level 9 may create and

activate models.

Note that each level may be viewed as the user of the level below

it, as it uses the primitives of levels below it. For example, a request

at level 8 to "Select all terminals in Lynn, Mass. of over 50,000 gallons

of ^^ fuel would activate the data security level to check the access rights

of such a request. "ihc data security level would uso the oi^r^tor: cf

level 1 on secu-irty tabic:, ■.•.•i.ici, cor.tair. i:-.for!.:a- ion as i,o -lie proLccLicn

terminal table to
rights on the terminal table. These same operators will be used on the /



■10-







y^



/^



Figure 4
Non-hierarchical View from Carpenter





y^



Figure 5
Hierarchical View



-n-



get the desired information. The request is carried further down the
hierarchy until the actual machine instructions and I/O cormiands of the
bare machine are initiated to satisfy the request.



-12-




Figure 6
Hierarchical Implementation



•13-



PART II



The basic idea behind a hierarchical implementation is that each level
consists of algorithms that depend only on (call) algorithms in levels be-
neath it. No level is allowed to call itself, therefore, recursion on a
level is not allowed.

1. Level 1: Machine Instruction .

Level 1 presents the user with machine instruction.

2. Level 2: The Operating System.

A user of this level sees instructions such as "Give me a device",
"Give me more memory". The operating system is a resource manager in that
it manages the resources of the computer, memory, CPU time, and devices.
We further divide the operating system into sublevels [OPERATING SYSTEMS,
Madnick & Donovan, 1974]. For this paper we will assume that either an
operating system exists or the reader will refer to the references if he
should have to build one.

3. Level 3: File Systems (Table Fac il i ty )

The function of this level is to store and retrieve by symbolic name

tables. A user at this level sees instructions such as CREATE TABLE, READ

standard
TABLE, WRITE TABLE. Some / file systems, for example, IMS, MULTICS,

VSAM, etc., have such a facility (simply by equating a table to a file).

To build one from start, we advocate that this level be subdivided into

the sublevels of Figure 7.

Let us take an example and briefly describe the functions of each

sublevel.

READ TABLE TERMINAL



-14-



Request

i

Symbolic File System module (SFS)

i



Basic File System module (BFS)



Access Control Verification module (ACV)



Logical File System module (LFS)
[Access methods, file structure]



I



Physical File System module (PFS)
[File organization strategy]

^ (if write) ^^



Allocation Strategy module



Device Strategy module



Device
Management



Initiate I/O



^ Operating
System



Device Handler



Figure 7
Hierarchical Model of a File System



■15-



The Symbolic File System's function is to map the symbolic reference

into a unqiue identifier. This sublevel may read a master directory

to find the unique id of 'TERMINAL'. If recursion were allowed, the SFS

(i.e., find the unique id of the master directory)
level would call itself to READ master directory/. Since recursion is not

allowed, SFS cannot call itself to find the unique id of the master direc-
tory. Thus the master directory's unique id must be known by the SFS, and
to read this directory, the level below is called passing the unique id.

We call the process of identifiying what each level must know to pre-
vent recursion "unwinding the recursion".

When the Basic File System is called, it is given the unique ID of the
table requested. The function of the BFS is to find all the information
about the table (e.g., its size, location, access, etc.).

The function of access control verification is to provide the mechanism
for allowing sharing of tables by multiple users with different access
rights.

After access is checked, then the Logical File System is called. The
LFS is concerned with mapping the structure of the logical records onto the

linear byte string view of a file.

The primary function of the Physical File System is to perform the

mapping of the Logical Byte Address into Physical Block Addresses. For
example, the table to be read might be physically scattered in several
different parts of a disk. The LFS might calculate its logical address as
logical bytes 4000 to 5000 where the PFS would compute the specific track
and cylinder numbers for each component.

The Allocation Strategy Module is only activated on a write (or create)
request when more space is needed for a table or a new table is created.
This module finds the space.



• 16-



The Device Strategy Module, I/O initiator, and device handler are
actually levels of the operating system. They create channel programs,
request I/O, schedule the I/O, and handle interrupts.
4. Level 4: Relational Operator Level

This hierarchical approach for the design of an information system
does not require that all of the levels below a certain point be imple-
mented in a hierarchical manner, though we believe that even for the
low levels just discussed this approach is best.

At level 4 we have implemented the thirteen operators on tables
(some times called relations). The operators are: cartesian product,
union, intersection, projection, diadic restriction, monadic restriction,
join, composition, permutation, compuation, difference, inversion, and
ordering.

^.1 Relational History

Relational representation can be looked upon as variants of Post-
canonical systems [Post: 1943], Church's logical systems [Church: 1941],
or somewhat more recently, Smullyan's elementary formal systems [Smullyan:
1961], and most recently, Donovan's canonic systems [Donovan: 1967].

Codd, however, is recognized as the first to apply this sort uf
logical system to the representation of data [Codd: 1971]. Another infor-
mation algebra was proposed by R. Bosak [Bosak: 1962].

We know of six attempts to implement on a computer data in relational
form: ISG [Smith: 1973], MACAIMS [Goldstein, Strnad: 1971], SEQUEL
[Chamberlin: 1974], COLARD [Bracchi: 1972], RIL [Fehher:1972], and M.I.T.'s



-17-



RDMS. Most of these implementations are nearly functionally equivalent to
our DDL and DML of the next level.

The only practical application of a relational system (that we know of)
is to an energy information system for aiding public policy decisions in
New England [Donovan & Jacoby: 1974]. We have further reported in that
paper an extension of these concepts to include protection and validation
mechanisms.

Sloan's contributions to date lie in the area of the first application,
ex-tensions of concepts to handle protection validation, additional opera-
tors, hierarchical implementation, performance considerations, and a Data
Manipulation and Data Definition Language.

4.2 Relational Model

A user at this level views data as relations. A relation in its simplest
form is a tabel where each column is a doma i n and each row is an entry. Let
us formalize this concept and define some operators on relations.

The cartesian product of two sets, A and B, is denoted by A x B, and

is defined by:

A X B = i (a,b): a e A , b e B



by:



The expanded cartesian produ ct, X, of n sets B^ , B2,..., B^ is defined
S(B^,B2,...,B^) = I (b^,b2,...,b^): b^ e B. for j = l,2,...,nl .



The elements of such a set are called n-tuples , or just tuples . When
n = 1, X(Bi) = B, since no distinction is made between a 1-tuple and its
only component.



-18-



Suppose b = (b^, b^.-.-b^) and c = (c^ , C2,..., c^). The concatenation
of b with c is the (m + n)- tuple defined by

b||c = (b^, b2,...b^, c^, C2,...e^).

R is a relation on the sets (B^ Bp,..., B ) if it is a subset of
X(Bi, B^.-.-.B^). A relation is accordingly a special kind of set. Its
members are all n-tuples where n is a constant called the degree of the
relation. Relations of degree 1 are called unary , degree 2 binary , degree 3

ternary , degree n n-ary . The sets on which a relation is defined are called
its underlying domains . For data base purposes, we (unlike the earlier
work of Codd and others) are concerned with data consisting of many types,
e.g., integers, characters, floating point, pointers, binary, boolean, etc.

Note the elements (n-tuples) of a relation have no implied order,
thus insertion or deletion operators are simplified.

4.3 Definition of Operators

If data is represented in this relational model, what are the appro-
priate operators?



-19-
An operators create a new relation either from a single relation or
from two relations. Using the mnemonic "diadic" to refer to operators
that operate on two relations and "monadic" for opei^ators that operate on
one, we may divide the logical operators into four categories:



Category


Operator


Symbol


Type 1


Traditional set operators
as applied to relations


Union

Intersection
Cartesian Product


U

N
X


Diadic
Diadic
Diadic


Relational operators

appearing in literature


Projection

Join

Composition

Permutation

Restriction


P

■A-

M
R


Monadic

Diadic

Diadic

Monadic

Both


Implementation operators


Inversion
Ordering


I


Monadic
Monadic


New operators defined in
our system


Compaction
Difference


c


Monadic
Diadic



Notation: R. is the name of the ith relation

c- = cardinality (# members) of the ith relation

n. = degree of ith relation

d.. = jth domain of R.,; j=l,...n.

v^(d..) = mth value of d.. ; m=l,...c^.

t. = n. -tuple in relation R.
11 1

i.e.: t. . (vjd,,), v^(d,2),...v„(d.„.))



L(a) = length of 1 ist

V(x = "for all values of" a



a e {1 , . . .c. }



= null set (i.e.: R^ = =-> c^ = 0}

a ?, 3 "= a is a subset of 6

a C 6 = a is a prope r subset of 3 (i .c. : a ^ B)



-20-



The following examples will be used throughout to explain definitions:



Rl =



R2



R3



R4



RIO =



(name


soc_set_#

, 213-97-1666

621-49-2990

, 413-00-0029

, 839-41-6942


phone


deptJ)


(SMITH
(MACAVOY
(JACOBY
(SMITH


, 232-1500 ,

356-5175
, 484-7352 ,
, 253-0410 ,




15)

15)

6)

6)


(name


, soc_sec_^


, phone


deptj)


(MADNICK

(SMITH

(MACAVOY


, 217-61-7232

, 213-07-1666

621-49-2990


, 253-6571
, 232-1500
, 356-5175


*
»

i


15)

15)
15)


(person


, age


city)




(MADNICK
(MACAVOY
(SMITH


31
34
23


PEABODY)
IPSWICH)
BOSTON)

, city , street




(name


, phone)




(SMITH
(MACAVOY


. 232-1500)
. 356-5175)




(person


age


#)


(MADNICK
(MACAVOY


31

34


PEABODY , 1
IPSWICH , 4


8)
3)





■21



Example: R6 = Rl N R2; results in:

R6 = (SMITH , 213-07-1666 , 232-1500 , 15)
(MACAVOY, 621-49 2990 , 356-5175 , 15)



Cartesian Product. Suppose you wanted to form a new relation from
two relations where each element of the new relation consisted of every
possible parsing of the elements of the existing relations. The car-
tesian product 'X' would perform this task.



R. = R. X R, (j = k is valid),

I J K



Note: if n. > 1 or n. > 1 , then each t. (or t.) msut be treated

J K J K

as a single domain so that effectively n. = n. = 1.

J K



1 .e.



n.
1



n. + n^



c = c . * c.
1 J k

•^i " ^^\^^jl^' ^B^^kl^^' a = l,...c,^ ; (f. = ],...c.}

R. = {ordered pairs with first member from R., and
second member from R. } ^



Example: R5 = R4 x R4 ; results in:



R5 =



((SMITH ,

((SMITH ,

((mCAVOY ,

((MACAVOY ,



232-1500), (SMITH
232-1 500), (MACAVOY ,
356-5175), (SMITH
356-5175), (MACAVOY ,



232-1500))
356-5175))
232-1500))
356-5175))



■22-



4.3.1 Set Operators as Applied to Relations

Union: Suppose you would like to create a new relation that consisted
of all the elements of two other relations without redundant entries. The
union of two relations would perform this task. Using the symbol U we may
formally define union as

R

c



R. U R. ; (j = k is valid),



~ ^j ^k ~ (R.NR.) i.e., duplicates deleted automatically
J K

= max{n., u^)

= {t. : t. e R. OR t. e R >

1 I J I A



Example: R = R1 U R2; results in



R5



(SMITH ,


213-07-1666 ,


, 232-1500 ,


15) :


(MACAVOY,


621-49-2990 ,


356-5175 ,


15)


(JACOBY ,


413-00-0029 ,


484-7352 ,


6)


(SMITH ,


839-41-6942 ,


253-0410 ,


6)


(MADNICK,


217-61-7232 ,


253-6671 ,


15)



^R5 ' ^' "R5



Intersection: Suppose you wish to create a new relation whose

elements consisted of only the cotrmon elelemnts of two other relations.

The interseciton of those two relations would perfomr this task. Using

the symbol N we may formally define the intersection operator.

R^. = Rj N R^; (i = j = k is valid)

(note that if n. f n, , then R. = 0)
J k ^ '

R. = {t. : t. ^ t. } Duplicate t. are removed

I I ^1 K I



n. = n. = n .
1 k J



-23-
4_3^? Relational Operators [Codd: 1972]

Projection. Suppose you wish to create a new relation that consisted
of only some of the domains of an existing relation. Projection could do
this task. (Projection has its equivalence in prepositional calculus,
the existential qualifier [Church: 1943]. Formally, the projecttion 'P'
is defined

R. =R. P (d.^), ;i={l,2,...n.}

n. = lU)

c. = c. (Note: redundant entries not automatically removed - use
^ J the "compaction" operator for this purpose)

R. = (d.^ : £-{l,2,...n.}}

Example: k5 = R2 P (name, pnone); results in:

(MADNICK , 253-6671)

(SMITH , 232-1500)

(MACAVOY , 356-5175)

Join . Suppose you wished to create a new relation from two existing
relations such that each element of the new relation was the concatenation
of elements of the existing ones. Further, you only wanted to concatenate
those elements whose domains had certain properties. The joining of these
two relations would do it. Formally, the join '*' is defined (we find
the following definition more natural than Codd's):

R,. -Rj(d.,)*R,((0,d,J);

0::= > I < I = I ^0

Z e {1,2,. ..n^}

m e {1,2,. ..n^]

^"^ ^^i and d, must be of the same data type (i.e.,

must be joinable).

*Note; We have somewhat changed some of the definitions for implementation
and use reasons. E.g., we never automatically eliminate duplicate
rows in Projection; we do elimindtt uiie of the duplicate columns
in Join.



■24-



31


PEABODY),


23


BOSTON),


34


IPSWICH)}



n. = n. + n, - 1 (no duplication of join domain)

1 J K



Y = 1 ,. . .c . ; 6 = 1 ,. . .C|^}

Example (1): R6 = R2 (name) * R3 (=, person); results in:

(soc_sec_# phone dept_# name age city)

R6 = {(217-61-7232 253-6671 15 MADNICK

(213-07-1666 232-1500 15 SMITH


1 3

Online LibraryJohn J DonovanA hierarchical approach to information system design → online text (page 1 of 3)