M. P Reddy.

Towards an active schema integration architecture for heterogeneous database systems


Towards an Active Schema Integration

Architecture for Heterogeneous

Database Systems

M.P. Reddy

Michael Siegel

Amar Gupta

WP#3768 April 1993
PROFIT #93-07

Productivity From Information Technology

"PROFIT" Research Initiative

Sloan School of Management

Massachusetts Institute of Technology

Cambridge, MA 02139 USA


Fax: (617)258-7579

Copyright Massachusetts Institute of Technology 1993. The research described
herein has been supported (in whole or in part) by the Productivity From Information
Technology (PROFIT) Research Initiative at MIT. This copy is for the exclusive use of
PROFIT sponsor firms.

Productivity From Information Technology


The Productivity From Information Technology (PROFIT) Initiative was established
on October 23, 1992 by MIT President Charles Vest and Provost Mark Wrighton "to
study the use of information technology in both the private and public sectors and
to enhance productivity in areas ranging from finance to transportation, and from
manufacturing to telecommunications." At the time of its inception, PROFIT took
over the Composite Information Systems Laboratory and Handwritten Character
Recognition Laboratory. These two laboratories are now involved in research related
to context mediation and imaging, respectively.


In addition, PROFIT has undertaken joint efforts with a number of research centers,
laboratories, and programs at MIT, and the results of these efforts are documented
in Discussion Papers published by PROFIT and/or the collaborating MIT entity.

Correspondence can be addressed to:

The "PROFIT" Initiative
Room E53-310, MIT
50 Memorial Drive
Cambridge, MA 02142-1247
Tel: (617) 253-8584
Fax: (617) 258-7579
E-Mail: [email protected]

Towards an Active Schema Integration Architecture for
Heterogeneous Database Systems

M. P. Reddy, Michael Siegel, and Amar Gupta

Sloan School of Management

Massachusetts Institute of Technology

Cambridge, MA 02139

In this paper we describe our research in the development of a four-layered architecture for
Heterogeneous Distributed Database Management Systems (HDDBMS). The architecture includes
the local schema, local object schema, global schema, and global view schema. This architecture
was developed to support the propagation of local database semantics (e.g., integrity constraints,
context) to the global schema and global view. Constraints propagated to the global level can be
used to derive new constraints that could not have been recognized by any of the local components.
These constraints are important in significantly reducing query processing costs in the HDDBMS
environment by permitting incorporation of techniques similar to semantic query optimization in
the single database environment [CFM84,HZ80,Kin81,SSS91]. These techniques are used on the
global query to identify candidate databases and reduce the number of required local databases.

So far, the local, global, and view layers have been defined by passive objects (i.e., objects
without methods). As a result, changes to the semantics at the local schema have to be propagated
manually to the global level in order to maintain a set of globally consistent integrity constraints.
We are currently investigating the use of active objects as components of our four-layer
architecture, objects that can trigger the semantic changes needed to maintain a consistent set of
global integrity constraints.

In Section 1, we summarize the key components of the four-layer architecture and describe the
derivation of global integrity constraints. In Section 2, we describe the role of semantic query
processing at the global level and compare it with existing semantic query optimization techniques.
Finally, in Section 3, we present our vision for the use of active objects to maintain the consistency
of the mapping knowledge and of the global integrity constraints.

1 Integration Model

A methodology for designing a HDDBMS was proposed in [RPR89,Red90]; this methodology used a
four-layered schema architecture: local schemata, local object schemata, global schema, and global
view schema as shown in Figure 1. Each layer presents an integrated view of the concepts that
characterize the layer below.







Figure 1: Schema Architecture of a four-layered HDDBMS

1.1 Local Schema

The bottom layer consists of a set of local database schemata. Each local database schema is
denoted by $D_i$, where $i$ identifies the database. These schemata describe the stored data in
their respective data models, and the stored data can be retrieved only through their respective
query languages.

1.2 Local Object Schema

One local object schema is constructed for each local schema. For a given Local Schema $D_i$, the
construction of its corresponding Object Schema $OD_i$ involves the identification of the set $S_i$,
which gives the distinct object types in the schema $D_i$, the semantic meaning of the data
associated with every instance of the objects in $S_i$, and the constraints associated with these
objects. The knowledge that maps objects in $S_i$ to their corresponding data structures in $D_i$
is also placed at this layer.

An object in $S_i$ is any distinguishable entity whose description is available in the Local Schema
$D_i$. A database object is denoted by $O_l$, where $l$ is a unique object identifier: $l$ consists of
a pair of indices, say $(i, j)$, where the first index $i$ specifies the schema identification and the
second index $j$ provides the object identification within the schema. Each object possesses a set
of properties. A property is denoted by $P_k$, where $k$ is a unique property identifier; $k$ is
expressed as a pair $l.p_i$, where $l$ is its object identifier and $p_i$ is the property identifier
with respect to the object $O_l$. The Property Set associated with the object $O_l$ is denoted by
$PS_{O_l}$. The object $O_l$ is characterized by its properties; this characterization is denoted
by $O_l : PS_{O_l}$. The key property of the object $O_l$ is denoted by $K_l \in PS_{O_l}$.
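The identifier scheme above can be made concrete with a few small types. This is a minimal sketch, assuming Python dataclasses as the representation; the class names `ObjectId`, `PropertyId`, and `LocalObject` and the example property F-ID are hypothetical and not part of the methodology itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ObjectId:
    schema: int  # i: identifies the local schema D_i
    local: int   # j: identifies the object within D_i

    def __str__(self) -> str:
        return f"({self.schema},{self.local})"


@dataclass(frozen=True)
class PropertyId:
    obj: ObjectId  # l: identifier of the owning object O_l
    prop: int      # property identifier relative to O_l

    def __str__(self) -> str:
        return f"{self.obj}.{self.prop}"


@dataclass
class LocalObject:
    oid: ObjectId
    name: str
    properties: Dict[PropertyId, str] = field(default_factory=dict)
    key: Optional[PropertyId] = None  # K_l, always a member of PS_{O_l}

    def add_property(self, p: int, name: str, is_key: bool = False) -> PropertyId:
        pid = PropertyId(self.oid, p)
        self.properties[pid] = name
        if is_key:
            self.key = pid
        return pid

    def property_set(self) -> Set[PropertyId]:
        """PS_{O_l}: the identifiers of all properties of this object."""
        return set(self.properties)


# Hypothetical example: object (1,1) of D_1 is FACULTY, keyed on F-ID.
faculty = LocalObject(ObjectId(1, 1), "FACULTY")
k = faculty.add_property(1, "F-ID", is_key=True)
faculty.add_property(2, "F-PAY")
print(k)  # (1,1).1
```

Making the identifiers frozen (hashable) lets them serve directly as members of $PS_{O_l}$, which keeps the key-membership condition $K_l \in PS_{O_l}$ a plain set test.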

A property can itself be characterized by a set of meta-properties. Meta-properties are the
parameters needed to give a complete semantic meaning to the symbols associated with the
property. For example, PERIODICITY-OF-PAY and CURRENCY are meta-properties of the property
T-SAL.

Let $M^{P_k}$ denote the set of meta-properties associated with the property $P_k$, and let
$|M^{P_k}|$ denote the number of meta-properties associated with $P_k$. For each meta-property there
is a set of legal meta-values. DOLLAR, RUPEE, and POUND are some of the meta-values for the
meta-property CURRENCY; similarly, WEEKLY, MONTHLY, and YEARLY are some of the meta-values for
the meta-property PERIODICITY-OF-PAY. Further, if $V^j_k$ is the meta-value of the property $P_k$
associated with the meta-property $M^j$, we define $M^j(P_k) = V^j_k$. These meta-values are used
to recognize semantic incompatibilities among similar concepts in different layers.
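As an illustration, a property can carry its meta-values in a mapping from meta-property to meta-value, with $M^j(P_k)$ as a simple lookup. This is a sketch under assumptions: the `Property` class is hypothetical, and the legal meta-value sets list only the examples quoted in the text, not complete domains.

```python
# Example meta-values quoted in the text; the real sets may be larger.
LEGAL_META_VALUES = {
    "CURRENCY": {"DOLLAR", "RUPEE", "POUND"},
    "PERIODICITY-OF-PAY": {"WEEKLY", "MONTHLY", "YEARLY"},
}


class Property:
    """A property P_k together with its meta-values V^j_k."""

    def __init__(self, name, meta):
        for meta_property, value in meta.items():
            if value not in LEGAL_META_VALUES.get(meta_property, set()):
                raise ValueError(f"{value!r} is not a legal meta-value of {meta_property}")
        self.name = name
        self.meta = dict(meta)  # meta-property M^j -> meta-value V^j_k

    def meta_properties(self):
        """M^{P_k}; len() of the result is |M^{P_k}|."""
        return set(self.meta)

    def M(self, meta_property):
        """M^j(P_k) = V^j_k."""
        return self.meta[meta_property]


t_sal = Property("T-SAL", {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"})
print(t_sal.M("CURRENCY"))  # RUPEE
```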

1.3 Global Schema

The global schema is derived from the component local schemata. Objects in the component
schemata are first pooled together and then decomposed into object equivalence classes by
comparing their real-world states [NEL86]. Two objects belong to the same equivalence class only
if they have the same real-world states. Each object equivalence class gives one global object type.
Further, each local object in an object equivalence class constitutes a component of the global
object derived from that class. If $O_L$ is a global object and $O_l$ is one of its components, we
denote this relation as $O_l \in O_L$.
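The pooling-and-decomposition step can be sketched as a union-find over local objects. This assumes, hypothetically, that the comparison of real-world states has already produced pairwise "same real-world states" assertions; since that relation is an equivalence relation, taking its transitive closure with union-find yields the object equivalence classes. The object names TEACHER and DEPT are illustrative only.

```python
def equivalence_classes(objects, equivalent_pairs):
    """Partition local objects into object equivalence classes.

    `equivalent_pairs` holds pairs that a designer (hypothetically) asserted to
    model the same real-world states; union-find closes them transitively.
    """
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = {}
    for o in objects:
        classes.setdefault(find(o), []).append(o)
    return list(classes.values())


# Hypothetical local objects, written as (local schema, object name).
objects = [("D1", "FACULTY"), ("D2", "TEACHER"), ("D2", "DEPT")]
classes = equivalence_classes(objects, [(("D1", "FACULTY"), ("D2", "TEACHER"))])
# Each class becomes one global object O_L whose components O_l are its members.
global_objects = {f"G{n}": members for n, members in enumerate(classes, start=1)}
print(global_objects)
```

The same routine applies unchanged to the property equivalence classes described next, with properties in place of objects.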

To compute the properties of a global object, we compute the union of the properties of all
its components and decompose this union into a number of property equivalence classes, where
each property equivalence class provides one property for the global object. All properties in one
property equivalence class are called components of the global property derived from that class.
If $P_L$ is a global property and $P_l$ is one of its components, we denote this relation as
$P_l \in P_L$. The semantic meaning of a global property is fixed by assigning a meta-value to each
of its meta-properties. Two transformation maps, $T_{P_l,P_L}$ and $T_{P_L,P_l}$, are defined, which
make $P_l$ semantically compatible with $P_L$ and $P_L$ semantically compatible with $P_l$,
respectively.

Two properties $P_l$ and $P_L$ are said to be meta-value compatible with respect to the
meta-property $M^j$ if and only if $M^j(P_l) = M^j(P_L)$, that is, if and only if $V^j_l = V^j_L$;
this compatibility is denoted by $P_l \sim_{M^j} P_L$.

If T-SAL is the monthly salary paid in Rupees and F-PAY is the annual salary paid in Dollars, these
two properties are not meta-value compatible with respect to either PERIODICITY-OF-PAY or CURRENCY.
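The compatibility test is a per-meta-property equality check. This minimal sketch, assuming each property is represented only by a dict of its meta-values (the helper names are hypothetical), lists the meta-properties on which T-SAL and F-PAY disagree.

```python
# Meta-values of the two properties in the example above.
T_SAL = {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"}
F_PAY = {"CURRENCY": "DOLLAR", "PERIODICITY-OF-PAY": "YEARLY"}


def meta_value_compatible(p, q, meta_property):
    """True iff M^j(P_l) == M^j(P_L) for the meta-property M^j."""
    return p[meta_property] == q[meta_property]


def incompatibilities(p, q):
    """Shared meta-properties on which p and q are NOT meta-value compatible."""
    return sorted(m for m in p.keys() & q.keys() if not meta_value_compatible(p, q, m))


print(incompatibilities(T_SAL, F_PAY))  # ['CURRENCY', 'PERIODICITY-OF-PAY']
```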
• Transformation Map

If a property $P_l$ is not meta-value compatible with $P_L$ with respect to the meta-property $M^j$,
then it is possible to define a transformation map $t^j_{P_l,P_L}$ which makes $P_l$ meta-value
compatible with $P_L$ with respect to the meta-property $M^j$. Note that $t^j_{P_l,P_L}$ may be a
look-up table.

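In the above example, F-PAY is not compatible with T-SAL with respect to the meta-property
CURRENCY. The meta-value compatibility can be obtained with the transformation map
$t^{CURRENCY}_{T\text{-}SAL,F\text{-}PAY}$, such that

$$M^{CURRENCY}\left(t^{CURRENCY}_{T\text{-}SAL,F\text{-}PAY}(T\text{-}SAL)\right) = M^{CURRENCY}(F\text{-}PAY)$$

Here $t^{CURRENCY}_{T\text{-}SAL,F\text{-}PAY}(T\text{-}SAL)$ is $\frac{1}{24}$ times T-SAL, assuming
1 Dollar = 24 Rupees.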
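Since the text notes that a transformation map may be a look-up table, here is a minimal sketch of $t^{CURRENCY}$ built from a table of exchange rates. The table, the function name `currency_map`, and the use of exact `Fraction` arithmetic are assumptions for illustration; only the 1 Dollar = 24 Rupees rate comes from the text.

```python
from fractions import Fraction

# Look-up table: units of each currency per Dollar, using 1 Dollar = 24 Rupees.
# POUND is left out because the text gives no rate for it.
UNITS_PER_DOLLAR = {"DOLLAR": Fraction(1), "RUPEE": Fraction(24)}


def currency_map(src, dst):
    """t^CURRENCY: rescale an amount from currency `src` to currency `dst`."""
    factor = UNITS_PER_DOLLAR[dst] / UNITS_PER_DOLLAR[src]
    return lambda amount: amount * factor


t_currency = currency_map("RUPEE", "DOLLAR")  # plays the role of t^CURRENCY_{T-SAL,F-PAY}
print(t_currency(Fraction(240_000)))  # 10000
```

Exact fractions avoid rounding drift when maps are later composed and compared against constraint thresholds.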

• Composite Transformation Map

Two properties $P_l$ and $P_L$ in $[P_k]$ are defined to be semantically compatible with each other
if and only if they are meta-value compatible with respect to all meta-properties pertinent to these
properties. This is symbolically denoted by $P_l \sim P_L$. Further, if $P_l$ and $P_L$ are not
semantically compatible, then a composite transformation map $T_{P_l,P_L}$ can be defined which
makes $P_l$ semantically compatible with $P_L$.

Suppose $t^1_{P_l,P_L}, t^2_{P_l,P_L}, \ldots, t^{|M^{P_k}|}_{P_l,P_L}$ are the transformation maps
which make $P_l$ meta-value compatible with $P_L$ with respect to the meta-properties
$M^1, M^2, \ldots, M^{|M^{P_k}|}$, respectively. The composite transformation map can be defined
as follows:

$$T_{P_l,P_L}(P_l) = \left(t^1_{P_l,P_L} \circ t^2_{P_l,P_L} \circ \cdots \circ t^{|M^{P_k}|}_{P_l,P_L}\right)(P_l)$$
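The composite map is ordinary function composition of the per-meta-property maps. This sketch assumes the elementary maps are scalar rescalings (so their order does not matter here, since multiplications commute) and takes WEEKLY as 52 periods per year, which is an approximation the text does not state. Meta-properties on which the two properties already agree contribute the identity map and are simply skipped. All function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce

UNITS_PER_DOLLAR = {"DOLLAR": Fraction(1), "RUPEE": Fraction(24)}
PERIODS_PER_YEAR = {"WEEKLY": 52, "MONTHLY": 12, "YEARLY": 1}  # WEEKLY = 52 is assumed


def t_currency(src, dst):
    factor = UNITS_PER_DOLLAR[dst] / UNITS_PER_DOLLAR[src]
    return lambda x: x * factor


def t_periodicity(src, dst):
    factor = Fraction(PERIODS_PER_YEAR[src], PERIODS_PER_YEAR[dst])
    return lambda x: x * factor


ELEMENTARY_MAPS = {"CURRENCY": t_currency, "PERIODICITY-OF-PAY": t_periodicity}


def compose(*fs):
    """compose(f, g, h)(x) == f(g(h(x))); compose() is the identity."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs, lambda x: x)


def composite_map(p_meta, q_meta):
    """T_{P_l,P_L}: compose one t^j for each meta-property where P_l and P_L differ."""
    maps = [ELEMENTARY_MAPS[m](p_meta[m], q_meta[m])
            for m in sorted(p_meta) if p_meta[m] != q_meta[m]]
    return compose(*maps)


T_SAL = {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"}
F_PAY = {"CURRENCY": "DOLLAR", "PERIODICITY-OF-PAY": "YEARLY"}
T = composite_map(T_SAL, F_PAY)
print(T(Fraction(250_000)))  # 125000
```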

Figure 2: Example Derivation of Global Integrity Constraints
(Local constraints shown in the figure: "All professors must get more than 250K Rupees per month"
and "All faculty who are earning more than 100K Dollars per year must be given office type A".)

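The integrity constraints shown at the local level in Figure 2 are propagated to the global level
[RPG92] and used to derive new global constraints. For example, the constraint
FAC-RANK = 'Professor' → FAC-OFFICE-TYPE = 'A' can only be derived at the global level: with
1 Dollar = 24 Rupees, a salary of more than 250K Rupees per month corresponds to more than
250K × 12 / 24 = 125K Dollars per year, which exceeds the 100K Dollars per year required for
office type A.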
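The derivation can be checked mechanically once both local constraints are expressed in the same semantics. This is a minimal sketch, assuming each local constraint reduces to a single salary lower bound and that the composite map from T-SAL to F-PAY multiplies by 12/24; `derive_gic` and the constant names are hypothetical, and the actual procedure of [RPG92] is not reproduced here.

```python
from fractions import Fraction


def t_sal_to_f_pay(amount):
    """Composite map T_{T-SAL,F-PAY}: monthly Rupees -> yearly Dollars (1 Dollar = 24 Rupees)."""
    return amount * 12 / Fraction(24)


# Hypothetical encoding of the two local constraints in Figure 2 as lower bounds.
PROFESSOR_MIN_T_SAL = Fraction(250_000)  # FAC-RANK = 'Professor' -> T-SAL > 250K
OFFICE_A_MIN_F_PAY = Fraction(100_000)   # F-PAY > 100K -> FAC-OFFICE-TYPE = 'A'


def derive_gic(prof_min_local, office_min_global, transform):
    """Chain the constraints if the transformed bound implies the second premise.

    salary > transform(prof_min_local) >= office_min_global gives salary > office_min_global.
    """
    if transform(prof_min_local) >= office_min_global:
        return "FAC-RANK = 'Professor' -> FAC-OFFICE-TYPE = 'A'"
    return None


print(derive_gic(PROFESSOR_MIN_T_SAL, OFFICE_A_MIN_F_PAY, t_sal_to_f_pay))
```

Neither local site alone can perform this check, because each holds only one of the two constraints and neither knows how its salary semantics relate to the other's.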

The above discussion shows that meaningful interaction among the layers depends on the semantics
of similar concepts in different layers, and in turn on the correctness of the composite
transformation maps defined between these layers. Our previous work suggests that these composite
transformation maps must be redefined manually whenever the semantics of the concepts in these
layers change.

1.4 View Object Schema

Some of the objects in the third layer may possess disjoint or overlapping domains. The integration
of these objects may be required for global users, creating a need for generalizing such objects to
produce global views. Each of the global objects that is generalized to produce the global view is
called a component of the view object.

The properties of the global view object are derived by first computing the union of the properties
of the component objects. This union is decomposed into property equivalence classes; from these,
we retain a property equivalence class only if it contains one property from each and every
component of the view object. Each such property equivalence class provides one property for the
global view object.
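The retention rule amounts to a coverage test on each property equivalence class. In this sketch the classes are assumed to be already computed, as sets of (component object, property) pairs; the function `view_properties` and the objects and properties in the example are hypothetical.

```python
def view_properties(components, property_classes):
    """Keep only property equivalence classes that draw on every component."""
    required = set(components)
    return [cls for cls in property_classes
            if {obj for obj, _ in cls} == required]


# Hypothetical global objects generalized into one view object, and the
# property equivalence classes computed from the union of their properties.
components = ["FACULTY", "STUDENT"]
property_classes = [
    {("FACULTY", "F-NAME"), ("STUDENT", "S-NAME")},
    {("FACULTY", "F-PAY")},
    {("STUDENT", "GPA")},
]
kept = view_properties(components, property_classes)
print(len(kept))  # 1
```

Only the name class survives, because salary and grade-point properties exist in just one component and so cannot be answered uniformly over the whole view.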

The following section outlines the potential benefits of the global integrity constraints and the
need to maintain their consistency.

2 Using GICs in Semantic Query Processing

In [RSG92] we describe algorithms for using global integrity constraints (GICs) in semantic query
processing (SQP). Significant savings can be achieved by applying semantic query processing to
global queries. Some of the key optimization techniques introduced in our GIC-based query
processing strategy are:

• Null Queries: Rejection of null global queries at the initial stage would reduce the average
query response time. Null queries are typically entered by users who do not possess adequate
knowledge about explicit and implicit relations among the objects/entities. This is especially
true in a HDDBMS environment where the global schema is generally large and difficult for
the user to understand completely.

• Deduction of Query Results: SQP facilitates the deduction of values of target attributes using
available semantic knowledge and the query qualification. If all target attributes can be deduced,
the complete query can be answered. Even when only a subset of the target properties is deducible
using semantic knowledge, deducing that subset may eliminate the need to generate one or more
subqueries.

• Avoidance of Large Search Space: Because the search space comprises the union of all the
component databases, the time to process global queries may exceed an acceptable range. An
exhaustive search of all the component databases can be avoided by implementing a sophisticated
query optimization strategy. SQP techniques can reduce the size of the relevant search space by
selecting an appropriate minimal set of candidate databases.

• Optimization of Subqueries: Semantic query processing does not terminate at the global
schema level after optimizing the global query. Subqueries of the global query need to be
optimized further by using additional Semantic Query Optimization techniques.

• Generation of Missing Data: One of the problems faced during the integration of par-
tial results is that of missing data. This problem arises because of incompleteness of the
component databases. This problem may be resolved using GICs.

• Resolution of Data Inconsistencies: Data inconsistency is another problem that must be solved
when integrating the partial results obtained by processing subqueries against their respective
databases. This problem arises because of the uncontrolled redundancy inherent in heterogeneous
environments. Semantic knowledge can be utilized to overcome this problem.
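Three of the techniques listed above (null-query rejection, deduction of a target attribute, and candidate-database selection) can be sketched together. This is a simplification, not the algorithm of [RSG92]: it assumes GICs and query qualifications are conjunctions of attribute = value tests (the constraints in this paper also involve numeric bounds), and that each local database advertises a hypothetical domain constraint on its attributes.

```python
def close(qualification, gics):
    """Forward-chain GICs over a query qualification.

    Each GIC is a (premise, conclusion) pair of attribute -> value dicts.
    Returns the deduced facts, or None if a GIC contradicts the qualification,
    i.e. the query is a null query and need not be sent to any database.
    """
    facts = dict(qualification)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in gics:
            if all(facts.get(a) == v for a, v in premise.items()):
                for a, v in conclusion.items():
                    if a not in facts:
                        facts[a] = v   # deduced value of a (target) attribute
                        changed = True
                    elif facts[a] != v:
                        return None    # contradiction: empty answer
    return facts


def candidate_databases(facts, domains):
    """Keep databases whose domain constraints admit every known fact."""
    return sorted(
        db for db, dom in domains.items()
        if all(facts[a] in allowed for a, allowed in dom.items() if a in facts)
    )


GICS = [({"FAC-RANK": "Professor"}, {"FAC-OFFICE-TYPE": "A"})]
DOMAINS = {"D1": {"FAC-OFFICE-TYPE": {"A"}}, "D2": {"FAC-OFFICE-TYPE": {"B", "C"}}}

print(close({"FAC-RANK": "Professor", "FAC-OFFICE-TYPE": "B"}, GICS))  # None
facts = close({"FAC-RANK": "Professor"}, GICS)
print(facts["FAC-OFFICE-TYPE"], candidate_databases(facts, DOMAINS))  # A ['D1']
```

The first query asks for professors in type-B offices, which the GIC rules out, so it is rejected before any subquery is generated; the second query gains a deduced office type, which in turn prunes D2 from the search space.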

This semantic query processing concept requires a set of consistent global integrity constraints.
However, changes in local database semantics are not easily reflected in the structure or semantic
knowledge at the global level. In the following section, we provide some insight into how a more
active architecture may be able to provide a consistent global representation.






Figure 3: Architecture of the proposed HDDBMS

3 Maintaining Consistent Global Integrity Constraints

Query processing in a HDDBMS can be improved using GICs, provided that a consistent set of GICs is
available. Since GICs are derived from the local integrity constraints (LICs), any change in the
semantics of a local schema that affects its LICs must also be reflected in the GICs. Currently,
objects in the local object schema and in the global schema are passive, in the sense that they
contain no methods and must be redefined whenever there is any change in the local schema. Our
plan is to make these
objects active, in the sense that whenever some change occurs in the local schema, the objects in
the top three layers evolve to cope with the change at the local schema. For example, consider
the system architecture shown in Figure 3. In the "passive world", a local database administrator
would contact the global administrator to register a change in the local schema, and the global
administrator would change the local object schema and all other layers and distribute new copies
of the global schema to the user sites. In the "active world", the changes in the local schema would
be reflected in the local object schema and inconsistencies would be identified and, when possible,
a consistent global schema could be automatically produced.
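One way to picture the "active world" is the observer pattern, swapped in here as an illustration: each layer subscribes to the layer it is derived from and is notified when that layer changes. This is a minimal sketch; the `Layer` class and its methods are hypothetical, and the re-derivation step is only a placeholder comment.

```python
class Layer:
    """A schema layer that forwards change notifications to the layers above it."""

    def __init__(self, name):
        self.name = name
        self.dependents = []  # layers derived from this one
        self.log = []         # changes this layer has had to react to

    def subscribe(self, layer):
        self.dependents.append(layer)

    def on_change(self, change):
        # Hypothetical hook: a real layer would re-derive its objects,
        # transformation maps, or constraints here before forwarding.
        self.log.append(change)
        for layer in self.dependents:
            layer.on_change(change)


local = Layer("local schema D1")
local_object = Layer("local object schema OD1")
global_schema = Layer("global schema")
view = Layer("global view schema")
local.subscribe(local_object)
local_object.subscribe(global_schema)
global_schema.subscribe(view)

local.on_change("F-PAY: CURRENCY DOLLAR -> RUPEE")
print(view.log)  # ['F-PAY: CURRENCY DOLLAR -> RUPEE']
```

In the passive setting, the global administrator performs each of these forwarding steps by hand.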

The schema evolution process has been studied in the context of object-oriented databases
[BCG+90]. This work mainly concentrates on three kinds of schema change: (i) changes to the
contents of an object class (e.g., changes to an instance variable or method); (ii) changes to
relations among the object classes; and (iii) addition or removal of object classes from the
schema. Because incremental growth is one of the desired features of a HDDBMS, such schema
evolution is applicable to HDDBMS. Automatic schema evolution makes it easier to add a new
database to an existing HDDBMS. However, existing evolution mechanisms are not adequate for our
requirements. Whenever any change occurs in the semantics of an attribute in the local schema, the
change must be reflected in all transformation maps pertinent to that attribute in different
layers; further, the corresponding LICs and GICs must be modified. We are currently investigating
methods that make objects in different layers active, so that they can be used in our four-layered
architecture to generate current composite transformation maps and consistent global integrity
constraints.

Examples of uses for active objects include the identification of invalid instances of both
transformation mappings and global constraints. The layers of the architecture, the transformation
maps, and the global constraints can be provided with methods or message-passing capabilities that
allow for notification of changes in these object states. For example, assume that the constraint
in Figure 2 on the local FACULTY relation is changed so that only those faculty members whose
salary is more than 150K Dollars per year will get office type 'A'. A professor is then guaranteed
only more than 125K Dollars per year, so the previously generated GIC
FAC-RANK = 'Professor' → FAC-OFFICE-TYPE = 'A' must be invalidated and a new GIC generated in its
place. We propose to generate GICs and to define demons that monitor changes in their component
LICs. Whenever one of the components changes, these demons invoke a method that reconstructs the
GIC to reflect the local changes. If the semantics of F-PAY are changed so that it gives annual
salaries in Rupees, then the corresponding transformation map must be changed. The method for
constructing the composite transformation map may access global ontologies or conversion routine
libraries. If such automatic construction is not possible, then we would want the system designer
to be notified automatically of the impact of these changes.
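A demon of the kind proposed above might look like the following minimal sketch. It assumes, hypothetically, that the GIC depends on exactly two LIC bounds and one composite map, that the designer is alerted through a callback, and that "reconstruction" means re-running the derivation check; the class `GICDemon` and its method names are illustrative, not a prescribed design.

```python
from fractions import Fraction


def monthly_rupees_to_yearly_dollars(amount):
    """Composite map T_{T-SAL,F-PAY}, assuming 1 Dollar = 24 Rupees."""
    return amount * 12 / Fraction(24)


class GICDemon:
    """Hypothetical demon that watches the component LICs of one GIC."""

    def __init__(self, lic_bounds, transform, notify):
        self.lic_bounds = dict(lic_bounds)  # LIC name -> salary lower bound
        self.transform = transform          # current composite transformation map
        self.notify = notify                # callback that alerts the system designer
        self.gic = self._rebuild()

    def _rebuild(self):
        # Professor => T-SAL > bound; transformed to yearly Dollars, does it
        # guarantee the F-PAY bound that implies office type 'A'?
        prof_min = self.transform(self.lic_bounds["professor-min-T-SAL"])
        if prof_min >= self.lic_bounds["office-A-min-F-PAY"]:
            return "FAC-RANK = 'Professor' -> FAC-OFFICE-TYPE = 'A'"
        return None

    def _refresh(self, reason):
        old, self.gic = self.gic, self._rebuild()
        if old is not None and self.gic is None:
            self.notify(f"{reason}: GIC '{old}' is no longer derivable")

    def on_lic_change(self, name, new_bound):
        """Triggered when a component LIC changes at a local site."""
        self.lic_bounds[name] = new_bound
        self._refresh(f"LIC {name} changed")

    def on_map_change(self, transform):
        """Triggered when a transformation map is rebuilt, e.g. after F-PAY changes."""
        self.transform = transform
        self._refresh("transformation map changed")


messages = []
demon = GICDemon(
    {"professor-min-T-SAL": Fraction(250_000), "office-A-min-F-PAY": Fraction(100_000)},
    monthly_rupees_to_yearly_dollars,
    messages.append,
)
print(demon.gic)  # FAC-RANK = 'Professor' -> FAC-OFFICE-TYPE = 'A'
demon.on_lic_change("office-A-min-F-PAY", Fraction(150_000))
print(demon.gic, len(messages))  # None 1
```

This sketch only invalidates the one GIC form it knows about; generating a replacement GIC would require the general derivation machinery.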

The four-layered architecture provides a well-defined set of integration stages. We believe that
enhancing this architecture with active capabilities will allow for automatic recognition and
resolution of conflicts that result from changes in the semantics at the local schema.


[BCG+90] J. Banerjee, H. T. Chou, J. F. Garza, W. Kim, D. Woelk, N. Ballou, and H. J. Kim.
Data model issues for object-oriented applications. In S. B. Zdonik and D. Maier, editors,
Readings in Object-Oriented Database Systems, pages 161-213. Morgan Kaufmann Publishers, Inc.,
1990.

[CFM84] U. Chakravarthy, D. Fishman, and J. Minker. Semantic query optimization in expert
systems and database systems. In Proceedings of the First Intl. Conference on Expert
Database Systems, pages 326-340, 1984.

[HZ80] M. Hammer and S. Zdonik. Knowledge-based query processing. In Proceedings 6th
VLDB, pages 137-146, 1980.

[Kin81] J. King. QUIST: A system for semantic query optimization in relational databases. In
Proceedings 7th VLDB, pages 510-517, 1981.

[NEL86] S. B. Navathe, R. Elmasri, and J. Larson. Integrating user views in database design.
Computer, 19, 1986.

[Red90] M. P. Reddy. Heterogeneous Distributed Database Management Systems: Modeling
and Managing Heterogeneous Data. PhD thesis, School of Mathematics & Computer/Information
Science, University of Hyderabad, India, 1990.

[RPG92] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulation of global integrity constraints
during derivation of global schema. Submitted to Knowledge and Data Engineering, 1992.

[RPR89] M. P. Reddy, B. E. Prasad, and P. G. Reddy. A methodology for resolving semantic
incompatibilities and data inconsistencies in integrating heterogeneous databases. In
Proc. Int. Conference on Management of Data, Hyderabad, India, 1989.

[RSG92] M. P. Reddy, M. Siegel, and A. Gupta. Semantic query processing in HDDBMS.
Submitted to VLDB Journal, 1992.

[SSS91] M. Siegel, S. Salveter, and E. Sciore. Automatic rule derivation for semantic query
optimization. Accepted for publication in Transactions on Database Systems, 1991.


