Abdelguerfi, M., Eskicioglu, R., Liebowitz, J. “Knowledge Engineering”
The Electrical Engineering Handbook
Ed. Richard C. Dorf
Boca Raton: CRC Press LLC, 2000
94
Knowledge Engineering
94.1 Databases
Database Abstraction • Data Models • Relational Databases • Hierarchical Databases • Network Databases • Architecture of a DBMS • Data Integrity and Security • Emerging Trends
94.2 Rule-Based Expert Systems
Problem Selection • Knowledge Acquisition • Knowledge Representation • Knowledge Encoding • Knowledge Testing and Evaluation • Implementation and Maintenance
94.1 Databases
M. Abdelguerfi and R. Eskicioglu, University of New Orleans
In the past, file processing techniques were used to design information systems. These systems usually consist
of a set of files and a collection of application programs. Permanent records are stored in the files, and application
programs are used to update and query the files. The application programs are in general developed individ-
ually to meet the needs of different groups of users. In many cases, this approach leads to a duplication of data
among the files of different users. Also, the lack of coordination between files belonging to different users often
leads to a lack of data consistency. In addition, changes to the underlying data requirements usually necessitate
major changes to existing application programs. Among other major problems that arise with the use of file
processing techniques are lack of data sharing, reduced programming productivity, and increased program
maintenance. Because of their inherent difficulties and lack of flexibility, file processing techniques have lost a
great deal of their popularity and are being replaced by database management systems (DBMS).
A DBMS is designed to efficiently manage a shared pool of interrelated data (database). This includes the
existence of features such as a data definition language for the definition of the logical structure of the database
(database schema), a data manipulation language to query and update the database, a concurrency control
mechanism to keep the database consistent when shared by several users, a crash recovery strategy to avoid any
loss of information after a system crash, and safety mechanisms against any unauthorized access.
Database Abstraction
A DBMS is expected to provide for data independence, i.e., user requests are made at a logical level without any
need for the knowledge of how the data is stored in actual files. This implies that the internal file structure
could be modified without any change to the user’s perception of the database. To achieve data independence,
the Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute
(ANSI) in its 1977 report recommended three levels of database abstraction (see Fig. 94.1). The lowest level in
the abstraction is the internal level. Here, the database is viewed as a collection of files organized according to
one of several possible internal data organizations (e.g., a B+-tree data organization). At the conceptual level, the
database is viewed at an abstract level. The user at this level is shielded from the internal storage details. At the
external level, each group of users has their own perception or view of the database. Each view is derived from
the conceptual database and is designed to meet the needs of a particular
group of users. Such a group can only have access to the data specified by
its particular view. This, of course, ensures both privacy and security.
The mapping between the three levels of abstraction is the task of the
DBMS. When changes to the internal level (such as a change in file organi-
zation) do not affect the conceptual and external levels, the system is said to
provide for physical data independence. Logical data independence ensures that changes to the conceptual
level do not affect users' views. Both types of data
independence are desired features in a database system.
Data Models
A data model refers to an integrated set of tools used to describe the data
and its structure, data relationships, and data constraints. Some data models
provide a set of operators that is used to update and query the database. Data
models can be classified in two main categories: record-based and object-
based. Both classes are used to describe the database at the conceptual and external levels. With object-based
data models, constraints on the data can be specified more explicitly.
There are three main record-based data models: the relational, network, and hierarchical models. In the
relational model, data at the conceptual level is represented as a collection of interrelated tables. These tables
are normalized so as to minimize data redundancy and update anomalies. In this model, data relationships are
implicit and are derived by matching columns in tables. In the hierarchical and network models, the data is
represented as a collection of records and data relationships are explicit and are represented by links. The
difference between the last two models is that in the hierarchical model, data is represented as a tree structure,
while it is represented as a generalized graph in the network model.
In hierarchical and network models, the existence of physical pointers (links) to link related records allows
an application program to retrieve a single record at a time by following a chain of pointers. The process of
following pointer chains and selecting one record at a time is referred to as navigation. In nonnavigational
models such as the relational model, records are not related through pointer chains; instead, relationships are
established by matching columns in different tables.
The hierarchical and network models require the application programmer to be aware of the internal structure
of the database. The relational model, on the other hand, allows for a high degree of physical and logical data
independence. Earlier DBMSs were for the most part navigational systems. Because of its simplicity and strong
theoretical foundations, the relational model has since received wide acceptance. Today, most DBMSs are based
on the relational model.
Other data models include a popular high level conceptual data model, known as the Entity-Relationship
(ER) model. The ER model is mainly used for the conceptual design of databases and their applications. The
ER model describes data as entities, attributes, and relationships.
An entity is an “object” in the real world with an independent existence. Each entity has a set of properties,
called attributes, that describes it. A relationship is an association between entities. For example, a professor
entity may be described by its name, age, and salary and can be associated with a department entity by the
relationship “works for”.
With the advent of advanced database applications, the ER modeling concepts became insufficient. This has
led to the enhancement of the ER model with additional concepts, such as generalization, categories, and
inheritance, leading to the Enhanced-ER or EER model.
Relational Databases
The relational model was introduced by E. F. Codd [1970]. Since the theoretical underpinnings of the relational
model have been well defined, it has become the focus of most commercial DBMSs.
In the relational model, the data is represented as a collection of relations. To a large extent, each relation
can be thought of as a table. The example of Fig. 94.2 shows part of a university database composed of two
FIGURE 94.1 Data abstraction.
relations. FAC_INFO gives personal information (last name, social security, street and city of residence, and
department) of a faculty. DEP_CHAIR gives the last name of the chairman of each department. A faculty is
not allowed to belong to two departments. Each row in a relation is referred to as a tuple. A column name is
called an attribute name. The data type of each attribute name is known as its domain. A relation scheme is a
set of attribute names. For instance, the relation scheme (or scheme for short) of the relation FAC_INFO is
(lname, social_sec#, street, city, dept). A key is a set of attribute names whose composite value is distinct for
all tuples. In addition, no proper subset of the key is allowed to have this property. It is not unusual for a
scheme to have several possible keys. In FAC_INFO, both lname and social_sec# are possible keys. In this case,
each possible key is known as a candidate key, and the one selected to act as the relation’s key, say, lname, is
referred to as the primary key. A superkey is a key with the exception that there is no requirement for minimality.
In a relation, an attribute name (or a set of attribute names) is referred to as a foreign key, if it is the primary
key of another relation. In FAC_INFO, the attribute name dept is a foreign key, since the same attribute is a
key in DEP_CHAIR. Because of updates to the database, the content of a relation is dynamic. For this reason,
the data in a relation at a given time instant is called an instance of the relation.
There are three integrity constraints that are usually imposed on each instance of a relation: primary key
integrity, entity integrity, and referential integrity. The key integrity constraint requires that no two tuples of
a relation have the same key value. The entity integrity constraint specifies that the key value of each tuple
should have a known value (i.e., no null values are allowed for primary keys). The referential integrity constraint
specifies that if a relation r1 contains a foreign key that matches the primary key of a relation r2, then each value
of the foreign key in r1 must either match a value of the primary key in r2 or must be null. For the database of
Fig. 94.2 to be consistent, each value of dept in FAC_INFO must match a value of dept in DEP_CHAIR.
Relational Database Design
The relational database design [Maier, 1983] refers to the process of generating a set of relation schemes that
minimizes data redundancy and removes update anomalies. One of the most popular approaches is the use of
the normalization theory. The normalization theory is based on the notion of functional dependencies.
Functional dependencies are constraints imposed on a database. The notion of superkey, introduced in the
previous section, can be formulated as follows: A subset of a relation scheme is a superkey if, in any instance
of the relation, no two distinct tuples have the same superkey value. If r(R) is used to denote a relation r on a
schema R, K ⊆ R a superkey, and t(K) the K-value of tuple t, then no two tuples t1 and t2 in r(R) are such that
t1(K) = t2(K).
The notion of a functional dependency can be seen as a generalization of the notion of superkey. Let X and
Y be two subsets of R; the functional dependency X → Y exists in r(R) if whenever two tuples in r(R) have
the same X-value, their Y-value is also the same. That is, if t1(X) = t2(X), then t1(Y) = t2(Y). Using functional
dependencies, one can define the notion of a key more precisely. A key k of a relation r(R) is such that k → R
and no proper subset of k has this property. Note that if the schema R is composed of attribute names
{A1, A2, . . ., An}, then each attribute name Ai is functionally determined by the key k, i.e., k → Ai, i = 1, . . ., n. An
FIGURE 94.2 An example of two relations: FAC_INFO
and DEP_CHAIR.
attribute name that is part of a key is referred to as a prime attribute. In the example of Fig. 94.2, both attribute
names street and city are nonprime attributes.
The normalization process can be thought of as the process of decomposing a scheme with update anomalies
and data redundancy into smaller schemes in which these undesirable properties are to a large extent eliminated.
Depending on the severity of these undesirable properties, schemes are classified into normal forms. Originally,
Codd defined three normal forms: first normal form (1NF), second normal form (2NF), and third normal form
(3NF). Thereafter, a stronger version of the 3NF, known as Boyce-Codd normal form (BCNF), was suggested.
These four normal forms are based on the concept of functional dependencies.
The 1NF requires that attribute name values be atomic. That is, composite values for attribute names are
not allowed. A 2NF scheme is a 1NF scheme in which all nonprime attributes are fully dependent on the key.
Consider the relation of Fig. 94.3. Each tuple in PRODUCT gives the name of a supplier, a product name, its
price, and the supplier's location. The scheme (supplier_name, product_name, price, location) is in 1NF since
each attribute name is atomic. It is assumed that many products can be supplied by a single supplier, that a
given product can be supplied by more than one supplier, and that a supplier has only one location. So,
(supplier_name, product_name) is the relation's key and the functional dependency supplier_name → location
should hold for any instance of PRODUCT.
The structure of the relation of Fig. 94.3 does not allow a supplier to appear in the relation unless it offers
at least one product. Even the use of null values is not of much help in this case as product_name is part of a
key and therefore cannot be assigned a null value. Another anomaly can be encountered during the deletion
process. For instance, deleting the last tuple in the relation results in the loss of the information that Rudd is
a supplier located in Metairie. It is seen that the relation PRODUCT suffers from insertion and deletion
anomalies.
Modifications can also be a problem in the relation PRODUCT. Suppose that the location of the supplier
Martin is moved from Kenner to Slidell. In order not to violate the functional dependency supplier_name →
location, the location attribute name of all tuples where the supplier is Martin needs to be changed from Kenner
to Slidell. This modification anomaly has a negative effect on performance.
In addition, the relation PRODUCT suffers from data redundancy. For example, although Martin has only
one location “Kenner”, such a location appears in all three tuples where the supplier_name is Martin.
The update anomalies and data redundancy encountered in PRODUCT are all due to the functional depen-
dency supplier_name → location. The right-hand side of this dependency, "location", is a nonprime attribute,
and the left-hand side represents part of the key. Therefore, we have a nonprime attribute that is only partially
dependent on the key (supplier_name, product_name). As a consequence, the schema (supplier_name,
product_name, price, location) is not in 2NF. The removal of the partial dependency supplier_name → location
will eliminate all the above anomalies. The removal of the partial dependency is achieved by decomposing the
scheme (supplier_name, product_name, price, location) into two 2NF schemes: (supplier_name,
product_name, price), and (supplier_name, location). This decomposition results in relations PRO_INFO and
SUP_LOC shown in Fig. 94.4. The keys of PRO_INFO and SUP_LOC are (supplier_name, product_name),
and supplier_name, respectively.
Normalizing schemes into 2NF removes all update anomalies due to nonprime attributes being partially
dependent on keys. Anomalies of a different nature, however, are still possible.
Update anomalies and data redundancy can originate from transitive dependencies. A nonprime attribute Ai
is said to be transitively dependent on a key k via attribute name Aj, if k → Aj, Aj → Ai, and Aj does not
functionally determine k. A 3NF scheme is a 1NF scheme in which no nonprime attribute is transitively
dependent on a key.
FIGURE 94.3 Instance of PRODUCT (supplier_name, product_name, price, location).
The relation of Fig. 94.5, which is in 2NF, highlights update anomalies and data redundancy due to the
transitive dependency of a nonprime attribute on a key. The relation gives the name of a client (client_name),
the corresponding supplier (supplier_name), and the supplier’s location. Each client is assumed to have one
supplier. The relation’s key is client_name, and each supplier has only one location. A supplier and his location
cannot be inserted in SUPPLIES unless the supplier has at least one client. In addition, the relation has a
deletion anomaly since if Tillis is no longer a client of Rudd, the information about Rudd as a supplier and his
location is lost. A change to a supplier’s location may require updating the location attribute name of several
tuples in the relation. Also, although each supplier has only one location, such a location is sometimes repeated
several times unnecessarily, leading to data redundancy.
The relation exhibits the following transitive dependency: client_name → supplier_name and supplier_name
→ location (but not the inverse). The relation SUPPLIES is clearly in 2NF, but because of the transitive dependency
of the nonprime attribute location on the key, it is not in 3NF. This is the cause of the anomalies mentioned
above. Eliminating this transitive dependency by splitting the schema into two components will remove these
anomalies. Clearly, the resulting two relations SUP_CLI and SUP_LOC are in 3NF (see Fig. 94.6).
Each partial dependency of a nonprime attribute on a key can be expressed as a transitive dependency of a
nonprime attribute on a key. Therefore, a scheme in 3NF is also in 2NF.
BCNF is a stricter form of 3NF, where a relation r on a schema R is in BCNF if whenever a functional
dependency X → Y exists in r(R), then X is a superkey of R. The condition of 3NF, which allows Y to be prime
if X is not a superkey, does not exist in BCNF. Thus, every scheme in BCNF is also in 3NF, but the opposite is
not always true.
FIGURE 94.4 Decomposition of PRODUCT into PRO_INFO and SUP_LOC.
FIGURE 94.5 Instance of SUPPLIES.
FIGURE 94.6 Decomposition of SUPPLIES into SUP_CLI and SUP_LOC.
A detailed discussion of higher level normalizations, such as 4NF and 5NF, which are based on other forms
of dependencies, can be found in [Elmasri and Navathe, 1994].
Data Definition and Manipulation in Relational Databases
Upon completion of the relational database design, a descriptive language, usually referred to as Data Definition
Language (DDL), is used to define the designed schemes and their relationships. The DDL can be used to create
new schemes or modify existing ones, but it cannot be used to query the database. Once DDL statements are
compiled, they are stored in the data dictionary. A data dictionary is a repository where information about
database schemas, such as attribute names, indexes, and integrity constraints are stored. Data dictionaries also
contain other information about databases, such as design decisions, usage standards, application program
descriptions, and user information. During the processing of a query, the DBMS usually checks the data
dictionary. The data dictionary can be seen as a relational database of its own. As a result, data manipulation
languages that are used to manipulate databases can also be used to query the data dictionary.
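For instance, in a DBMS that exposes the SQL-standard INFORMATION_SCHEMA catalog (an assumption; the names and layout of the dictionary tables differ from product to product), the relations belonging to a schema could be listed with an ordinary query. The schema name UNIVERSITY below is purely illustrative:

SELECT table_name
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'UNIVERSITY'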
An important function of a DBMS is to provide a Data Manipulation Language (DML) with which a user
can retrieve, change, insert, and delete data from the database. DMLs are classified into two types: procedural
and nonprocedural. The main difference between the two types is that in procedural DMLs, a user has to specify
the desired data and how to obtain it, while in nonprocedural DMLs, a user has only to describe the desired
data. Because they impose less burden on the user, nonprocedural DMLs are normally easier to learn and use.
The component of a DML that deals with data retrieval is referred to as query language. A query language
can be used interactively in a stand-alone manner, or it can be embedded in a general-purpose programming
language such as C and Cobol.
One of the most popular query languages is SQL (Structured Query Language). SQL is a query language
based to a large extent on Codd’s relational algebra. SQL has additional features for data definition and update.
Therefore, SQL is a comprehensive relational database language that includes both a DDL and DML.
SQL includes the following commands for data definition: CREATE TABLE, DROP TABLE, and ALTER
TABLE. The CREATE TABLE is used to create and describe a new relation. The two relations of Fig. 94.4 can
be created in the following manner:
CREATE TABLE PRO_INFO (supplier_name VARCHAR(12) NOT NULL,
product_name VARCHAR(8) NOT NULL,
price DECIMAL(6,2));
CREATE TABLE SUP_LOC (supplier_name VARCHAR(12) NOT NULL,
location VARCHAR(10));
The CREATE TABLE command specifies all the attribute names of a relation and their data types (e.g.,
INTEGER, DECIMAL, fixed length character “CHAR”, variable length character “VARCHAR”, DATE). The
constraint NOT NULL is usually specified for those attributes that cannot have null values. The primary key
of each relation in the database is usually required to have a nonnull value.
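Key and referential integrity constraints (discussed earlier) can usually be declared in the CREATE TABLE statement as well. The following sketch, whose exact clause syntax may vary slightly between DBMSs, declares the primary key of each relation and makes supplier_name in PRO_INFO a foreign key referencing SUP_LOC; SUP_LOC is created first so that the reference can be resolved:

CREATE TABLE SUP_LOC (supplier_name VARCHAR(12) NOT NULL,
    location VARCHAR(10),
    PRIMARY KEY (supplier_name));

CREATE TABLE PRO_INFO (supplier_name VARCHAR(12) NOT NULL,
    product_name VARCHAR(8) NOT NULL,
    price DECIMAL(6,2),
    PRIMARY KEY (supplier_name, product_name),
    FOREIGN KEY (supplier_name) REFERENCES SUP_LOC (supplier_name));

With these declarations, the DBMS itself rejects any insertion or update that would violate the key or referential integrity constraints.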
If a relation is created incorrectly, it can be deleted using the DROP TABLE command. The command is
DROP TABLE followed by the name of the relation to be deleted. A variation of the DROP command, DROP
SCHEMA, is used if the whole schema is no longer needed.
The ALTER TABLE is used to add new attribute names to an existing relation, as follows:
ALTER TABLE SUP_LOC ADD zip_code CHAR(5);
The SUP_LOC relation now contains an extra attribute name, zip_code. In most DBMSs, the zip_code value
of existing tuples will automatically be assigned a null value. Other DBMSs allow for the assignment of an
initial value to a newly added attribute name. Also, definitions of attributes can be changed and new constraints
can be added, or current constraints can be dropped.
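For example, a named CHECK constraint might be added to PRO_INFO and later dropped; this is a sketch, and the exact ALTER TABLE clauses supported differ between DBMSs:

ALTER TABLE PRO_INFO ADD CONSTRAINT positive_price CHECK (price > 0);
ALTER TABLE PRO_INFO DROP CONSTRAINT positive_price;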
The DML component of SQL has one basic query statement, sometimes called a mapping, that has the
following structure:
SELECT <attribute_name list>
FROM <relation_list>
WHERE <restriction>
In the above statement, the SELECT clause specifies the attribute names that are to be retrieved, the FROM
clause gives the list of the relations involved, and the WHERE clause gives a Boolean predicate that completely
specifies the tuples to be retrieved.
Consider the database of Fig. 94.4, and suppose that we want the name of all suppliers that supply either
beds or sofas. In SQL, this query can be expressed as:
SELECT supplier_name
FROM PRO_INFO
WHERE product_name = 'bed' OR product_name = 'sofa'
The result of an SQL command may contain duplicate values and is therefore not always a true relation. In
fact, the result of the above query, shown below, has duplicate entries.
supplier_name
Martin
Martin
Rudd
The entry Martin appears twice in the result, because the supplier Martin supplies both beds and sofas.
Removal of duplicates is usually a computationally intensive operation. As a result, duplicate entries are not
automatically removed by SQL. To ensure uniqueness, the command DISTINCT should be used. In the above
query, if we want the supplier names to be listed only once, the above query should be modified as follows:
SELECT DISTINCT supplier_name
FROM PRO_INFO
WHERE product_name = 'bed' OR product_name = 'sofa'
In SQL, a query can involve more than one relation. Suppose that we want the list of all suppliers from
Metairie who supply beds. Such a query, shown below, involves both PRO_INFO and SUP_LOC.
SELECT supplier_name
FROM PRO_INFO, SUP_LOC
WHERE PRO_INFO.supplier_name = SUP_LOC.supplier_name
AND product_name = 'bed'
When an SQL expression, such as the one above, involves more than one relation, it is sometimes necessary
to qualify attribute names, that is, to prefix an attribute name with the name of the relation it belongs to,
separated by a period. Such qualification removes possible ambiguities.
In SQL, it is possible to have several levels of query nesting; this is done by including a SELECT query
statement within the WHERE clause.
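For example, the previous query (suppliers from Metairie who supply beds) can also be written with a nested SELECT in the WHERE clause:

SELECT supplier_name
FROM SUP_LOC
WHERE location = 'Metairie'
AND supplier_name IN (SELECT supplier_name
    FROM PRO_INFO
    WHERE product_name = 'bed')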
The output data can be presented in sorted order by using the SQL ORDER BY clause followed by the
attribute name(s) according to which the output is to be sorted.
In database management applications it is often desirable to categorize the tuples of a relation by the values
of a set of attributes and extract an aggregated characteristic of each category. Such database management tasks
are referred to as aggregation functions. For instance, SQL includes the following built-in aggregation functions:
SUM, COUNT, AVG, MIN, and MAX. The attribute names used for the categorization are referred to as
FIGURE 94.7 Instance of the relation PROFESSOR.
GROUP BY columns. Consider the relation PROFESSOR of Fig. 94.7. Each tuple of this relation gives the name
of a faculty member, his or her department, and academic year salary.
Suppose that we want to know the number of faculty in each department and the result to be ordered by
department. This query requests for each department a count of the number of faculty. Faculty are therefore
categorized according to the attribute name department. As a result, department is referred to as a GROUP BY
attribute. In SQL, the above query is formulated as follows:
SELECT department, COUNT (faculty)
FROM PROFESSOR
GROUP BY department
ORDER BY department
The result of applying the COUNT aggregation function is a new relation with two attribute names. They are
a GROUP BY attribute (department in this case) and a new attribute called COUNT. The tuples are ordered
lexicographically in ascending order according to the ORDER BY attribute, which is department in this case:
department COUNT (faculty)
Computer Sc. 4
Electrical Eng. 3
Mechanical Eng. 2
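The other aggregation functions are used in the same way. For example, assuming the salary attribute of PROFESSOR is named salary (an assumption; only its description appears above), the following query would list the average academic year salary of each department, AVG being the standard SQL spelling of the average function:

SELECT department, AVG (salary)
FROM PROFESSOR
GROUP BY department
ORDER BY department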
The relations created through the CREATE TABLE command are known as base relations. A base relation
exists physically and is stored as a file by the DBMS. SQL can be used to create views using the CREATE VIEW
command. In contrast to base relations, the creation of a view results in a virtual relation, that is, one that does
not necessarily correspond to a physical file. Consider the database of Fig. 94.4, and suppose that we want to
create a view giving the name of all suppliers located in Metairie, the products each one provides, and the
corresponding prices. Such a view, called METAIRIE_SUPPLIER, can be created as follows:
CREATE VIEW METAIRIE_SUPPLIER
AS SELECT PRO_INFO.supplier_name, product_name, price
FROM PRO_INFO, SUP_LOC
WHERE PRO_INFO.supplier_name = SUP_LOC.supplier_name
AND location = 'Metairie'
Because a view is a virtual relation that can be constructed from one or more relations, updating a view may
lead to ambiguities. As a result, when a view is generated from more than one relation, there are, in general,
restrictions on updating such a view.
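Once created, a view is queried like any base relation. For example, the following sketch lists the products and prices offered by Rudd, a supplier located in Metairie:

SELECT product_name, price
FROM METAIRIE_SUPPLIER
WHERE supplier_name = 'Rudd'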
Hierarchical Databases
The hierarchical data model [Elmasri and Navathe, 1994] uses a tree data structure to conceptualize associations
between different record types. In this model, record types are represented as nodes and associations as links.
Each record type, except the root, has only one parent; that is, only parent-child (or one-to-many) relationships
are allowed. This restriction gives hierarchical databases their simplicity. Since links are only one way, from a
parent to a child, the design of hierarchical database management systems is made simpler, and only a small
set of data manipulation commands are needed.
Because only parent-child relationships are allowed, the hierarchical model cannot efficiently represent two
main types of relationships: many-to-many relationships and the case where a record type is a child in more
than one hierarchical schema. These two restrictions can be handled by allowing redundant record instances.
However, such a duplication requires that all the copies of the same record should be kept consistent at all times.
The example of Fig. 94.8 shows a hierarchical schema. The schema gives the relationship between a DEPART-
MENT, its employees (D_EMPLOYEE), the projects (D_PROJECT) handled by the different departments, and
how employees are assigned to these projects. It is assumed that an employee belongs to only one department,
a project is handled by only one department, and an employee can be assigned to several projects. Notice that
since a project has several employees assigned to it, and an employee can be assigned to more than one project,
the relationship between D_PROJECT and D_EMPLOYEE is many-to-many. To model this relationship,
multiple instances of the same record type D_EMPLOYEE may appear under different projects.
Such redundancies can be reduced to a large extent through the use of logical links. A logical link associates
a virtual record from a hierarchical schema with an actual record from either the same schema or another
schema. The redundant copy of the actual record is therefore replaced by a virtual record, which is nothing
more than a pointer to the actual one.
Hierarchical DDLs are used by a designer to declare the different hierarchical schemas, record types, and
logical links. Furthermore, a root node must be declared for each hierarchical schema, and each record type
declaration must also specify the parent record type.
Unlike relational DMLs, hierarchical DMLs such as DL/1 are record-at-a-time languages. DL/1 is used by
IBM’s IMS hierarchical DBMS. In DL/1 a tree traversal is based on a preorder algorithm, and within each tree,
the last record accessed through a DL/1 command can be located through a currency indicator.
Retrieval commands are of three types:
GET UNIQUE <record type> WHERE <restrictions>
Such a command retrieves the leftmost record that meets the imposed restrictions. The search always starts
at the root of the tree pointed to by the currency indicator.
GET NEXT [<record type> WHERE <restrictions>]
Starting from the current position, this command uses the preorder algorithm to retrieve the next record
that satisfies the restrictions. The clause enclosed between brackets is optional; when it is omitted, GET NEXT
simply retrieves the next (preorder) record from the current position.
GET NEXT WITHIN PARENT [<record type> WHERE <restrictions>]
It retrieves all records that have the same parent and that satisfy the restrictions. The parent is assumed to
have been selected through a previous GET command.
Four commands are used for record updates:
INSERT
Stores a new record and links it to a parent. The parent has already been selected through a GET command.
REPLACE
The current record (selected through a previous GET command) is modified.
DELETE
The current record and all its descendants are deleted.
GET HOLD
Locks the current record while it is being modified.
The DL/1 commands are usually embedded in a general-purpose (host) language. In this case, a record
accessed through a DL/1 command is assigned to a program variable.
Network Databases
In the network model [Elmasri and Navathe, 1994], associations between record types are less restrictive than
in the hierarchical model. Here, associations among record types are represented as graphs.
One-to-one and one-to-many relationships are described using the notion of set type. Each set type has an owner
record type and a member record type. In the example of Fig. 94.8, the relationship between DEPARTMENT and
FIGURE 94.8 A hierarchical schema.
employee (D_EMPLOYEE) is one-to-many. This relationship defines a set type where the owner record type
is DEPARTMENT and the member record type is D_EMPLOYEE. Each instance of an owner record type along
with all the corresponding member records represents a set instance of the underlying set type. In practice, a
set is commonly implemented using a circular-linked list which allows an owner record to be linked to all its
member records. The pointer associated with the owner record is known as the FIRST pointer, and the one
associated with a member record is known as a NEXT pointer.
In general, a record type cannot be both the owner and a member of the same set type. Also, a record cannot
exist in more than one instance of a specific set type. The latter requirement implies that many-to-many
relationships are not directly implemented in the network data model.
The relationship between D_PROJECT and D_EMPLOYEE is many-to-many. In the network model, this
relationship is represented by two set types and an intermediate record type. The new record type could be
named ASSIGNED_TO (see Fig. 94.9). One set has D_EMPLOYEE as owner and ASSIGNED_TO as member
record type, and the other has D_PROJECT as owner and ASSIGNED_TO as member record type.
Standards for the network model’s DDL and DML were originally proposed by the CODASYL (Conference
On Data SYstems Languages) committee in 1971. Several revisions to the original proposal were made later.
In a network DDL, such as that of the IDMS database management system, a set declaration specifies the
name of the set, its owner record type, and its member record type. The insertion mode for the set members
needs to be specified using combinations of the following commands:
AUTOMATIC
An inserted record is automatically connected to the appropriate set instance.
MANUAL
In this case, records are inserted into the appropriate set instance by an application program.
OPTIONAL
A member record does not have to be a member of a set instance. The member record can be connected to
or disconnected from a set instance using DML commands.
MANDATORY
A member record needs to be connected to a set instance. A member record can be moved to another set
instance using the network’s DML.
FIXED
A member record needs to be connected to a set instance. A member record cannot be moved to another
set instance.
The network’s DDL allows member records to be ordered in several ways. Member records can be sorted in
ascending or descending order according to one or more fields. Alternatively, a new member record can be
inserted next (prior) to the current record (pointed to by the currency indicator) in the set instance. A newly
inserted member record can also be placed first (or last) in the set instance. This will lead to a chronological
(or reverse chronological) order among member records.
As with the hierarchical model, network DMLs are record-at-a-time languages, and currency indicators are
necessary for navigation through the network database. For example, the IDMS main data manipulation
commands can be summarized as follows:
FIGURE 94.9 Representing many-to-many relationships in the network model.
CONNECT
Connects a member record to the specified set instance.
DISCONNECT
A member record is disconnected from a set instance (set membership must be manual in this case).
STORE, MODIFY, and DELETE
These commands are used for data storage, modification, and deletion.
FIND
Retrieval command based on set membership.
GET
Retrieval command based on key values.
Architecture of a DBMS
A DBMS is a complicated software structure that includes several components (see Fig. 94.10). The DBMS has
to interact with the operating system for secondary storage access. The data manager is usually the interface
between the DBMS and the operating system. The DDL compiler converts schema definitions, expressed using
DDL statements, into a collection of metadata tables that are stored in the data dictionary. The design of the
schemas is the function of the database administrator (DBA). The DBA is also responsible for specifying the
data storage structure and access methodology and granting and revoking access authorizations. The query
processor converts high-level DML statements into low-level instructions that the data manager can inter-
pret. The DML preprocessor separates embedded DML statements from the rest of an application program. The
resulting DML commands are processed by a DML compiler, and the rest of the application program is compiled
by a host compiler. The object codes of the two components are then linked.
Data Integrity and Security
Data Integrity
In general, during the design of a database schema several integrity constraints are identified. These constraints
may include the uniqueness of a key value, restrictions on the domain of an attribute name, and the ability of
an attribute to have a null value. A DBMS includes mechanisms with which integrity constraints can be specified.
Constraints such as key uniqueness and the admissibility of null values can be specified during schema
definition. Also, more elaborate integrity constraints can be specified. For example, constraints can be imposed
FIGURE 94.10 Simplified architecture of a DBMS.
on the domain of an attribute name, and any transaction that violates the imposed constraints is aborted. In
some cases, it is useful to have the system take some corrective action, rather than simply abort the transaction
responsible for the constraint violation. A mechanism called a trigger can be used for that purpose. A trigger
specifies a condition and an action to be taken when the condition is met.
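Trigger syntax differs considerably from one DBMS to another. The following sketch, written in a generic SQL:1999-style form, records every change of salary in the PROFESSOR relation of Fig. 94.7 in a hypothetical SALARY_AUDIT table (the salary attribute name is also assumed):

CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON PROFESSOR
REFERENCING OLD ROW AS o NEW ROW AS n
FOR EACH ROW
WHEN (n.salary <> o.salary)
INSERT INTO SALARY_AUDIT (faculty, old_salary, new_salary)
VALUES (o.faculty, o.salary, n.salary);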
Transactions and Data Integrity
In a multiuser DBMS, the database is a shared resource that can be accessed concurrently by many users. A transaction
usually refers to the execution of a retrieval or an update program. A transaction performs a single logical operation
in a database application. Therefore, it is an atomic unit of processing. That is, a transaction is either performed in
its entirety or is not performed at all. Basically, a transaction may be in one of the following states (Fig. 94.11):
• active — where read and write operations are performed.
• partially committed — when the transaction ends and various checks are made to ensure that the transaction did not interfere with other transactions.
• failed — when one of the checks failed or the transaction is aborted during the active state.
• committed — when the execution was successfully completed.
• terminated — when the transaction leaves the system.
Transactions originating from different users may be aimed at the same database records. This situation, if
not carefully monitored, may cause the database to become inconsistent. Starting from a database in a consistent
state, it is obvious that if all transactions are executed one after the other, then the database will remain in a
consistent state. In a multiuser DBMS, serial execution of transactions is wasteful of system resources. In this
case, the solution is to interleave the execution of the transactions. However, the interleaving of transactions
has to be performed in a way that prevents the database from becoming inconsistent. Suppose that two
transactions T1 and T2 proceed in the following way:
Time    T1                    T2
 1      read_account(X)
 2                            read_account(X)
 3                            X := X - 20
 4      X := X - 10
 5                            write_account(X)
 6      write_account(X)
 7      read_account(Y)
 8      Y := Y + 10
 9      write_account(Y)
The first transaction transfers $10 from bank account X to bank account Y. The second transaction withdraws
$20 from bank account X. Assume that initially there was $200 in X and $100 in Y. When the two transactions
are performed serially, the final amounts in X and Y are $170 and $110, respectively. However, if the two
transactions are interleaved as shown, then after the completion of both transactions, there will be $190 in X
and $110 in Y. The database is now in an inconsistent state.
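For illustration only, the two transactions might be written as SQL transactions against a hypothetical account(acc_id, balance) relation; the statement used to start a transaction varies between DBMSs (e.g., START TRANSACTION or BEGIN), so this is a sketch rather than portable code. When the DBMS enforces serializability, for instance through the locking protocols described below, the interleaving shown above cannot produce the lost update:

-- T1: transfer $10 from account X to account Y
START TRANSACTION;
UPDATE account SET balance = balance - 10 WHERE acc_id = 'X';
UPDATE account SET balance = balance + 10 WHERE acc_id = 'Y';
COMMIT;

-- T2: withdraw $20 from account X
START TRANSACTION;
UPDATE account SET balance = balance - 20 WHERE acc_id = 'X';
COMMIT;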
FIGURE 94.11 State transition diagram for transaction execution.
It is therefore important to ensure that the interleaving of the execution of transactions leaves the database
in a consistent state. One way of preserving data consistency is to ensure that the interleaved execution of
transactions is equivalent to their serial execution. This is referred to as serializable execution. Therefore, an
interleaved execution of transactions is said to be serializable if it is equivalent to a serial execution.
Locking is one of the most popular approaches to achieving serializability. Locking is the process of ensuring
that some actions are not performed on a data item. Therefore, a transaction may request a lock on a data item
to prevent it from being either accessed or modified by other transactions. There are two basic types of locks.
A shared lock allows other transactions to read but not write to the data item. An exclusive lock allows only a
single transaction to read and write a data item. To achieve a high degree of concurrency, the locked data item
size must be as small as possible. A data item can range from the whole database to a particular field in a record.
Large data items limit concurrency, while small data items result in a large storage overhead and a greater
number of lock and unlock operations that the system will have to handle.
Transaction scheduling based on locking achieves serializability in two phases. This is known as two-phase
locking. During the first phase, the growing phase, a transaction can only lock new data items, but it cannot
release any locked ones. During the second phase, the shrinking phase, existing locks can be released, but no
new data item can be locked. The two-phase locking scheme guarantees the serializability of a schedule.
Because of its simplicity, the above scheduling method is very practical. However, it may lead to a deadlock. A
deadlock occurs when two transactions are waiting for each other to release locks and both cannot proceed. A
deadlock prevention (or detection) strategy is needed to handle the situation. For example, deadlock can be
prevented by requiring that a transaction lock all the data items it needs for its execution before it can proceed;
when the transaction finds that a needed data item is already locked, it releases all its locks.
If a transaction fails for whatever reason while it is in the active or partially committed state, it may be
necessary to bring the database back to its previous (original) state by undoing the transaction. This
operation is called roll-back. A roll-back operation requires some information about the changes made on the
data items during a transaction. Such information is usually kept outside the database in a system log. Generally,
roll-back operations are part of the techniques used to recover from transaction failures.
Database Security
A database needs to be protected against unauthorized access. It is the responsibility of the DBA to create
account numbers and passwords for legitimate users. The DBA can also specify the type of privileges a particular
account has. In relational databases, this includes the privilege to create base relations, create views, alter relations
by adding or dropping a column, and delete relations. The DBA can also revoke privileges that were granted
previously. In SQL, the command GRANT is used to grant privileges and the REVOKE command to revoke
privileges that have been granted.
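For example (clerk1 and clerk2 are hypothetical account names), the DBA might grant one account permission to read and insert into PRO_INFO, grant another read-only access, and later withdraw the insert privilege:

GRANT SELECT, INSERT ON PRO_INFO TO clerk1;
GRANT SELECT ON PRO_INFO TO clerk2;
REVOKE INSERT ON PRO_INFO FROM clerk1;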
The concept of views can serve as a convenient security mechanism. Consider a relation EMPLOYEE that
gives the name of an employee, date of birth, the department worked for, address, phone number, and salary.
A database user who is not allowed to have access to the salary of employees from his own department can
have this portion of the database hidden from him. This can be achieved by limiting his access to a view obtained
from the relation EMPLOYEE by selecting only those tuples where the department attribute is different from his.
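As a sketch, assuming the EMPLOYEE attribute names suggested by the description above and a user who belongs to a hypothetical Sales department, such a view could be defined as follows:

CREATE VIEW OTHER_DEPT_EMPLOYEE
AS SELECT name, date_of_birth, department, address, phone, salary
FROM EMPLOYEE
WHERE department <> 'Sales'

The user is then granted SELECT on OTHER_DEPT_EMPLOYEE but no privileges on EMPLOYEE itself.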
Database security can be enhanced by using data encryption. The idea here is to encrypt the data using some
coding technique. An unauthorized user will have difficulty deciphering the encrypted data. Only authorized
users are provided with keys to decipher the encoded data.
Emerging Trends
Object-Oriented Databases
Object-oriented database systems (OODBMSs) [Brown, 1991] are one of the latest trends in database technol-
ogy. The emergence of OODBMS is in response to the requirements of advanced applications. In general,
traditional commercial and administrative applications can be effectively modeled using one of the three record-
based data models. These applications are characterized by simple data types. Furthermore, for such applica-
tions, access and relationships are based on data values. Advanced database applications such as those found
in engineering CAD/CAM require complex data structures. When these applications are modeled using the
relational model, they require an excessive number of relations. In addition, a large number of complex
operations are usually needed to produce an answer. This leads, in most cases, to unacceptable performance
levels.
The notion of "object" is central to OODBMSs. An object can be seen as an entity consisting of its own
private memory and external interface (or protocol). The private memory is used to store the state of the object,
and the external interface consists of a set of operations that can be performed on the object. An object
communicates with other objects through messages sent to its external interface. When an object receives a
message, it responds by using its own procedures, known as methods. The methods are responsible for processing
the data in the object’s private memory and sending messages to other objects to perform specific tasks and
possibly send back appropriate results.
The object-oriented approach provides for a high level of abstraction. In addition, this model has constructs
that can be used to define new data types and specialized operators that can be applied to them. This feature
is known as encapsulation.
An object is usually a member of a class. The class specifies the internal structure and the external interface
of an object. New object classes can be defined as a specialization of existing ones. For example, in a university
environment, the object type “faculty” can be seen as a specialization of the object type “employee.” Since a
faculty is a university employee, it has all the properties of a university employee plus some of its own. For
example, some of the general operations that can be performed on an employee could be “raise_salary,”
“fire_employee,” “transfer_employee.” For a faculty, specialized operations such as “faculty_tenure” could be
defined. Faculty can be viewed as a subclass of employee. As a result, faculty (the subclass) will respond to the
same messages as employee (the superclass) in addition to those defined specifically for faculty. This technique
is known as inheritance. A subclass is said to inherit the behavior of its superclass.
Opponents of the object-oriented paradigm point to the fact that while this model has greater modeling
capability, it lacks the simplicity and the strong theoretical foundations of the relational model. Also, the
reappearance of the navigational approach is seen by many as a step backward.
Supporters of the object-oriented approach believe that a navigational approach is a necessity in several
applications. They point to the rich modeling capability of the model, its high level of abstraction, and its
suitability for modular design.
Distributed Databases
A distributed database [Ozsu and Valduriez, 1991] is a collection of interrelated databases spread over the
nodes of a computer network. The management of the distributed database is the responsibility of a software
system usually known as distributed DBMS (DDBMS). One of the tasks of the DDBMS is to make the distributed
nature of the database transparent to the user. A distributed database usually reflects the distributed nature of
some applications. For example, a bank may have branches in different cities. A database used by such an
organization is usually distributed over all these sites. The different sites are connected by a computer network.
A user may access data stored locally or access data stored at other sites through the network.
Distributed databases have several advantages. In distributed databases, the effect of a site failure or data loss
at a particular node can be minimized through data replication. However, data replication reduces security and
makes the process of keeping the database consistent more complicated.
In distributed databases, data is decomposed into fragments that are allocated to the different sites. A fragment
is allocated to a site in a way that maximizes local use. This allocation scheme, which is known as data
localization, reduces the frequency of remote access. In addition, since each site deals with only a portion of
the database, local query processing is expected to exhibit increased performance.
A distributed database is inherently well suited for parallel processing at both interquery and intraquery
levels. Parallel processing at the interquery level is the ability to have multiple queries executed concurrently.
Parallelism at the intraquery level results from the possibility of a single query being simultaneously handled
by many sites, each site acting on a different portion of the database.
The data distribution increases the complexity of DDBMS over a centralized DBMS. In fact, in distributed
databases, several research issues in distributed query processing, distributed database design, and distributed
transaction processing remain to be solved. It is only then that the potential of distributed databases can be
fully appreciated.
Parallel Database Systems
There has been a continuing increase in the amount of data handled by database management systems (DBMSs)
in recent years. Indeed, it is no longer unusual for a DBMS to manage databases ranging in sizes from hundreds
of gigabytes to terabytes. This massive increase in database sizes is coupled with a growing need for DBMSs to
exhibit more sophisticated functionality such as the support of object-oriented, deductive, and multimedia
applications. In many cases, these new requirements have rendered existing DBMSs unable to provide the
necessary system performance, especially given that many mainframe DBMSs already have difficulty meeting
the I/O and CPU performance requirements of traditional information systems that service large numbers of
concurrent users and/or handle massive amounts of data [DeWitt and Gray, 1992].
To achieve the required performance levels, database systems have been increasingly required to make use
of parallelism. Two approaches were suggested to provide parallelism in database systems [Abdelguerfi and
Lavington, 1995]. The first approach uses massively parallel general-purpose hardware platforms. Commercial
systems, such as the nCube and IBM's SP2, follow this approach and support Oracle's Parallel Server. The
second approach makes use of arrays of off-the-shelf components to form custom massively parallel systems.
Usually, these hardware systems are based on MIMD parallel architectures. The NCR 3700 and the Super
Database Computer II (SDC-II) are two such systems. The NCR 3700 now supports a parallel version of the
Sybase relational DBMS.
The number of general-purpose or dedicated parallel database computers is increasing each year. It is not
unrealistic to envisage that most high-performance database management systems in the year 2000 will support
parallel processing. The high potential of parallel databases urges both database vendors and practitioners to
understand the concepts of parallel database systems in depth.
It is noteworthy that in recent years, the popularity of the client/server architecture has increased. This
architecture is practically a derivative of the shared-nothing approach. In this model, client nodes access data through one
or more servers. This approach derives its strength from an attractive price/performance ratio, a high level of
scalability, and the ease with which additional remote hosts can be integrated into the system. Another driving
force of the client/server approach is the current trend toward corporate downsizing.
Multimedia
Yet another new generation database application is multimedia, where non-text forms of data, such as voice,
video, and image, are accessed via some form of a user interface. Hypermedia interfaces are becoming the
primary delivery system for the multimedia applications. These interfaces, such as Mosaic, allow users to browse
through an information base consisting of many different types of data. The basis of hypermedia is hypertext,
in which text-based information is accessed in a nonsequential manner. Hypermedia is an extension of the
hypertext paradigm to multimedia.
Defining Terms
Database: A shared pool of interrelated data.
Database computer: A special hardware and software configuration aimed primarily at handling large data-
bases and answering complex queries.
Database management system (DBMS): A software system that allows for the definition, construction, and
manipulation of a database.
Data model: An integrated set of tools to describe the data and its structure, data relationships, and data
constraints.
Distributed database: A collection of multiple, logically interrelated databases distributed over a computer
network.
Related Topic
87.3 Data Types and Data Structures
References
M. Abdelguerfi and A. K. Sood, Eds., Special Issue on Database Computers, IEEE Micro, December 1991.
M. Abdelguerfi and S. Lavington, Eds., Emerging Trends in Database and Knowledge Base Machines, IEEE
Computer Society Press, 1995.
A. Brown, Object-Oriented Databases: Applications in Software Engineering, New York: McGraw-Hill, 1991.
E. F. Codd, “A relational model of data for large shared data banks,” Communications of the ACM, pp. 377–387,
June 1970.
D. DeWitt and J. Gray, “Parallel database systems: The future of high performance database systems”, Commu-
nications of the ACM, pp. 85-98, June 1992.
R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 2nd ed., Redwood City, Calif.: Benjamin/Cummings, 1994.
D. Maier, The Theory of Relational Databases, New York: Computer Science Press, 1983.
M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems, Englewood Cliffs, N.J.: Prentice-Hall,
1991.
94.2 Rule-Based Expert Systems
Jay Liebowitz, George Washington University
Expert systems is probably the most practical application of artificial intelligence (AI). Artificial intelligence,
as a field, has two major thrusts: (1) to supplement human brain power with intelligent computer power and
(2) to better understand how we think, learn, and reason. Expert systems are one application of AI, and they
are being developed and used throughout the world [Feigenbaum et al., 1988; Liebowitz, 1990]. Other major
applications of AI are robotics, speech understanding, natural-language understanding, computer vision, and
neural networks.
Expert systems are computer programs that emulate the behavior of a human expert in a well-bounded
domain of knowledge [Liebowitz, 1988]. They have been used in a number of tasks, ranging from sheep
reproduction management in Australia, hurricane damage assessment in the Caribbean, boiler plant operation
in Japan, computer configuration in the United States, to strategic management consulting in Europe [Liebowitz,
1991b]. Expert systems technology has been around since the late 1950s, but it has been only since 1980–1981
that the commercialization of expert systems has emerged [Turban, 1992].
An expert system typically has three major components: the dialog structure, inference engine, and knowledge
base [Liebowitz and DeSalvo, 1989]. The dialog structure is the user interface that allows the user to interact
with the expert system. Most expert systems are able to explain their reasoning, in the same manner that one
would want human experts to explain their decisions. The inference engine is the control structure within the
expert system that houses the search strategies to allow the expert system to arrive at various conclusions. The
third component is the knowledge base, which is the set of facts and heuristics (rules of thumb) about the
specific domain task. The knowledge principle says that the power of the expert system lies in its knowledge
base. Expert system shells have been developed and are widely used on various platforms to help one build an
expert system and concentrate on the knowledge base construction. Most operational expert systems are
integrated with existing databases, spreadsheets, optimization modules, or information systems [Mockler and
Dologite, 1992].
The most successful type of expert system is the rule-based, or production, system. This type of expert system
is chiefly composed of IF-THEN (condition-action) rules. For example, the famous MYCIN expert system,
developed at Stanford University for diagnosing bacterial infections in the blood (meningitis), is rule-based,
consisting of 450–500 rules. XCON, the expert system at Digital Equipment Corporation used for configuring
VAX computer systems, is probably the largest rule-based expert system, consisting of over 11,000 rules. There
are other types of expert systems that represent knowledge in ways other than rules or in conjunction with
rules. Frames, scripts, and semantic networks are popular knowledge representation methods that could be
used in expert systems.
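As an illustration of what a single condition-action rule might look like when written down, the fragment below sketches a MYCIN-style rule with a certainty factor attached to its conclusion. The field names and the clinical content are invented for this example and should not be read as an actual MYCIN rule.

# An invented, MYCIN-style condition-action rule with a certainty factor.
# The premise clauses, conclusion, and 0.6 value are illustrative only.
rule_example = {
    "if": [
        ("stain_of_organism", "gram_negative"),
        ("morphology_of_organism", "rod"),
        ("patient_is_compromised_host", True),
    ],
    "then": ("identity_of_organism", "pseudomonas"),
    "certainty": 0.6,  # the expert's degree of belief in the conclusion when all premises hold
}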
The development of rule-based systems is typically called knowledge engineering. The knowledge engineer
is the individual involved in the development and deployment of the expert system. Knowledge engineering,
in rule-based systems, refers primarily to the construction of the knowledge base. As such, there are six major
steps in this process, namely (1) problem selection, (2) knowledge acquisition, (3) knowledge representation,
(4) knowledge encoding, (5) knowledge testing and evaluation, and (6) implementation and maintenance. The
knowledge engineering process typically uses a rapid prototyping approach (build a little, test a little). Each of
the six steps in the knowledge engineering process will be briefly discussed in turn.
Problem Selection
In selecting an appropriate application for expert systems technology, there are a few guidelines to follow:
• Pick a problem that is causing a large number of people a fair amount of grief.
• Select a “doable,” well-bounded problem (i.e., a task that takes a few minutes to a few hours to solve); this is especially important for the first expert system project in order to win management’s support of the technology.
• Select a task that is performed frequently.
• Choose an application where there is a consensus on the solution of the problem.
• Pick a task that utilizes primarily symbolic knowledge.
• Choose an application where an expert exists and is willing to cooperate in the expert systems development.
• Make sure the expert is articulate and available and that a backup expert exists.
• Have the financial and moral support from management.
The problem selection and scoping are critical to the success of the expert systems project. As with any
information systems project, the systems analysis stage is an essential and crucial part of the development
process. With expert systems technology, if the problem domain is not carefully selected, then difficulties will
ensue later in the development process.
Knowledge Acquisition
After the problem is carefully selected and scoped, the next step is knowledge acquisition. Knowledge acquisition
involves eliciting knowledge from an expert or multiple experts and also using available documentation,
regulations, manuals, and other written reports to facilitate the knowledge acquisition process. The biggest bottleneck in expert systems development has, thus far, been the ability to acquire knowledge. Various
automated knowledge acquisition tools, such as Boeing Computer Services’ AQUINAS, have been developed
to assist in this process, but there are very few knowledge acquisition tools on the market. The most commonly
used approaches for acquiring/eliciting knowledge include: interviewing (structured and unstructured), pro-
tocol analysis, questionnaires (structured and open-ended), observation, learning by example/analogy, and
other various techniques (Delphi technique, statistical methods).
To aid the knowledge acquisition process, some helpful guidelines are:
• Before interviewing the expert, make sure that you (as the knowledge engineer) are familiar/comfortable with the domain.
• The first session with the expert should be an introductory lecture on the task at hand.
• The knowledge engineer should have a systematic approach to acquiring knowledge.
• Incorporate the input and feedback from the expert (and users) into the system; get the expert and users enthusiastic about the project.
• Pick up manuals and documentation on the subject material.
• Tape the knowledge acquisition sessions, if allowed.
Knowledge Representation
After acquiring the knowledge, the next step is to represent the knowledge. In a rule-based expert system, the
IF-THEN (condition-action) rules are used. Rules are typically used to represent knowledge if the preexisting
knowledge can best be naturally represented as rules, if the knowledge is procedural, if the knowledge is mostly
context-independent, and if the knowledge is mostly categorical (“yes-no” type of answers). Frames, scripts,
and semantic networks are used as knowledge representation schemes for more descriptive, declarative knowl-
edge. In selecting an appropriate knowledge representation scheme, try to use the representation method which
most closely resembles the way the expert is thinking and expressing his/her knowledge.
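For contrast with rules, the sketch below shows one way a frame might be encoded: a named object with slots, default values, and an is_a link through which values are inherited from a more general frame. The frame and slot names (plant_equipment, boiler, fuel_type) are hypothetical and chosen only to illustrate the idea.

# Hypothetical frames: declarative objects with slots, defaults, and inheritance.
FRAMES = {
    "plant_equipment": {"is_a": None, "slots": {"manufacturer": {"default": "unknown"}}},
    "boiler": {
        "is_a": "plant_equipment",
        "slots": {"fuel_type": {"default": "gas"}, "operating_pressure": {"default": None}},
    },
}

def get_slot(frame_name, slot):
    """Return a slot's default value, searching up the is_a hierarchy when necessary."""
    frame = FRAMES.get(frame_name)
    while frame is not None:
        if slot in frame["slots"]:
            return frame["slots"][slot]["default"]
        frame = FRAMES.get(frame["is_a"])
    return None

# get_slot("boiler", "manufacturer") returns "unknown", inherited from plant_equipment.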
Knowledge Encoding
Once the knowledge is represented, the next step is to encode the knowledge. Many knowledge engineers use
expert system shells to help develop the expert system prototypes. Other developers may build the expert system
from scratch, using such languages as Lisp, Prolog, C, and others. The following general guidelines may be
useful in encoding the knowledge:
• Remember that for every shell there is a perfect task, but for every task there is NOT a perfect shell.
• Consider using an expert system shell for prototyping/proof-of-concept purposes; remember to first determine the requirements of the task, instead of force-fitting a shell to a task.
• Try to develop the knowledge base in a modular format for ease of updating.
• Concentrate on the user interface and human factors features, as well as the knowledge base.
• Use an incremental, iterative approach.
• Consider whether uncertainty should play a part in the expert system.
• Consider whether the expert reasons in a data-driven manner (forward chaining) or a goal-directed manner (backward chaining), or both (see the sketch following this list).
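The sketch below, referred to in the last guideline above, contrasts the two control strategies on the same small, invented rule set: forward chaining starts from the known facts and keeps firing rules until no new conclusions appear, while backward chaining starts from a goal and works back to the facts that would support it. The rule contents and fact names are hypothetical.

# Forward chaining (data-driven) versus backward chaining (goal-directed)
# over the same hypothetical rule set.
RULES = [
    ({"leak_detected"}, "shut_valve"),          # IF leak_detected THEN shut_valve
    ({"pressure_high", "temp_high"}, "alarm"),  # IF pressure_high AND temp_high THEN alarm
    ({"alarm"}, "notify_operator"),             # IF alarm THEN notify_operator
]

def forward_chain(facts):
    """Fire rules whose conditions are satisfied until no new conclusion is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts):
    """Try to establish the goal by recursively establishing the conditions of rules that conclude it."""
    if goal in facts:
        return True
    return any(all(backward_chain(c, facts) for c in conditions)
               for conditions, conclusion in RULES if conclusion == goal)

# forward_chain({"pressure_high", "temp_high"}) also derives "alarm" and "notify_operator";
# backward_chain("notify_operator", {"pressure_high", "temp_high"}) returns True.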
Knowledge Testing and Evaluation
Once the knowledge is encoded in the system, testing and evaluation need to be conducted. Verification and validation refer to checking the consistency of the knowledge/logic and checking the quality and accuracy of the advice reached by the expert system. Various approaches to testing can be used: performing “backcasting” by running the expert system on a representative set of documented cases and comparing the system-generated results with the historical results, using blind verification tests (a modified Turing test), having the expert and other experts test the system, and applying statistical methods, among others. In evaluating the expert system, the users should also assess the design of the human factors in the system (e.g., instructions, free-text comments, ease of updating, exiting capabilities, response time, display and presentation of conclusions, ability to restart, ability for the user to offer a degree of certainty, graphics, and overall utility of the system).
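A minimal sketch of such a backcasting harness is shown below. It assumes a handful of invented historical cases and an inference function supplied by the system under test; both the case data and the function passed in are placeholders rather than part of any particular tool.

# Hypothetical "backcasting" harness: run the expert system on documented
# historical cases and compare its conclusions with the recorded outcomes.
TEST_CASES = [
    {"facts": {"pressure_high", "temp_high"}, "documented_outcome": "alarm"},
    {"facts": {"leak_detected"}, "documented_outcome": "shut_valve"},
]

def backcast(infer, cases=TEST_CASES):
    """Report how often the system's conclusion, infer(facts), matches the historical result."""
    agree = 0
    for case in cases:
        conclusion = infer(case["facts"])
        if conclusion == case["documented_outcome"]:
            agree += 1
        else:
            print(f"Mismatch on {sorted(case['facts'])}: system said {conclusion!r}, "
                  f"history recorded {case['documented_outcome']!r}")
    print(f"Agreement with documented cases: {agree}/{len(cases)}")

# Usage (hypothetical): backcast(my_expert_system.run_case)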
Implementation and Maintenance
Once the system is ready to be deployed within the organization, the knowledge engineer must be cognizant
of various institutionalization factors [Liebowitz, 1991a; Turban and Liebowitz, 1992]. Institutionalization refers
to implementing and transitioning the expert system into the organization. Frequently, the technology is not
the limiting factor—the management of the technology is often the culprit. An expert system may be accurate
and a technical success, but without careful attention to management and institutionalization considerations,
the expert system may be a technology transfer failure. There are several useful guidelines for proper institu-
tionalization of expert systems:
• Know the corporate culture in which the expert system is deployed.
• Plan the institutionalization process well in advance, as early as the requirements analysis stage.
• Through user training, help desks, good documentation, hotlines, etc., the manager can provide mechanisms to reduce “resistance to change.”
• Solicit and incorporate users’ comments during the analysis, design, development, and implementation stages of the expert system.
• Make sure there is a team/individual empowered to maintain the expert system.
• Be cognizant of possible legal problems resulting from the use and misuse of the expert system.
• During the planning stages, determine how the expert system will be distributed.
• Keep the company’s awareness of expert systems at a high level throughout the system’s development and implementation, and even after its institutionalization.
Defining Terms
Expert system: A computer program that emulates a human expert in a well-bounded domain of knowledge.
Knowledge base: The set of facts and rules of thumb (heuristics) on the domain task.
Knowledge engineering: The process of developing an expert system.
References
E.A. Feigenbaum, P. McCorduck, and P. Nii, The Rise of the Expert Company, New York: Times Books, 1988.
J.K. Lee, J. Liebowitz, and Y.M. Chae, Eds., Proceedings of the Third World Congress on Expert Systems, New
York: Cognizant Communication Corp., 1996.
J. Liebowitz, Introduction to Expert Systems, New York: Mitchell/McGraw-Hill Publishing, 1988.
J. Liebowitz, Ed., Expert Systems for Business and Management, Englewood Cliffs, N.J.: Prentice-Hall, 1990.
J. Liebowitz, Institutionalizing Expert Systems: A Handbook for Managers, Englewood Cliffs, N.J.: Prentice-Hall,
1991a.
J. Liebowitz, Ed., Operational Expert System Applications in the United States, New York: Pergamon Press, 1991b.
J. Liebowitz and D. DeSalvo, Eds., Structuring Expert Systems: Domain, Design, and Development, Englewood
Cliffs, N.J.: Prentice-Hall, 1989.
R. Mockler and D. Dologite, An Introduction to Expert Systems, New York: Macmillan Publishing, 1992.
E. Turban, Expert Systems and Applied Artificial Intelligence, New York: Macmillan Publishing, 1992.
E. Turban and J. Liebowitz, Eds., Managing Expert Systems, Harrisburg, Pa.: Idea Group Publishing, 1992.
Further Information
There are several journals and magazines specializing in expert systems that should be consulted:
Expert Systems with Applications: An International Journal, New York/Oxford: Pergamon Press, Elsevier.
Expert Systems, Medford, N.J.: Learned Information, Inc.
IEEE Expert, Los Alamitos, Calif.: IEEE Computer Society Press.
AI Expert, San Francisco: Miller Freeman Publications.
Intelligent Systems Report, Atlanta: AI Week, Inc.
© 2000 by CRC Press LLC