Abdelguerfi, M., Eskicioglu, R., Liebowitz, J. "Knowledge Engineering," The Electrical Engineering Handbook, Ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000.

94 Knowledge Engineering

M. Abdelguerfi and R. Eskicioglu, University of New Orleans
Jay Liebowitz, George Washington University

94.1 Databases
Database Abstraction • Data Models • Relational Databases • Hierarchical Databases • Network Databases • Architecture of a DBMS • Data Integrity and Security • Emerging Trends

94.2 Rule-Based Expert Systems
Problem Selection • Knowledge Acquisition • Knowledge Representation • Knowledge Encoding • Knowledge Testing and Evaluation • Implementation and Maintenance

94.1 Databases

M. Abdelguerfi and R. Eskicioglu

In the past, file processing techniques were used to design information systems. These systems usually consist of a set of files and a collection of application programs. Permanent records are stored in the files, and application programs are used to update and query the files. The application programs were in general developed individually to meet the needs of different groups of users. In many cases, this approach leads to a duplication of data among the files of different users. Also, the lack of coordination between files belonging to different users often leads to a lack of data consistency. In addition, changes to the underlying data requirements usually necessitate major changes to existing application programs. Among other major problems that arise with the use of file processing techniques are lack of data sharing, reduced programming productivity, and increased program maintenance. Because of their inherent difficulties and lack of flexibility, file processing techniques have lost a great deal of their popularity and are being replaced by database management systems (DBMSs).

A DBMS is designed to efficiently manage a shared pool of interrelated data (a database). This includes the existence of features such as a data definition language for the definition of the logical structure of the database (the database schema), a data manipulation language to query and update the database, a concurrency control mechanism to keep the database consistent when shared by several users, a crash recovery strategy to avoid any loss of information after a system crash, and safety mechanisms against unauthorized access.

Database Abstraction

A DBMS is expected to provide for data independence, i.e., user requests are made at a logical level without any need for knowledge of how the data is stored in actual files. This implies that the internal file structure can be modified without any change to the user's perception of the database. To achieve data independence, the Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute (ANSI) in its 1977 report recommended three levels of database abstraction (see Fig. 94.1). The lowest level in the abstraction is the internal level. Here, the database is viewed as a collection of files organized according to one of several possible internal data organizations (e.g., the B+-tree data organization). At the conceptual level, the database is viewed at an abstract level, and the user is shielded from the internal storage details. At the external level, each group of users has its own perception or view of the database. Each view is derived from the conceptual database and is designed to meet the needs of a particular group of users.
Such a group can only have access to the data specified by its particular view. This, of course, ensures both privacy and security. The mapping between the three levels of abstraction is the task of the DBMS. When changes to the internal level (such as a change in file organization) do not affect the conceptual and external levels, the system is said to provide for physical data independence. Logical data independence ensures that changes to the conceptual level do not affect users' views. Both types of data independence are desired features in a database system.

Data Models

A data model refers to an integrated set of tools used to describe the data and its structure, data relationships, and data constraints. Some data models provide a set of operators that is used to update and query the database. Data models can be classified into two main categories: record-based and object-based. Both classes are used to describe the database at the conceptual and external levels. With object-based data models, constraints on the data can be specified more explicitly.

There are three main record-based data models: the relational, network, and hierarchical models. In the relational model, data at the conceptual level is represented as a collection of interrelated tables. These tables are normalized so as to minimize data redundancy and update anomalies. In this model, data relationships are implicit and are derived by matching columns in tables. In the hierarchical and network models, the data is represented as a collection of records, and data relationships are explicit and are represented by links. The difference between the last two models is that in the hierarchical model data is represented as a tree structure, while in the network model it is represented as a generalized graph.

In the hierarchical and network models, the existence of physical pointers (links) between related records allows an application program to retrieve a single record at a time by following a pointer chain. The process of following a pointer chain and selecting one record at a time is referred to as navigation. In nonnavigational models such as the relational model, records are not related through pointer chains; instead, relationships are established by matching columns in different tables. The hierarchical and network models require the application programmer to be aware of the internal structure of the database. The relational model, on the other hand, allows for a high degree of physical and logical data independence. Earlier DBMSs were for the most part navigational systems. Because of its simplicity and strong theoretical foundations, the relational model has since received wide acceptance, and today most DBMSs are based on it.

Other data models include a popular high-level conceptual data model known as the Entity-Relationship (ER) model. The ER model is mainly used for the conceptual design of databases and their applications. It describes data as entities, attributes, and relationships. An entity is an "object" in the real world with an independent existence. Each entity has a set of properties, called attributes, that describe it. A relationship is an association between entities. For example, a professor entity may be described by its name, age, and salary and can be associated with a department entity by the relationship "works for". With the advent of advanced database applications, the ER modeling concepts became insufficient.
This has led to the enhancement of the ER model with additional concepts, such as generalization, categories, and inheritance, leading to the Enhanced-ER or EER model.

FIGURE 94.1  Data abstraction.

Relational Databases

The relational model was introduced by E. F. Codd [1970]. Since the theoretical underpinnings of the relational model have been well defined, it has become the focus of most commercial DBMSs. In the relational model, the data is represented as a collection of relations. To a large extent, each relation can be thought of as a table. The example of Fig. 94.2 shows part of a university database composed of two relations. FAC_INFO gives personal information (last name, social security number, street and city of residence, and department) of a faculty member. DEP_CHAIR gives the last name of the chairman of each department. A faculty member is not allowed to belong to two departments.

Each row in a relation is referred to as a tuple. A column name is called an attribute name. The data type of each attribute name is known as its domain. A relation scheme is a set of attribute names. For instance, the relation scheme (or scheme for short) of the relation FAC_INFO is (lname, social_sec#, street, city, dept). A key is a set of attribute names whose composite value is distinct for all tuples. In addition, no proper subset of the key is allowed to have this property. It is not unusual for a scheme to have several possible keys. In FAC_INFO, both lname and social_sec# are possible keys. In this case, each possible key is known as a candidate key, and the one selected to act as the relation's key, say, lname, is referred to as the primary key. A superkey is a key without the requirement for minimality. In a relation, an attribute name (or a set of attribute names) is referred to as a foreign key if it is the primary key of another relation. In FAC_INFO, the attribute name dept is a foreign key, since the same attribute is a key in DEP_CHAIR.

Because of updates to the database, the content of a relation is dynamic. For this reason, the data in a relation at a given time instant is called an instance of the relation. There are three integrity constraints that are usually imposed on each instance of a relation: key integrity, entity integrity, and referential integrity. The key integrity constraint requires that no two tuples of a relation have the same key value. The entity integrity constraint specifies that the key value of each tuple should have a known value (i.e., no null values are allowed for primary keys). The referential integrity constraint specifies that if a relation r1 contains a foreign key that matches the primary key of a relation r2, then each value of the foreign key in r1 must either match a value of the primary key in r2 or must be null. For the database of Fig. 94.2 to be consistent, each value of dept in FAC_INFO must match a value of dept in DEP_CHAIR.
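These integrity constraints can be declared when the relations themselves are defined. The following is a minimal sketch in SQL-style syntax (the CREATE TABLE command is introduced later in this section); the column types and sizes are illustrative assumptions rather than part of the original example, and some systems would require the # character in social_sec# to be quoted or renamed.

-- Hypothetical declarations for the two relations of Fig. 94.2.
CREATE TABLE DEP_CHAIR
    (dept        VARCHAR(15)  NOT NULL,
     lname       VARCHAR(12)  NOT NULL,
     PRIMARY KEY (dept));                 -- key integrity: each dept value is unique

CREATE TABLE FAC_INFO
    (lname       VARCHAR(12)  NOT NULL,   -- entity integrity: the primary key cannot be null
     social_sec# CHAR(9)      NOT NULL,
     street      VARCHAR(20),
     city        VARCHAR(15),
     dept        VARCHAR(15),
     PRIMARY KEY (lname),
     UNIQUE (social_sec#),                -- social_sec# is the other candidate key
     FOREIGN KEY (dept) REFERENCES DEP_CHAIR (dept));  -- referential integrity

With these declarations, the DBMS itself rejects any update that would leave a dept value in FAC_INFO without a matching tuple in DEP_CHAIR.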
Relational Database Design

Relational database design [Maier, 1983] refers to the process of generating a set of relation schemes that minimizes data redundancy and removes update anomalies. One of the most popular approaches is the use of normalization theory. Normalization theory is based on the notion of functional dependencies, which are constraints imposed on a database. The notion of superkey, introduced in the previous section, can be formulated as follows: a subset of a relation scheme is a superkey if, in any instance of the relation, no two distinct tuples have the same superkey value. If r(R) is used to denote a relation r on a scheme R, K ⊆ R a superkey, and t(K) the K-value of tuple t, then no two distinct tuples t1 and t2 in r(R) are such that t1(K) = t2(K).

The notion of a functional dependency can be seen as a generalization of the notion of superkey. Let X and Y be two subsets of R; the functional dependency X → Y exists in r(R) if whenever two tuples in r(R) have the same X-value, their Y-value is also the same. That is, if t1(X) = t2(X), then t1(Y) = t2(Y). Using functional dependencies, one can define the notion of a key more precisely. A key k of a relation r(R) is a subset of R such that k → R and no proper subset of k has this property. Note that if the scheme R is composed of attribute names {A1, A2, ..., An}, then each attribute name Ai is functionally determined by the key k, i.e., k → Ai, i = 1, ..., n. An attribute name that is part of a key is referred to as a prime attribute. In the example of Fig. 94.2, both attribute names street and city are nonprime attributes.

FIGURE 94.2  An example of two relations: FAC_INFO and DEP_CHAIR.

The normalization process can be thought of as the process of decomposing a scheme with update anomalies and data redundancy into smaller schemes in which these undesirable properties are to a large extent eliminated. Depending on the severity of these undesirable properties, schemes are classified into normal forms. Originally, Codd defined three normal forms: first normal form (1NF), second normal form (2NF), and third normal form (3NF). Thereafter, a stronger version of 3NF, known as Boyce-Codd normal form (BCNF), was suggested. These four normal forms are based on the concept of functional dependencies.

The 1NF requires that attribute name values be atomic; that is, composite values for attribute names are not allowed. A 2NF scheme is a 1NF scheme in which all nonprime attributes are fully dependent on the key. Consider the relation of Fig. 94.3. Each tuple in PRODUCT gives the name of a supplier, a product name, its price, and the supplier's location. The scheme (supplier_name, product_name, price, location) is in 1NF since each attribute name is atomic. It is assumed that many products can be supplied by a single supplier, that a given product can be supplied by more than one supplier, and that a supplier has only one location. So, (supplier_name, product_name) is the relation's key, and the functional dependency supplier_name → location should hold for any instance of PRODUCT.

The structure of the relation of Fig. 94.3 does not allow a supplier to appear in the relation unless it offers at least one product. Even the use of null values is not of much help in this case, as product_name is part of the key and therefore cannot be assigned a null value. Another anomaly can be encountered during the deletion process. For instance, deleting the last tuple in the relation results in the loss of the information that Rudd is a supplier located in Metairie. It is seen that the relation PRODUCT suffers from insertion and deletion anomalies. Modifications can also be a problem in the relation PRODUCT. Suppose that the location of the supplier Martin is moved from Kenner to Slidell. In order not to violate the functional dependency supplier_name → location, the location attribute of all tuples where the supplier is Martin needs to be changed from Kenner to Slidell.
This modification anomaly has a negative effect on performance. In addition, the relation PRODUCT suffers from data redundancy. For example, although Martin has only one location, "Kenner", that location appears in all three tuples where the supplier_name is Martin.

FIGURE 94.3  Instance of PRODUCT (supplier_name, product_name, price, location).

The update anomalies and data redundancy encountered in PRODUCT are all due to the functional dependency supplier_name → location. The right-hand side of this dependency, location, is a nonprime attribute, and the left-hand side represents part of the key. Therefore, we have a nonprime attribute that is only partially dependent on the key (supplier_name, product_name). As a consequence, the scheme (supplier_name, product_name, price, location) is not in 2NF. The removal of the partial dependency supplier_name → location will eliminate all the above anomalies. This is achieved by decomposing the scheme (supplier_name, product_name, price, location) into two 2NF schemes: (supplier_name, product_name, price) and (supplier_name, location). This decomposition results in the relations PRO_INFO and SUP_LOC shown in Fig. 94.4. The keys of PRO_INFO and SUP_LOC are (supplier_name, product_name) and supplier_name, respectively.

Normalizing schemes into 2NF removes all update anomalies due to nonprime attributes being partially dependent on keys. Anomalies of a different nature, however, are still possible. Update anomalies and data redundancy can originate from transitive dependencies. A nonprime attribute Ai is said to be transitively dependent on a key k via attribute name Aj if k → Aj, Aj → Ai, and Aj does not functionally determine k. A 3NF scheme is a 1NF scheme in which no nonprime attribute is transitively dependent on a key.

The relation of Fig. 94.5, which is in 2NF, highlights update anomalies and data redundancy due to the transitive dependency of a nonprime attribute on a key. The relation gives the name of a client (client_name), the corresponding supplier (supplier_name), and the supplier's location. Each client is assumed to have one supplier, the relation's key is client_name, and each supplier has only one location. A supplier and his location cannot be inserted in SUPPLIES unless the supplier has at least one client. In addition, the relation has a deletion anomaly, since if Tillis is no longer a client of Rudd, the information about Rudd as a supplier and his location is lost. A change to a supplier's location may require updating the location attribute of several tuples in the relation. Also, although each supplier has only one location, such a location is sometimes repeated several times unnecessarily, leading to data redundancy. The relation exhibits the following transitive dependency: client_name → supplier_name and supplier_name → location (but not the inverse). The relation SUPPLIES is clearly in 2NF, but because of the transitive dependency of the nonprime attribute location on the key, it is not in 3NF. This is the cause of the anomalies mentioned above. Eliminating this transitive dependency by splitting the scheme into two components will remove these anomalies. Clearly, the resulting two relations SUP_CLI and SUP_LOC are in 3NF (see Fig. 94.6).
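It is worth noting that this decomposition loses no information: the original SUPPLIES relation can be recovered by joining SUP_CLI and SUP_LOC on supplier_name. A minimal sketch, using the SQL query notation introduced later in this section and assuming, as the discussion implies, that SUP_CLI has scheme (client_name, supplier_name) and SUP_LOC has scheme (supplier_name, location):

-- Reconstructs the tuples of the original SUPPLIES relation.
SELECT client_name, SUP_CLI.supplier_name, location
FROM SUP_CLI, SUP_LOC
WHERE SUP_CLI.supplier_name = SUP_LOC.supplier_name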
Each partial dependency of a nonprime attribute on a key can be expressed as a transitive dependency of a nonprime attribute on a key. Therefore, a scheme in 3NF is also in 2NF. BCNF is a stricter form of 3NF: a relation r on a scheme R is in BCNF if, whenever a nontrivial functional dependency X → Y exists in r(R), X is a superkey of R. The exception allowed in 3NF, where Y may be a prime attribute even though X is not a superkey, does not exist in BCNF. Thus, every scheme in BCNF is also in 3NF, but the opposite is not always true.

FIGURE 94.4  Decomposition of PRODUCT into PRO_INFO and SUP_LOC.

FIGURE 94.5  Instance of SUPPLIES.

FIGURE 94.6  Decomposition of SUPPLIES into SUP_CLI and SUP_LOC.

A detailed discussion of higher-level normal forms, such as 4NF and 5NF, which are based on other forms of dependencies, can be found in [Elmasri and Navathe, 1994].

Data Definition and Manipulation in Relational Databases

Upon completion of the relational database design, a descriptive language, usually referred to as a Data Definition Language (DDL), is used to define the designed schemes and their relationships. The DDL can be used to create new schemes or modify existing ones, but it cannot be used to query the database. Once DDL statements are compiled, they are stored in the data dictionary. A data dictionary is a repository where information about database schemas, such as attribute names, indexes, and integrity constraints, is stored. Data dictionaries also contain other information about databases, such as design decisions, usage standards, application program descriptions, and user information. During the processing of a query, the DBMS usually checks the data dictionary. The data dictionary can be seen as a relational database of its own. As a result, the data manipulation languages that are used to manipulate databases can also be used to query the data dictionary.

An important function of a DBMS is to provide a Data Manipulation Language (DML) with which a user can retrieve, change, insert, and delete data from the database. DMLs are classified into two types: procedural and nonprocedural. The main difference between the two types is that in procedural DMLs, a user has to specify the desired data and how to obtain it, while in nonprocedural DMLs, a user has only to describe the desired data. Because they impose less burden on the user, nonprocedural DMLs are normally easier to learn and use. The component of a DML that deals with data retrieval is referred to as a query language. A query language can be used interactively in a stand-alone manner, or it can be embedded in a general-purpose programming language such as C or Cobol.

One of the most popular query languages is SQL (Structured Query Language). SQL is a query language based to a large extent on Codd's relational algebra, with additional features for data definition and update. Therefore, SQL is a comprehensive relational database language that includes both a DDL and a DML. SQL includes the following commands for data definition: CREATE TABLE, DROP TABLE, and ALTER TABLE. The CREATE TABLE command is used to create and describe a new relation. The two relations of Fig. 94.4 can be created in the following manner:
CREATE TABLE PRO_INFO
    (supplier_name    VARCHAR(12)    NOT NULL,
     product_name     VARCHAR(8)     NOT NULL,
     price            DECIMAL(6,2));

CREATE TABLE SUP_LOC
    (supplier_name    VARCHAR(12)    NOT NULL,
     location         VARCHAR(10));

The CREATE TABLE command specifies all the attribute names of a relation and their data types (e.g., INTEGER, DECIMAL, fixed-length character "CHAR", variable-length character "VARCHAR", DATE). The constraint NOT NULL is usually specified for those attributes that cannot have null values. The primary key of each relation in the database is usually required to have a nonnull value. If a relation is created incorrectly, it can be deleted using the DROP TABLE command. The command is DROP TABLE followed by the name of the relation to be deleted. A variation of the DROP command, DROP SCHEMA, is used if the whole schema is no longer needed. The ALTER TABLE command is used to add new attribute names to an existing relation, as follows:

ALTER TABLE SUP_LOC ADD zip_code CHAR(5);

The SUP_LOC relation now contains an extra attribute name, zip_code. In most DBMSs, the zip_code value of existing tuples will automatically be assigned a null value. Other DBMSs allow for the assignment of an initial value to a newly added attribute name. Also, definitions of attributes can be changed, and new constraints can be added or current constraints dropped.

The DML component of SQL has one basic query statement, sometimes called a mapping, that has the following structure:

SELECT <attribute_name list>
FROM <relation_list>
WHERE <restriction>

In the above statement, the SELECT clause specifies the attribute names that are to be retrieved, FROM gives the list of the relations involved, and WHERE is a Boolean predicate that completely specifies the tuples to be retrieved. Consider the database of Fig. 94.4, and suppose that we want the name of all suppliers that supply either beds or sofas. In SQL, this query can be expressed as:

SELECT supplier_name
FROM PRO_INFO
WHERE product_name = 'bed' OR product_name = 'sofa'

The result of an SQL command may contain duplicate values and is therefore not always a true relation. In fact, the result of the above query, shown below, has duplicate entries.

supplier_name
Martin
Martin
Rudd

The entry Martin appears twice in the result because the supplier Martin supplies both beds and sofas. Removal of duplicates is usually a computationally intensive operation. As a result, duplicate entries are not automatically removed by SQL. To ensure uniqueness, the keyword DISTINCT should be used. If we want the supplier names to be listed only once, the above query should be modified as follows:

SELECT DISTINCT supplier_name
FROM PRO_INFO
WHERE product_name = 'bed' OR product_name = 'sofa'

In SQL, a query can involve more than one relation. Suppose that we want the list of all suppliers from Metairie who supply beds. Such a query, shown below, involves both PRO_INFO and SUP_LOC.

SELECT PRO_INFO.supplier_name
FROM PRO_INFO, SUP_LOC
WHERE PRO_INFO.supplier_name = SUP_LOC.supplier_name AND product_name = 'bed'

When an SQL expression, such as the one above, involves more than one relation, it is sometimes necessary to qualify attribute names, that is, to precede an attribute name by the relation it belongs to (a period is placed between the two). Such a qualification removes possible ambiguities.
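Besides retrieval, the DML portion of SQL also covers insertion, modification, and deletion of tuples. A minimal sketch against the relations of Fig. 94.4 (the specific values shown here are illustrative, not taken from the figures):

INSERT INTO SUP_LOC (supplier_name, location)
VALUES ('Baker', 'Kenner');

INSERT INTO PRO_INFO (supplier_name, product_name, price)
VALUES ('Baker', 'chair', 45.00);

UPDATE SUP_LOC
SET location = 'Slidell'
WHERE supplier_name = 'Martin';      -- one statement updates every matching tuple

DELETE FROM PRO_INFO
WHERE supplier_name = 'Baker' AND product_name = 'chair';

Note that, because SUP_LOC holds each supplier's location exactly once after normalization, the change of Martin's location is a single-tuple update rather than the multi-tuple modification anomaly described earlier.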
In SQL, it is possible to have several levels of query nesting; this is done by including a SELECT query statement within the WHERE clause. The output data can be presented in sorted order by using the SQL ORDER BY clause followed by the attribute name(s) according to which the output is to be sorted.

In database management applications it is often desirable to categorize the tuples of a relation by the values of a set of attributes and extract an aggregated characteristic of each category. Such database management tasks are referred to as aggregation functions. SQL includes the following built-in aggregation functions: SUM, COUNT, AVG, MIN, and MAX. The attribute names used for the categorization are referred to as GROUP BY columns. Consider the relation PROFESSOR of Fig. 94.7. Each tuple of this relation gives the name of a faculty member, his or her department, and academic-year salary. Suppose that we want to know the number of faculty in each department, with the result ordered by department. This query requests, for each department, a count of the number of faculty. Faculty are therefore categorized according to the attribute name department. As a result, department is referred to as a GROUP BY attribute. In SQL, the above query is formulated as follows:

SELECT department, COUNT (faculty)
FROM PROFESSOR
GROUP BY department
ORDER BY department

FIGURE 94.7  Instance of the relation PROFESSOR.

The result of applying the COUNT aggregation function is a new relation with two attribute names: the GROUP BY attribute (department in this case) and a new attribute holding the count. The tuples are ordered lexicographically in ascending order according to the ORDER BY attribute, which is department in this case:

department          COUNT (faculty)
Computer Sc.        4
Electrical Eng.     3
Mechanical Eng.     2

The relations created through the CREATE TABLE command are known as base relations. A base relation exists physically and is stored as a file by the DBMS. SQL can also be used to create views, using the CREATE VIEW command. In contrast to base relations, the creation of a view results in a virtual relation, that is, one that does not necessarily correspond to a physical file. Consider the database of Fig. 94.4, and suppose that we want to create a view giving the name of all suppliers located in Metairie, the products each one provides, and the corresponding prices. Such a view, called METAIRIE_SUPPLIER, can be created as follows:

CREATE VIEW METAIRIE_SUPPLIER AS
SELECT PRO_INFO.supplier_name, product_name, price
FROM PRO_INFO, SUP_LOC
WHERE PRO_INFO.supplier_name = SUP_LOC.supplier_name AND location = 'Metairie'

Because a view is a virtual relation that can be constructed from one or more relations, updating a view may lead to ambiguities. As a result, when a view is generated from more than one relation, there are, in general, restrictions on updating such a view.
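As an illustration of the query nesting mentioned above, the Metairie suppliers of beds could also be retrieved with a subquery in the WHERE clause instead of an explicit join; a minimal sketch, equivalent in effect to the two-relation query shown earlier:

SELECT supplier_name
FROM PRO_INFO
WHERE product_name = 'bed'
  AND supplier_name IN
      (SELECT supplier_name
       FROM SUP_LOC
       WHERE location = 'Metairie');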
Hierarchical Databases

The hierarchical data model [Elmasri and Navathe, 1994] uses a tree data structure to conceptualize associations between different record types. In this model, record types are represented as nodes and associations as links. Each record type, except the root, has only one parent; that is, only parent-child (or one-to-many) relationships are allowed. This restriction gives hierarchical databases their simplicity. Since links are only one way, from a parent to a child, the design of hierarchical database management systems is made simpler, and only a small set of data manipulation commands is needed. Because only parent-child relationships are allowed, the hierarchical model cannot efficiently represent two main types of relationships: many-to-many relationships and the case where a record type is a child in more than one hierarchical schema. These two restrictions can be handled by allowing redundant record instances. However, such duplication requires that all copies of the same record be kept consistent at all times.

The example of Fig. 94.8 shows a hierarchical schema. The schema gives the relationship between a DEPARTMENT, its employees (D_EMPLOYEE), the projects (D_PROJECT) handled by the different departments, and how employees are assigned to these projects. It is assumed that an employee belongs to only one department, a project is handled by only one department, and an employee can be assigned to several projects. Notice that since a project has several employees assigned to it, and an employee can be assigned to more than one project, the relationship between D_PROJECT and D_EMPLOYEE is many-to-many. To model this relationship, multiple instances of the same record type D_EMPLOYEE may appear under different projects.

Such redundancies can be reduced to a large extent through the use of logical links. A logical link associates a virtual record from a hierarchical schema with an actual record from either the same schema or another schema. The redundant copy of the actual record is therefore replaced by a virtual record, which is nothing more than a pointer to the actual one. Hierarchical DDLs are used by a designer to declare the different hierarchical schemas, record types, and logical links. Furthermore, a root node must be declared for each hierarchical schema, and each record type declaration must also specify the parent record type.

Unlike relational DMLs, hierarchical DMLs such as DL/1 are record-at-a-time languages. DL/1 is used by IBM's IMS hierarchical DBMS. In DL/1 a tree traversal is based on a preorder algorithm, and within each tree, the last record accessed through a DL/1 command can be located through a currency indicator. Retrieval commands are of three types:

GET UNIQUE <record type> WHERE <restrictions>
Retrieves the leftmost record that meets the imposed restrictions. The search always starts at the root of the tree pointed to by the currency indicator.

GET NEXT [<record type> WHERE <restrictions>]
Starting from the current position, this command uses the preorder algorithm to retrieve the next record that satisfies the restrictions. The clause enclosed between brackets is optional; GET NEXT alone retrieves the next (preorder) record from the current position.

GET NEXT WITHIN PARENT [<record type> WHERE <restrictions>]
Retrieves, one at a time, the records that have the same parent and that satisfy the restrictions. The parent is assumed to have been selected through a previous GET command.

Four commands are used for record updates:

INSERT  Stores a new record and links it to a parent. The parent has already been selected through a GET command.
REPLACE  The current record (selected through a previous GET command) is modified.
DELETE  The current record and all its descendants are deleted.
GET HOLD  Locks the current record while it is being modified.

The DL/1 commands are usually embedded in a general-purpose (host) language.
In this case, a record accessed through a DL/1 command is assigned to a program variable.

FIGURE 94.8  A hierarchical schema.

Network Databases

In the network model [Elmasri and Navathe, 1994], associations between record types are less restrictive than with the hierarchical model. Here, associations among record types are represented as graphs. One-to-one and one-to-many relationships are described using the notion of a set type. Each set type has an owner record type and a member record type. In the example of Fig. 94.8, the relationship between DEPARTMENT and employee (D_EMPLOYEE) is one-to-many. This relationship defines a set type where the owner record type is DEPARTMENT and the member record type is D_EMPLOYEE. Each instance of an owner record type, along with all the corresponding member records, represents a set instance of the underlying set type. In practice, a set is commonly implemented using a circular linked list, which allows an owner record to be linked to all its member records. The pointer associated with the owner record is known as the FIRST pointer, and the one associated with a member record is known as a NEXT pointer.

In general, a record type cannot be both the owner and a member of the same set type. Also, a record cannot exist in more than one instance of a specific set type. The latter requirement implies that many-to-many relationships are not directly implemented in the network data model. The relationship between D_PROJECT and D_EMPLOYEE is many-to-many. In the network model, this relationship is represented by two set types and an intermediate record type. The new record type could be named ASSIGNED_TO (see Fig. 94.9). One set has D_EMPLOYEE as owner and ASSIGNED_TO as member record type, and the other has D_PROJECT as owner and ASSIGNED_TO as member record type.

Standards for the network model's DDL and DML were originally proposed by the CODASYL (Conference On Data SYstems Languages) committee in 1971. Several revisions to the original proposal were made later. In a network DDL, such as that of the IDMS database management system, a set declaration specifies the name of the set, its owner record type, and its member record type. The insertion and retention modes for set members are specified using combinations of the following options:

AUTOMATIC  An inserted record is automatically connected to the appropriate set instance.
MANUAL  Records are inserted into the appropriate set instance by an application program.
OPTIONAL  A member record does not have to be a member of a set instance. The member record can be connected to or disconnected from a set instance using DML commands.
MANDATORY  A member record needs to be connected to a set instance. A member record can be moved to another set instance using the network's DML.
FIXED  A member record needs to be connected to a set instance and cannot be moved to another set instance.

The network's DDL allows member records to be ordered in several ways. Member records can be sorted in ascending or descending order according to one or more fields. Alternatively, a new member record can be inserted next (or prior) to the current record (pointed to by the currency indicator) in the set instance. A newly inserted member record can also be placed first (or last) in the set instance. This will lead to a chronological (or reverse chronological) order among member records.
As with the hierarchical model, network DMLs are record-at-a-time languages, and currency indicators are necessary for navigation through the network database. For example, the main IDMS data manipulation commands can be summarized as follows:

CONNECT  Connects a member record to the specified set instance.
DISCONNECT  A member record is disconnected from a set instance (set membership must be manual in this case).
STORE, MODIFY, and DELETE  These commands are used for data storage, modification, and deletion.
FIND  Retrieval command based on set membership.
GET  Retrieval command based on key values.

FIGURE 94.9  Representing many-to-many relationships in the network model.

Architecture of a DBMS

A DBMS is a complicated software structure that includes several components (see Fig. 94.10). The DBMS has to interact with the operating system for secondary storage access. The data manager is usually the interface between the DBMS and the operating system. The DDL compiler converts schema definitions, expressed using DDL statements, into a collection of metadata tables that are stored in the data dictionary. The design of the schemas is the function of the database administrator (DBA). The DBA is also responsible for specifying the data storage structure and access methodology and for granting and revoking access authorizations. The query processor converts high-level DML statements into low-level instructions that the database manager can interpret. The DML preprocessor separates embedded DML statements from the rest of an application program. The resulting DML commands are processed by a DML compiler, and the rest of the application program is compiled by a host compiler. The object codes of the two components are then linked.

FIGURE 94.10  Simplified architecture of a DBMS.

Data Integrity and Security

Data Integrity

In general, during the design of a database schema several integrity constraints are identified. These constraints may include the uniqueness of a key value, restrictions on the domain of an attribute name, and the ability of an attribute to have a null value. A DBMS includes mechanisms with which integrity constraints can be specified. Constraints such as key uniqueness and the admissibility of null values can be specified during schema definition. More elaborate integrity constraints can also be specified. For example, constraints can be imposed on the domain of an attribute name, and any transaction that violates the imposed constraints is aborted. In some cases, it is useful to specify that the system take some action, rather than just have the transaction responsible for the constraint violation aborted. A mechanism called a trigger can be used for that purpose. A trigger specifies a condition and an action to be taken when the condition is met.
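For instance, a domain restriction and a trigger on the PROFESSOR relation of Fig. 94.7 might be declared as follows. This is a hedged sketch: the CHECK constraint follows standard SQL, while trigger syntax varies considerably between systems; the SQL:1999-style form and the SALARY_AUDIT table used here are illustrative assumptions.

-- Domain restriction: any transaction that stores a nonpositive salary is aborted.
ALTER TABLE PROFESSOR ADD CONSTRAINT salary_positive CHECK (salary > 0);

-- Trigger: record unusually large raises in a (hypothetical) audit table
-- instead of simply aborting the offending transaction.
CREATE TRIGGER large_raise
AFTER UPDATE OF salary ON PROFESSOR
REFERENCING OLD ROW AS o NEW ROW AS n
FOR EACH ROW
WHEN (n.salary > 1.5 * o.salary)
    INSERT INTO SALARY_AUDIT VALUES (n.faculty, o.salary, n.salary);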
Transactions and Data Integrity

In a multiuser DBMS, the database is a shared resource that can be accessed concurrently by many users. A transaction usually refers to the execution of a retrieval or an update program. A transaction performs a single logical operation in a database application and is therefore an atomic unit of processing; that is, a transaction is either performed in its entirety or not performed at all. Basically, a transaction may be in one of the following states (Fig. 94.11):

• active — where read and write operations are performed.
• partially committed — when the transaction ends and various checks are made to ensure that the transaction did not interfere with other transactions.
• failed — when one of the checks failed or the transaction was aborted during the active state.
• committed — when the execution was successfully completed.
• terminated — when the transaction leaves the system.

FIGURE 94.11  State transition diagram for transaction execution.

Transactions originating from different users may be aimed at the same database records. This situation, if not carefully monitored, may cause the database to become inconsistent. Starting from a database in a consistent state, it is obvious that if all transactions are executed one after the other, then the database will remain in a consistent state. In a multiuser DBMS, however, serial execution of transactions is wasteful of system resources. In this case, the solution is to interleave the execution of the transactions. The interleaving of transactions, however, has to be performed in a way that prevents the database from becoming inconsistent. Suppose that two transactions T1 and T2 proceed in the following way (time flows downward):

Time    T1                      T2
        read_account(X)
                                read_account(X)
                                X := X - 20
        X := X - 10
                                write_account(X)
        write_account(X)
        read_account(Y)
        Y := Y + 10
        write_account(Y)

The first transaction, T1, transfers $10 from bank account X to bank account Y. The second transaction, T2, withdraws $20 from bank account X. Assume that initially there was $200 in X and $100 in Y. When the two transactions are performed serially, the final amounts in X and Y are $170 and $110, respectively. However, if the two transactions are interleaved as shown, then after the completion of both transactions there will be $190 in X and $110 in Y: T2's update to X has been overwritten, and the database is now in an inconsistent state.

It is therefore important to ensure that the interleaving of the execution of transactions leaves the database in a consistent state. One way of preserving data consistency is to ensure that the interleaved execution of transactions is equivalent to their serial execution; an interleaved execution with this property is said to be serializable.

Locking is one of the most popular approaches to achieving serializability. Locking is the process of ensuring that some actions are not performed on a data item. Therefore, a transaction may request a lock on a data item to prevent it from being either accessed or modified by other transactions. There are two basic types of locks. A shared lock allows other transactions to read but not write the data item. An exclusive lock allows only a single transaction to read and write a data item. To achieve a high degree of concurrency, the locked data item size must be as small as possible. A data item can range from the whole database to a particular field in a record. Large data items limit concurrency, while small data items result in a large storage overhead and a greater number of lock and unlock operations that the system will have to handle.

Transaction scheduling based on locking achieves serializability in two phases; this is known as two-phase locking. During the first phase, the growing phase, a transaction can only lock new data items, but it cannot release any locked ones. During the second phase, the shrinking phase, existing locks can be released, but no new data item can be locked. The two-phase locking scheme guarantees the serializability of a schedule.
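In a relational DBMS, the transfer transaction of the example above could be written as a single atomic unit. A minimal sketch, assuming a hypothetical ACCOUNT(account_id, balance) relation and SQL-style transaction statements (the exact statements vary by system):

-- T1: transfer $10 from account X to account Y as one atomic transaction.
BEGIN TRANSACTION;

UPDATE ACCOUNT
SET balance = balance - 10
WHERE account_id = 'X';      -- growing phase: an exclusive lock is taken on X's tuple

UPDATE ACCOUNT
SET balance = balance + 10
WHERE account_id = 'Y';      -- a second exclusive lock is acquired

COMMIT;                      -- shrinking phase: the locks are released together

Under two-phase locking, a concurrent withdrawal (T2) that tries to update account X must wait until T1 releases its locks, so the lost update shown in the interleaved schedule cannot occur.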
Because of its simplicity, the above scheduling method is very practical. However, it may lead to a deadlock. A deadlock occurs when two transactions are waiting for each other to release locks and neither can proceed. A deadlock prevention (or detection) strategy is needed to handle this situation. For example, deadlock can be prevented by requiring that a transaction lock all the data items it needs for its execution before it proceeds; if the transaction finds that a needed data item is already locked, it releases all of its locks.

If a transaction fails for whatever reason (in the active or partially committed state) while updating the database, it may be necessary to bring the database back to its previous (original) state by undoing the transaction. This operation is called rollback. A rollback operation requires some information about the changes made to the data items during the transaction. Such information is usually kept outside the database in a system log. Generally, rollback operations are part of the techniques used to recover from transaction failures.

Database Security

A database needs to be protected against unauthorized access. It is the responsibility of the DBA to create account numbers and passwords for legitimate users. The DBA can also specify the type of privileges a particular account has. In relational databases, this includes the privilege to create base relations, create views, alter relations by adding or dropping a column, and delete relations. The DBA can also revoke privileges that were granted previously. In SQL, the GRANT command is used to grant privileges and the REVOKE command to revoke privileges that have been granted.

The concept of views can serve as a convenient security mechanism. Consider a relation EMPLOYEE that gives the name of an employee, date of birth, the department worked for, address, phone number, and salary. A database user who is not allowed to have access to the salaries of employees in his own department can have this portion of the database hidden from him. This can be achieved by limiting his access to a view obtained from the relation EMPLOYEE by selecting only those tuples where the department attribute is different from his.

Database security can be enhanced by using data encryption. The idea here is to encrypt the data using some coding technique. An unauthorized user will have difficulty deciphering the encrypted data. Only authorized users are provided with keys to decipher the encoded data.
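A minimal sketch of these mechanisms, assuming a hypothetical user account smith who works in a Sales department; the column names are illustrative renderings of the EMPLOYEE attributes listed above:

-- A view that hides the tuples of smith's own department.
CREATE VIEW OTHER_DEPT_EMPLOYEE AS
SELECT name, date_of_birth, department, address, phone, salary
FROM EMPLOYEE
WHERE department <> 'Sales';

-- smith may query the view but is given no privilege on the base relation.
GRANT SELECT ON OTHER_DEPT_EMPLOYEE TO smith;

-- A previously granted privilege can later be withdrawn.
REVOKE SELECT ON OTHER_DEPT_EMPLOYEE FROM smith;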
Emerging Trends

Object-Oriented Databases

Object-oriented database systems (OODBMSs) [Brown, 1991] are one of the latest trends in database technology. The emergence of OODBMSs is in response to the requirements of advanced applications. In general, traditional commercial and administrative applications can be effectively modeled using one of the three record-based data models. These applications are characterized by simple data types, and their access paths and relationships are based on data values. Advanced database applications, such as those found in engineering CAD/CAM, require complex data structures. When these applications are modeled using the relational model, they require an excessive number of relations, and a large number of complex operations are usually needed to produce an answer. This leads, in most cases, to unacceptable performance levels.

The notion of "object" is central to OODBMSs. An object can be seen as an entity consisting of its own private memory and an external interface (or protocol). The private memory is used to store the state of the object, and the external interface consists of a set of operations that can be performed on the object. An object communicates with other objects through messages sent to its external interface. When an object receives a message, it responds by using its own procedures, known as methods. The methods are responsible for processing the data in the object's private memory and sending messages to other objects to perform specific tasks and possibly send back appropriate results.

The object-oriented approach provides for a high level of abstraction. In addition, this model has constructs that can be used to define new data types and specialized operators that can be applied to them. This feature is known as encapsulation. An object is usually a member of a class. The class specifies the internal structure and the external interface of an object. New object classes can be defined as specializations of existing ones. For example, in a university environment, the object type "faculty" can be seen as a specialization of the object type "employee." Since a faculty member is a university employee, it has all the properties of a university employee plus some of its own. Some of the general operations that can be performed on an employee could be "raise_salary," "fire_employee," and "transfer_employee." For a faculty member, specialized operations such as "faculty_tenure" could be defined. Faculty can be viewed as a subclass of employee. As a result, faculty (the subclass) will respond to the same messages as employee (the superclass) in addition to those defined specifically for faculty. This technique is known as inheritance; a subclass is said to inherit the behavior of its superclass.

Opponents of the object-oriented paradigm point to the fact that while this model has greater modeling capability, it lacks the simplicity and the strong theoretical foundations of the relational model. Also, the reappearance of the navigational approach is seen by many as a step backward. Supporters of the object-oriented approach believe that a navigational approach is a necessity in several applications. They point to the rich modeling capability of the model, its high level of abstraction, and its suitability for modular design.

Distributed Databases

A distributed database [Ozsu and Valduriez, 1991] is a collection of interrelated databases spread over the nodes of a computer network. The management of the distributed database is the responsibility of a software system usually known as a distributed DBMS (DDBMS). One of the tasks of the DDBMS is to make the distributed nature of the database transparent to the user. A distributed database usually reflects the distributed nature of some applications. For example, a bank may have branches in different cities, and a database used by such an organization is usually distributed over all these sites. The different sites are connected by a computer network, and a user may access data stored locally or access data stored at other sites through the network. Distributed databases have several advantages. In distributed databases, the effect of a site failure or data loss at a particular node can be minimized through data replication. However, data replication reduces security and makes the process of keeping the database consistent more complicated.
In distributed databases, data is decomposed into fragments that are allocated to the different sites. A fragment is allocated to a site in a way that maximizes local use. This allocation scheme, which is known as data localization, reduces the frequency of remote access. In addition, since each site deals with only a portion of the database, local query processing is expected to exhibit increased performance. A distributed database is inherently well suited for parallel processing at both the interquery and intraquery levels. Parallel processing at the interquery level is the ability to have multiple queries executed concurrently. Parallelism at the intraquery level results from the possibility of a single query being simultaneously handled by many sites, each site acting on a different portion of the database.

The data distribution increases the complexity of a DDBMS over a centralized DBMS. In fact, several research issues in distributed query processing, distributed database design, and distributed transaction processing remain to be solved. Only then can the potential of distributed databases be fully appreciated.

Parallel Database Systems

There has been a continuing increase in the amount of data handled by database management systems (DBMSs) in recent years. Indeed, it is no longer unusual for a DBMS to manage databases ranging in size from hundreds of gigabytes to terabytes. This massive increase in database sizes is coupled with a growing need for DBMSs to exhibit more sophisticated functionality, such as the support of object-oriented, deductive, and multimedia applications. In many cases, these new requirements have rendered existing DBMSs unable to provide the necessary system performance, especially given that many mainframe DBMSs already have difficulty meeting the I/O and CPU performance requirements of traditional information systems that service large numbers of concurrent users and/or handle massive amounts of data [DeWitt and Gray, 1992].

To achieve the required performance levels, database systems have been increasingly required to make use of parallelism. Two approaches have been suggested to provide parallelism in database systems [Abdelguerfi and Lavington, 1995]. The first approach uses massively parallel general-purpose hardware platforms. Commercial systems such as the nCube and IBM's SP2 follow this approach and support Oracle's Parallel Server. The second approach makes use of arrays of off-the-shelf components to form custom massively parallel systems. Usually, these hardware systems are based on MIMD parallel architectures. The NCR 3700 and the Super Database Computer II (SDC-II) are two such systems. The NCR 3700 now supports a parallel version of the Sybase relational DBMS. The number of general-purpose and dedicated parallel database computers is increasing each year, and it is not unrealistic to envisage that most high-performance database management systems in the year 2000 will support parallel processing. This potential urges both database vendors and practitioners to understand parallel database systems in depth.

It is noteworthy that in recent years the popularity of the client/server architecture has increased. This architecture is practically a derivative of the shared-nothing approach. In this model, client nodes access data through one or more servers.
This approach derives its strength from an attractive price/performance ratio, a high level of scalability, and the ease with which additional remote hosts can be integrated into the system. Another driving force of the client/server approach is the current trend toward corporate downsizing.

Multimedia

Yet another new generation of database applications is multimedia, where non-text forms of data, such as voice, video, and image, are accessed via some form of user interface. Hypermedia interfaces are becoming the primary delivery system for multimedia applications. These interfaces, such as Mosaic, allow users to browse through an information base consisting of many different types of data. The basis of hypermedia is hypertext, where text-based information is accessed in a nonsequential manner. Hypermedia is an extension of the hypertext paradigm to multimedia.

Defining Terms

Database: A shared pool of interrelated data.
Database computer: A special hardware and software configuration aimed primarily at handling large databases and answering complex queries.
Database management system (DBMS): A software system that allows for the definition, construction, and manipulation of a database.
Data model: An integrated set of tools to describe the data and its structure, data relationships, and data constraints.
Distributed database: A collection of multiple, logically interrelated databases distributed over a computer network.

Related Topic

87.3 Data Types and Data Structures

References

M. Abdelguerfi and A. K. Sood, Eds., Special Issue on Database Computers, IEEE Micro, December 1991.
M. Abdelguerfi and S. Lavington, Eds., Emerging Trends in Database and Knowledge Base Machines, IEEE Computer Science Press, 1995.
A. Brown, Object-Oriented Databases: Applications in Software Engineering, New York: McGraw-Hill, 1991.
E. F. Codd, "A relational model of data for large shared data banks," Communications of the ACM, pp. 377–387, June 1970.
D. DeWitt and J. Gray, "Parallel database systems: The future of high performance database systems," Communications of the ACM, pp. 85–98, June 1992.
R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 2nd ed., Redwood City, Calif.: Benjamin/Cummings, 1994.
D. Maier, The Theory of Relational Databases, New York: Computer Science Press, 1983.
M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems, Englewood Cliffs, N.J.: Prentice-Hall, 1991.

94.2 Rule-Based Expert Systems

Jay Liebowitz

Expert systems are probably the most practical application of artificial intelligence (AI). Artificial intelligence, as a field, has two major thrusts: (1) to supplement human brain power with intelligent computer power and (2) to better understand how we think, learn, and reason. Expert systems are one application of AI, and they are being developed and used throughout the world [Feigenbaum et al., 1988; Liebowitz, 1990]. Other major applications of AI are robotics, speech understanding, natural-language understanding, computer vision, and neural networks.

Expert systems are computer programs that emulate the behavior of a human expert in a well-bounded domain of knowledge [Liebowitz, 1988]. They have been used in a number of tasks, ranging from sheep reproduction management in Australia, hurricane damage assessment in the Caribbean, boiler plant operation in Japan, and computer configuration in the United States, to strategic management consulting in Europe [Liebowitz, 1991b].
Expert systems technology has been around since the late 1950s, but it has been only since 1980–1981 that the commercialization of expert systems has emerged [Turban, 1992]. An expert system typically has three major components: the dialog structure, the inference engine, and the knowledge base [Liebowitz and DeSalvo, 1989]. The dialog structure is the user interface that allows the user to interact with the expert system. Most expert systems are able to explain their reasoning, in the same manner that one would want human experts to explain their decisions. The inference engine is the control structure within the expert system that houses the search strategies that allow the expert system to arrive at various conclusions. The third component is the knowledge base, which is the set of facts and heuristics (rules of thumb) about the specific domain task. The knowledge principle says that the power of the expert system lies in its knowledge base. Expert system shells have been developed and are widely used on various platforms to help one build an expert system and concentrate on the knowledge base construction. Most operational expert systems are integrated with existing databases, spreadsheets, optimization modules, or information systems [Mockler and Dologite, 1992].

The most successful type of expert system is the rule-based, or production, system. This type of expert system is chiefly composed of IF-THEN (condition-action) rules. For example, the well-known MYCIN expert system, developed at Stanford University for diagnosing bacterial infections of the blood and meningitis, is rule-based, consisting of 450–500 rules. XCON, the expert system at Digital Equipment Corporation used for configuring VAX computer systems, is probably the largest rule-based expert system, consisting of over 11,000 rules. There are other types of expert systems that represent knowledge in ways other than rules or in conjunction with rules. Frames, scripts, and semantic networks are popular knowledge representation methods that could be used in expert systems.

The development of rule-based systems is typically called knowledge engineering. The knowledge engineer is the individual involved in the development and deployment of the expert system. Knowledge engineering, in rule-based systems, refers primarily to the construction of the knowledge base. As such, there are six major steps in this process, namely (1) problem selection, (2) knowledge acquisition, (3) knowledge representation, (4) knowledge encoding, (5) knowledge testing and evaluation, and (6) implementation and maintenance. The knowledge engineering process typically uses a rapid prototyping approach (build a little, test a little). Each of the six steps in the knowledge engineering process will be briefly discussed in turn.

Problem Selection

In selecting an appropriate application for expert systems technology, there are a few guidelines to follow:

• Pick a problem that is causing a large number of people a fair amount of grief.
• Select a "doable," well-bounded problem (i.e., a task that takes a few minutes to a few hours to solve)—this is especially important for the first expert system project, for winning management's support of the technology.
• Select a task that is performed frequently.
• Choose an application where there is a consensus on the solution of the problem.
• Pick a task that utilizes primarily symbolic knowledge.
• Choose an application where an expert exists and is willing to cooperate in the expert system's development.
The development of rule-based systems is typically called knowledge engineering, and the knowledge engineer is the individual involved in the development and deployment of the expert system. Knowledge engineering, in rule-based systems, refers primarily to the construction of the knowledge base. There are six major steps in this process: (1) problem selection, (2) knowledge acquisition, (3) knowledge representation, (4) knowledge encoding, (5) knowledge testing and evaluation, and (6) implementation and maintenance. The knowledge engineering process typically uses a rapid prototyping approach (build a little, test a little). Each of the six steps is briefly discussed in turn.

Problem Selection

In selecting an appropriate application for expert systems technology, there are a few guidelines to follow:
• Pick a problem that is causing a large number of people a fair amount of grief.
• Select a "doable," well-bounded problem (i.e., a task that takes a few minutes to a few hours to solve). This is especially important for the first expert system project, to win management's support for the technology.
• Select a task that is performed frequently.
• Choose an application where there is consensus on the solution of the problem.
• Pick a task that relies primarily on symbolic knowledge.
• Choose an application where an expert exists and is willing to cooperate in the development.
• Make sure the expert is articulate and available, and that a backup expert exists.
• Have the financial and moral support of management.

Problem selection and scoping are critical to the success of an expert systems project. As with any information systems project, the systems analysis stage is an essential part of the development process; if the problem domain is not carefully selected, difficulties will ensue later in development.

Knowledge Acquisition

After the problem is carefully selected and scoped, the next step is knowledge acquisition. Knowledge acquisition involves eliciting knowledge from one or more experts and drawing on available documentation, regulations, manuals, and other written reports. The biggest bottleneck in expert systems development has, thus far, been the ability to acquire knowledge. Automated knowledge acquisition tools, such as Boeing Computer Services' AQUINAS, have been developed to assist in this process, but very few such tools are on the market. The most commonly used approaches for acquiring or eliciting knowledge include interviewing (structured and unstructured), protocol analysis, questionnaires (structured and open-ended), observation, learning by example/analogy, and other techniques (the Delphi technique, statistical methods). Some helpful guidelines for knowledge acquisition are:
• Before interviewing the expert, make sure that you (as the knowledge engineer) are familiar and comfortable with the domain.
• The first session with the expert should be an introductory lecture on the task at hand.
• The knowledge engineer should have a systematic approach to acquiring knowledge.
• Incorporate the input and feedback of the expert (and users) into the system; get the expert and users enthusiastic about the project.
• Collect manuals and documentation on the subject material.
• Tape the knowledge acquisition sessions, if allowed.

Knowledge Representation

After acquiring the knowledge, the next step is to represent it. In a rule-based expert system, IF-THEN (condition-action) rules are used. Rules are the appropriate choice when the knowledge falls naturally into rule form, when it is procedural, when it is mostly context-independent, and when it is mostly categorical ("yes-no" answers). Frames, scripts, and semantic networks are used as knowledge representation schemes for more descriptive, declarative knowledge. In selecting a knowledge representation scheme, try to use the method that most closely resembles the way the expert thinks about and expresses his or her knowledge.
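For contrast with rule-based representation, the fragment below sketches one way more descriptive, declarative knowledge might be held in frames linked into a small semantic network. It is a sketch under assumptions made here: the slot names, the "isa" inheritance convention, and the toy medical frames are invented, not a representation this chapter prescribes.

    # A hedged sketch of frame-style (declarative) knowledge: each frame is a
    # dictionary of slots, and "isa" links form a tiny semantic network.
    # The slot names, the inheritance convention, and the toy medical frames
    # are illustrative assumptions only.
    FRAMES = {
        "infection":  {"isa": None, "treatable": True},
        "meningitis": {"isa": "infection", "site": "cerebrospinal fluid"},
        "pneumonia":  {"isa": "infection", "site": "lungs"},
    }

    def get_slot(frame_name, slot, frames=FRAMES):
        """Look up a slot value, inheriting along the isa chain when the
        frame itself does not carry the slot."""
        while frame_name is not None:
            frame = frames[frame_name]
            if slot in frame:
                return frame[slot]
            frame_name = frame.get("isa")
        return None

    if __name__ == "__main__":
        print(get_slot("meningitis", "site"))       # local slot: cerebrospinal fluid
        print(get_slot("meningitis", "treatable"))  # inherited from "infection": True

Rules state what to conclude when certain conditions hold; frames such as these state what an object is and which properties it inherits from more general concepts, which is why they suit the descriptive knowledge discussed above.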
Knowledge Encoding

Once the knowledge is represented, the next step is to encode it. Many knowledge engineers use expert system shells to develop expert system prototypes; others build the expert system from scratch, using languages such as Lisp, Prolog, or C. The following general guidelines may be useful in encoding the knowledge:
• Remember that for every shell there is a perfect task, but for every task there is not a perfect shell.
• Consider using an expert system shell for prototyping/proof-of-concept purposes, but first determine the requirements of the task instead of force-fitting a shell to the task.
• Try to develop the knowledge base in a modular format for ease of updating.
• Concentrate on the user interface and human factors features as well as on the knowledge base.
• Use an incremental, iterative approach.
• Consider whether uncertainty should play a part in the expert system.
• Consider whether the expert reasons in a data-driven manner (forward chaining), a goal-directed manner (backward chaining), or both.

Knowledge Testing and Evaluation

Once the knowledge is encoded in the system, testing and evaluation need to be conducted. Verification and validation refer to checking the consistency of the knowledge and logic and checking the quality and accuracy of the advice the expert system produces. Various approaches to testing can be used: performing "backcasting" by running the expert system against documented cases (using a representative set of test cases) and comparing the system-generated results with the historical results; using blind verification tests (a modified Turing test); having the expert and other experts exercise the system; and applying statistical methods. In evaluating the expert system, the users should assess its human factors design (e.g., instructions, free-text comments, ease of updating, exiting capabilities, response time, display and presentation of conclusions, ability to restart, ability for the user to express a degree of certainty, graphics, and overall utility of the system).
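The backcasting approach lends itself to a small test harness, sketched below under assumptions made here: the historical cases, their documented outcomes, and the trivial stand-in for the deployed system are all invented, and a real harness would call the actual inference engine and report disagreements case by case rather than only an agreement score.

    # A hedged sketch of "backcasting": replay documented historical cases
    # through the expert system and count how often its conclusion matches the
    # recorded outcome. The cases, the outcomes, and the trivial stand-in for
    # the deployed system are invented for illustration only.

    def run_expert_system(facts):
        """Stand-in for the fielded system; a real harness would invoke the
        actual inference engine and knowledge base here."""
        return "suspect_meningitis" if {"fever", "stiff_neck"} <= facts else "no_finding"

    HISTORICAL_CASES = [
        ({"fever", "stiff_neck"}, "suspect_meningitis"),
        ({"fever"},               "no_finding"),
        ({"stiff_neck"},          "suspect_meningitis"),  # deliberately disagrees with the stand-in
    ]

    def backcast(cases):
        """Return (cases where the system agrees with the documented outcome, total cases)."""
        hits = sum(1 for facts, documented in cases
                   if run_expert_system(facts) == documented)
        return hits, len(cases)

    if __name__ == "__main__":
        hits, total = backcast(HISTORICAL_CASES)
        print(f"agreement with documented outcomes: {hits}/{total}")  # 2/3 here

Replaying such a documented case set after every change to the knowledge base also gives the rapid-prototyping cycle described earlier a simple regression test.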
Implementation and Maintenance

Once the system is ready to be deployed within the organization, the knowledge engineer must be cognizant of various institutionalization factors [Liebowitz, 1991a; Turban and Liebowitz, 1992]. Institutionalization refers to implementing the expert system and transitioning it into the organization. Frequently the technology itself is not the limiting factor; the management of the technology is often the culprit. An expert system may be accurate and a technical success, yet without careful attention to management and institutionalization considerations it may be a technology transfer failure. Several guidelines are useful for proper institutionalization of expert systems:
• Know the corporate culture in which the expert system is deployed.
• Plan the institutionalization process well in advance, as early as the requirements analysis stage.
• Through user training, help desks, good documentation, hotlines, and similar mechanisms, the manager can reduce "resistance to change."
• Solicit and incorporate users' comments during the analysis, design, development, and implementation stages of the expert system.
• Make sure there is a team or individual empowered to maintain the expert system.
• Be cognizant of possible legal problems resulting from the use and misuse of the expert system.
• During the planning stages, determine how the expert system will be distributed.
• Keep the company's awareness of expert systems high throughout the system's development and implementation, and even after its institutionalization.

Defining Terms

Expert system: A computer program that emulates a human expert in a well-bounded domain of knowledge.
Knowledge base: The set of facts and rules of thumb (heuristics) about the domain task.
Knowledge engineering: The process of developing an expert system.

References

E. A. Feigenbaum, P. McCorduck, and P. Nii, The Rise of the Expert Company, New York: Times Books, 1988.
J. K. Lee, J. Liebowitz, and Y. M. Chae, Eds., Proceedings of the Third World Congress on Expert Systems, New York: Cognizant Communication Corp., 1996.
J. Liebowitz, Introduction to Expert Systems, New York: Mitchell/McGraw-Hill Publishing, 1988.
J. Liebowitz, Ed., Expert Systems for Business and Management, Englewood Cliffs, N.J.: Prentice-Hall, 1990.
J. Liebowitz, Institutionalizing Expert Systems: A Handbook for Managers, Englewood Cliffs, N.J.: Prentice-Hall, 1991a.
J. Liebowitz, Ed., Operational Expert System Applications in the United States, New York: Pergamon Press, 1991b.
J. Liebowitz and D. DeSalvo, Eds., Structuring Expert Systems: Domain, Design, and Development, Englewood Cliffs, N.J.: Prentice-Hall, 1989.
R. Mockler and D. Dologite, An Introduction to Expert Systems, New York: Macmillan Publishing, 1992.
E. Turban, Expert Systems and Applied Artificial Intelligence, New York: Macmillan Publishing, 1992.
E. Turban and J. Liebowitz, Eds., Managing Expert Systems, Harrisburg, Pa.: Idea Group Publishing, 1992.

Further Information

There are several journals and magazines specializing in expert systems that should be consulted:
Expert Systems with Applications: An International Journal, New York/Oxford: Pergamon Press, Elsevier.
Expert Systems, Medford, N.J.: Learned Information, Inc.
IEEE Expert, Los Alamitos, Calif.: IEEE Computer Society Press.
AI Expert, San Francisco: Miller Freeman Publications.
Intelligent Systems Report, Atlanta: AI Week, Inc.