Boykin, J. “Operating Systems”
The Electrical Engineering Handbook
Ed. Richard C. Dorf
Boca Raton: CRC Press LLC, 2000
96 Operating Systems
Joseph Boykin, Clarion Advanced Storage
96.1 Introduction
96.2 Types of Operating Systems
96.3 Distributed Computing Systems
96.4 Fault-Tolerant Systems
96.5 Parallel Processing
96.6 Real-Time Systems
96.7 Operating System Structure
96.8 Industry Standards
96.9 Conclusions
96.1 Introduction
An operating system is just another program running on a computer. It is unlike any other program, however.
An operating system’s primary function is the management of all hardware and software resources. It manages
processors, memory, I/O devices, and networks. It enforces policies such as protection of one program from
another and fairness to ensure that users have equal access to system resources. It is privileged in that it is the
only program that can perform specialized hardware operations. The operating system is the primary program
upon which all other programs rely.
To understand modern operating systems we must begin with some history [Boykin and LoVerso, 1990].
The modern digital computer is only about 40 years old. The first machines were giant monoliths housed in
special rooms, and access to them was carefully controlled. To program one of these systems the user scheduled
access time well in advance, for in those days the user had sole access to the machine. The program such a user
ran was the only program running on the machine.
It did not take long to recognize the need for better control over computer resources. This began in the mid-
1950s with the dawn of batch processing and early operating systems that did little more than load programs
and manage I/O devices.
In the 1960s we saw more general-purpose systems. New operating systems that provided time-sharing and
real-time computing were developed. This was the time when the foundation for all modern operating systems
was laid.
Today’s operating systems are sophisticated pieces of software. They may contain millions of lines of code
and provide such services as distributed file access, security, fault tolerance, and real-time scheduling. In this
chapter we examine many of these features of modern operating systems and their use to the practicing engineer.
96.2 Types of Operating Systems
Different operating systems (OS) provide a wide range of functionality. Some are designed as single-user systems
and some for multiple users. The operating system, with appropriate hardware support, can protect one
executing program from malicious or inadvertent attempts of another to modify or examine its memory. When
connected to a storage device such as a disk drive, the OS implements a file system to permit storage of files.
The file system often includes security features to protect against file access by unauthorized users. The system
may be connected to other computers via a network and thus provide access to remote system resources.
Operating systems are often categorized by the major functionality they provide. This functionality includes
distributed computing, fault tolerance, parallel processing, real-time, and security. While no operating system
incorporates all of these capabilities, many have characteristics from each category.
An operating system does not need to contain every modern feature to be useful. For example, MS-DOS¹ is
a single-user system with few of the features now common in other systems. Indeed, this system is little more
than a program loader reminiscent of operating systems from the early 1960s. Unlike those vintage systems,
however, MS-DOS runs numerous applications. It is the abundance of programs that solve problems
from word processing to spreadsheets to graphics that has made MS-DOS popular. The simplicity of these
systems is exactly what makes them popular for the average person.
Systems capable of supporting multiple users are termed time-sharing systems; the system is shared among
all users, with each user having the view that he or she has all system resources available. Multiuser operating
systems provide protection for both the file system and the contents of main memory. The operating system
must also mediate access to peripheral devices. For example, only one user may have access to a tape drive at
a time.
Fault-tolerant systems rely on both hardware and software to ensure that the failure of any single hardware
component, or even multiple components, does not cause the system to cease operation. To build such a system
requires that each critical hardware component be replicated at least once. The operating system must be able
to dynamically determine which resources are available and, if a resource fails, move a running program to an
operational unit.
Security has become more important during recent years. Theft of data and unauthorized access to data are
prevented in secure systems. Within the United States, levels of security are defined by a government-produced
document known as the Orange Book. This document defines seven levels of security, denoted from lowest to
highest as D, C1, C2, B1, B2, B3, and A1. Many operating systems provide no security and are labeled D. Most
time-sharing systems are secure enough that they could be classified at the C1 level. The C2 and B1 levels are
similar, and this is where most secure operating systems are currently classified. During the 1990s, B2 and B3
systems began to become available from vendors. The A1 level is extremely difficult to achieve, although
several such systems are being worked on.
In the next several sections we expand upon the topics of distributed computing, fault-tolerant systems,
parallel processing, and real-time systems.
96.3 Distributed Computing Systems
The ability to connect multiple computers through a communications network has existed for many years.
Initially, computer-to-computer communication consisted of a small number of systems performing bulk file
transfers. The 1980s brought the invention of high-speed local area networks, or LANs. A LAN allows hundreds
of machines to be connected together. New capabilities began to emerge, such as virtual terminals that allowed
a user to log on to a computer without being physically connected to that system. Networks were used to
provide remote access to printers, disks, and other peripherals. The drawback to these systems was the software;
it was not sophisticated enough to provide a totally integrated environment. Only small, well-defined
interactions among machines were permitted.
Distributed systems provide the view that all resources from every computer on the network are available
to the user. What’s more, access to resources on a remote computer is viewed in the same way as access to
resources on the local computer. For example, a file system that implements a directory hierarchy, such as
UNIX,² may have some directories on a local disk while one or more directories are on a remote system.
Figure 96.1 illustrates how much of the directory hierarchy would be on the local system, while user directories
(shaded directories) could be on a remote system.
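To the application program, this transparency means that the code for reading a remote file is identical to the code for reading a local one. The following is a minimal sketch assuming a UNIX-style system; the path name is hypothetical, and whether /home is local or remote is determined entirely by system configuration, not by the program:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd;

        /* The program neither knows nor cares whether /home is a
         * local disk or a directory mounted from a remote system. */
        fd = open("/home/boykin/data", O_RDONLY);
        if (fd >= 0) {
            read(fd, buf, sizeof(buf));
            close(fd);
        }
        return 0;
    }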
¹MS-DOS is a trademark of Microsoft, Inc.
²UNIX is a trademark of UNIX Software Laboratories (USL).
There are many advantages of distributed systems. Advantages over centralized systems include [Tanenbaum,
1992]:
• Economics: Microprocessors offer a better price/performance than mainframes.
• Speed: A distributed system may have more total computing power than a mainframe.
• Reliability: If one machine crashes, the system as a whole can still survive.
• Incremental growth: Computing power can be added in small increments.
Advantages over nonnetworked personal computers include [Tanenbaum, 1992]:
• Data sharing: Allow many users access to a common database.
• Device sharing: Allow many users to share expensive peripherals like color printers.
• Communication: Make human-to-human communication easier, for example, by electronic mail.
• Flexibility: Spread the workload over the available machines in the most cost-effective way.
FIGURE 96.1 UNIX file system hierarchy in a distributed environment.
While there are many advantages to distributed systems, there are also several disadvantages. The primary
difficulty is that software for implementing distributed systems is large and complex. Small personal computers
could not effectively run modern distributed applications. Software development tools for this environment
are not well advanced. Thus, application developers are having a difficult time working in this environment.
An additional problem is network speed. Most office networks are currently based on IEEE standard 802.3
[IEEE, 1985], commonly (although erroneously) called Ethernet, which operates at 10 Mb/s (ten million bits
per second). With this limited bandwidth, it is easy to saturate the network. While higher-speed networks such
as FDDI¹ and ATM² do exist, they are not yet in common use. While distributed computing has many
advantages, we must also understand that without appropriate safeguards, our data may not be secure.
¹Fiber distributed data interface. The FDDI standard specifies an optical fiber ring with a data rate of 100 Mb/s.
²Asynchronous transfer mode. A packet-oriented transfer mode moving data in fixed-size packets called cells. There is
no fixed speed for ATM. Typical speed is currently 155 Mb/s, although there are implementations running at 2 Gb/s.
Security is a difficult problem in a distributed environment. Whom do you trust when there are potentially
thousands of users with access to your local system? A network is subject to security attack by a number of
mechanisms. It is possible to monitor all packets going across the network; hence, unencrypted data are easily
obtained by an unauthorized user. A malicious user may mount a denial-of-service attack by flooding the network
with packets, making all systems inaccessible to legitimate users.
Finally, we must deal with the problem of scale. To connect a few dozen or even a few hundred computers
together may not cause a problem with current software. However, global networks of computers are now being
installed. Scaling our current software to work with tens of thousands of computers running across large
geographic boundaries with many different types of networks is a challenge that has not yet been met.
96.4 Fault-Tolerant Systems
Most computers simply stop running when they break. We take this as a given. There are many environments,
however, where it is not acceptable for the computer to stop working. The space shuttle is a good example.
There are other environments where you would simply prefer if the system continued to operate. A business
using a computer for order entry can continue to operate if the computer breaks, but the cost and inconvenience
may be high. Fault-tolerant systems are composed of specially designed hardware and software that are capable
of continuous operation.
To build a fault-tolerant system requires both hardware and software modifications. Let’s take a look at an
example of a small problem that illustrates the type of changes that must be made. Remember, the goal of such
a system is to achieve continuous operation. That means we can never purposely shut the computer off. How
then do we repair the system if we cannot shut it off? First, the hardware must be capable of having circuit
boards plugged and unplugged while the system is running; this is not possible on most computers. Second,
removing a board must be detected by the hardware and reported to the operating system. The operating
system, the manager of resources, must then discontinue use of that resource.
Each component of the computer system, both hardware and software, must be specially built to handle
failures. It should also be obvious that a fault-tolerant system must have redundant hardware. If, for example,
a disk controller should fail, there must be another controller communicating with the disks that can take over.
One problem with implementing a fault-tolerant system is knowing when something has failed. If a circuit
board totally ceases operation, we can determine the failure by its lack of response to commands. Another
failure mode exists where the failing component appears to work but is operating incorrectly. A common
approach to detect this problem is a voting mechanism. By implementing three hardware replicas the system
can detect when any one has failed by its producing output inconsistent with the other two. In that case, the
output of the two components in agreement is used.
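As a sketch only (real fault-tolerant systems perform this comparison in hardware or in low-level software), a two-out-of-three vote might look as follows:

    /* Majority vote over three replicated outputs. If one replica
     * disagrees, the two in agreement win; a return of -1 means no
     * two replicas agree, indicating more than a single fault. */
    int vote(int a, int b, int c)
    {
        if (a == b || a == c)
            return a;
        if (b == c)
            return b;
        return -1;
    }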
The operating system must be capable of restarting a program from a known point when a component on
which the program was running has failed. The system can use checkpoints for this purpose. When an application
program reaches a known state, such as when it completes a transaction, it stores the current state of the
program and all I/O operations; this is known as a checkpoint. Should a component on which this program
is running fail, the operating system can restart the program from the most recent checkpoint.
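A minimal sketch of checkpointing, assuming for illustration that the application saves its own state with ordinary UNIX calls (in a true fault-tolerant system the operating system does this transparently); the file name and state structure are hypothetical:

    #include <fcntl.h>
    #include <unistd.h>

    struct state {
        long next_transaction;      /* where to resume after a restart */
        /* ... remaining program state ... */
    };

    /* Save the complete program state after each completed transaction. */
    void checkpoint(const struct state *s)
    {
        int fd = open("checkpoint.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd >= 0) {
            write(fd, s, sizeof(*s));
            close(fd);
        }
    }

    /* After a failure, reload the most recent checkpoint and resume there. */
    int restore(struct state *s)
    {
        int fd = open("checkpoint.dat", O_RDONLY);
        if (fd < 0)
            return -1;              /* no checkpoint: start from the beginning */
        read(fd, s, sizeof(*s));
        close(fd);
        return 0;
    }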
While the advantage of fault-tolerant systems is obvious, they come at a price. Redundant hardware is
expensive, and software capable of recovering from faults runs more slowly. As with many other systems, the
price may be more than offset by the advantage of continuous computing.
96.5 Parallel Processing
No matter how fast computers become, it seems they are never fast enough. Manufacturers make faster
computers by decreasing the amount of time it takes to do each operation. An alternative is to build a computer
that performs several operations simultaneously. A parallel computer, also called a multiprocessor, is one that
contains more than one CPU.¹
¹Central processing unit, the hardware component that does all arithmetic and logical operations.
The advantage of a parallel computer is that it can run more than one program simultaneously. In a general-
purpose time-sharing environment, parallel computers can greatly enhance overall system throughput: each
program shares a CPU with fewer other programs. This approach is similar to having several computers connected
on a network but has the advantage that all resources are more easily shared.
To take full advantage of a parallel computer will require changes to the operating system [Boykin and
Langerman, 1990] and application programs. Most programs are easily divided into pieces that can each run
at the same time. If each of these pieces is a separate thread of control, they could run simultaneously on a
parallel computer. By so dividing the application, the program may run in less time than it would on a single-
processor (uniprocessor) computer.
Within the application program, each thread runs as if it were the only thread of control. It may call functions,
manipulate memory, perform I/O operations, etc. If the threads do not interact with each other, then, to the
application programmer, there is little change other than determining how to subdivide the program. However,
it would be unusual for these threads not to interact. It is this interaction that makes parallel programming
more complex.
In principle, the solution is rather simple. Whenever a thread will manipulate memory or perform an I/O
operation, it must ensure that it is the only thread that will modify that memory location or do I/O to that file
until it has completed the operation. To do so, the programmer uses a lock. A lock is a mechanism that allows
only a single thread to execute a given code segment at a time. Consider an application with several threads of
control. Each thread performs an action and writes the result to a file—the same file. Within each thread we
might have code that looks as follows:
    void thread(void)
    {
        dowork();           /* perform this thread's share of the work */
        writeresult();      /* append the result to the shared log file */
    }

    void writeresult(void)
    {
        lock();                         /* block until no other thread holds the lock */
        write(logfid, result, 512);
        unlock();                       /* allow the next waiting thread to proceed */
    }
In this example the writeresult function calls the lock function before it writes the result and calls unlock
afterward. Other threads simultaneously calling writeresult will wait at the call to lock until the thread that
currently holds the lock calls the unlock function.
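On systems that support the POSIX threads interface (see Section 96.8), the lock and unlock functions of this example correspond directly to a mutex. A sketch, assuming logfid and result are set up as in the fragment above:

    #include <pthread.h>
    #include <unistd.h>

    extern int  logfid;              /* shared log file, opened elsewhere */
    extern char result[512];         /* this thread's result, filled in by dowork() */

    pthread_mutex_t loglock = PTHREAD_MUTEX_INITIALIZER;

    void writeresult(void)
    {
        pthread_mutex_lock(&loglock);    /* only one thread past this point */
        write(logfid, result, 512);
        pthread_mutex_unlock(&loglock);  /* the next waiting thread may proceed */
    }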
While this approach is simple in principle, in practice it is more difficult. It takes experience to determine
how a program may be divided. Even with appropriate experience, it is more difficult to debug a multithreaded
application. With several threads of control operating simultaneously, it is not simply a matter of stepping
through the program line by line to find a mistake. Most often, it is the interaction between threads that is the
problem.
Multithreading a program may not be a trivial matter. As with most types of programming, however,
experience makes the process easier. The benefit is significantly enhanced performance.
96.6 Real-Time Systems
Real-time systems are those that guarantee a response within a predetermined amount of time.
We use real-time systems when, for example, computers control an assembly line or run a flight simulator. In
such an environment we define an action that must occur and a deadline by which we wish that action to take
place. On an assembly line an event may occur, such as a part arriving at a station, that requires an action,
such as painting that part. The deadline we impose will be based on the speed of the assembly line. Obviously, we must
paint the part before it passes to the next station. This is called a hard real-time system because the system
must meet a strict deadline.
Another class of system is termed soft real-time. These are environments in which response time is important,
but the consequences are not as serious as, for example, on an assembly line. Airline reservation systems are
in this category. Rapid response time to an event, such as an agent attempting to book a ticket, is important
and must be considered when the system performs other activities.
One way of distinguishing hard and soft real-time systems is by examining the value of a response over time.
For example, if a computer were controlling a nuclear reactor and the reactor began to overheat, the command
to open the cooling valves has extremely high value until a deadline, when the reactor explodes. After that
deadline, there is no value in opening the valves (see Fig. 96.2).
Relatively few events require that type of responsiveness. Most events have a deadline, but there continues
to be value in responding to that event even past the deadline. In our airline reservation example, the airline
may wish to respond to a customer request within, say, 10 seconds. However, if the response comes in 11
seconds, there is still value in the response. The value is lessened because the customer has become upset. As
time increases, the customer becomes more and more upset and the value of responding decreases. We illustrate
this in Fig. 96.3.
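The distinction can be captured by a simple value function for each case; the shapes below are illustrative only, not drawn from any particular system:

    /* Hard real-time: a response has full value up to the deadline
     * and no value at all afterward (Fig. 96.2). */
    double hard_value(double t, double deadline)
    {
        return (t <= deadline) ? 1.0 : 0.0;
    }

    /* Soft real-time: full value up to the deadline, then a value
     * that decays gradually as the response becomes later (Fig. 96.3). */
    double soft_value(double t, double deadline)
    {
        if (t <= deadline)
            return 1.0;
        return 1.0 / (1.0 + (t - deadline));    /* one possible decay */
    }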
96.7 Operating System Structure
Operating systems are large, complex pieces of software. They must handle asynchronous events such as
interrupts from I/O devices, control hardware memory management units (MMUs) to implement virtual
memory, support multiple simultaneous users, implement complex network protocols, and much more. As
with any software of this magnitude, an operating system is logically divided into smaller pieces. The structure
of a typical modern operating system is depicted in Fig. 96.4.
FIGURE 96.2 Relative value of a response over time in a critical situation.
FIGURE 96.3 Relative value of a response over time in a noncritical situation.
From the user’s standpoint, the operating system is a collection of system calls—the programmers’ interface.
Sometimes this is termed an application program interface, or API. System calls provide the mechanism for an
application program to obtain services from the system. System calls exist to perform file operations such as
create, open, close, read, and write. For terminals, system calls would perform such functions as changing the
baud rate and number of parity bits. System calls also establish network connections and control network
protocol options, such as the size of network buffers.
While every operating system provides a system call interface, there is little uniformity to the appearance of
that interface. Some systems provide an interface that appears as a simple function call. For example, to open
a file under the UNIX operating system, we use the following system call:
open("/home/boykin/crc-press/oschapter", O_RDONLY);
Other operating systems require a user to fill in complex data structures for various operations. For example,
the following code fragment illustrates how to send a message using the Mach operating system's
interprocess communication (IPC) facility [Boykin et al., 1993]:
    msg_header_t header;

    header.msg_simple = TRUE;             /* no out-of-line data or port rights */
    header.msg_size = sizeof(header);     /* total size of the message in bytes */
    header.msg_type = MSG_TYPE_NORMAL;    /* an ordinary, non-emergency message */
    header.msg_local_port = PORT_NULL;    /* no reply port is requested */
    header.msg_remote_port = remote_port; /* destination port for the message */
    header.msg_id = 100;                  /* application-defined message identifier */
Regardless of the interface format, a programmer should become familiar with the parameters, options, and
return codes from each system call to use the system proficiently.
Beneath the programming interface lies the heart of the operating system. We can divide the system into
two major sections. The first section directly implements the system calls. This includes the file system, terminal
handling, etc. The second section provides basic capabilities upon which the rest of the system is built.
Interprocess communication, memory management, and process management are all examples of these basic
capabilities. A brief explanation of each of these sections will be given shortly.
FIGURE 96.4 The structure of a modern operating system.
The lowest level of the operating system interfaces directly with the computer hardware. For each physical
device, such as a disk, tape, or serial line, a device driver must exist to communicate with the hardware. Device
drivers accept requests to read or write data or determine the status of the device. They may do polled I/O or
be interrupt driven, although polled I/O is usually only done on small personal computers. Writing a device
driver requires a thorough knowledge of the hardware as well as the interface to the operating system.
In addition to I/O devices, the system must also manipulate such hardware as counters, timers, and memory
management units. Timers are used to satisfy user requests such as terminating an operation after a specified
length of time. MMUs provide the ability to protect memory. Each time a program is run, the operating system
programs the MMU with the physical memory addresses the program may access. Any attempt to access other
memory is not allowed by the MMU.
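Under UNIX, for example, a timeout of the kind just described can be requested with the alarm system call, which asks the operating system to deliver a signal after the specified number of seconds. A minimal sketch (the exact behavior of an interrupted system call varies from system to system):

    #include <signal.h>
    #include <unistd.h>

    volatile sig_atomic_t timed_out = 0;

    void on_alarm(int sig)
    {
        (void)sig;
        timed_out = 1;          /* remember that the timer expired */
    }

    int main(void)
    {
        char buf[512];

        signal(SIGALRM, on_alarm);
        alarm(10);              /* deliver SIGALRM in 10 seconds */

        /* A read that may block indefinitely; on many systems it
         * returns with an error if the alarm interrupts it. */
        read(0, buf, sizeof(buf));
        alarm(0);               /* cancel the timer if we finished in time */

        return timed_out ? 1 : 0;
    }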
An MMU is also required to implement virtual memory. Virtual memory allows a program to use more
memory than is physically present on the machine. The operating system implements virtual memory by using
an external device, typically a disk, to store portions of the program that are not currently in use. When a
program attempts to access memory temporarily stored on disk, the MMU traps¹ to the operating system,
which reads the memory from disk and restarts the program.
¹A trap is a hardware signal that is received by the operating system. It is very similar to an interrupt from an I/O device.
In recent years the structure depicted here has been changing. A new concept, called the micro-kernel, has
begun to emerge. The idea behind a micro-kernel is to dramatically reduce the size of the operating system by
placing most OS subsystems in the application layer. A micro-kernel would not be a usable system by itself. A
number of programs would be run on top of the micro-kernel to provide such services as a file system and
network protocols.
In the micro-kernel architecture shown in Fig. 96.5, notice that subsystems traditionally within the operating
system are now at the same level as an application program. An application program wishing to, for example,
open a file makes its request to the file system program, rather than the micro-kernel. The file system may call
upon other OS subsystems or on the micro-kernel to perform an operation.
From the user standpoint, there is no programming difference between a micro-kernel structure and the
traditional structure. There are two advantages of the micro-kernel approach. The first is that programming
FIGURE 96.5 Micro-kernel structure.
and debugging at the application layer is inherently simpler than programming at the OS layer. The benefit
here is to the OS designers and implementors, who can now write and debug OS code more quickly and easily
than before. The user benefits in turn from an operating system that is more reliable.
The second advantage stems from the ability to incorporate several different OS environments on top of the
same micro-kernel. In this way, the computer acts as though it is running several operating systems. For example,
if both MS-DOS and UNIX coexisted on the same micro-kernel, the user could choose to run an MS-DOS
spreadsheet or word processor and communicate using UNIX network commands. The user has gained
increased flexibility.
96.8 Industry Standards
As computer technologies come into widespread use, users begin to desire standardization. Standardization
allows a user to know that a program written to a standard will work without concern for which vendor supplies
the programming environment. Operating systems are no exception to this general rule, and there are several
standards, both industry standards and de facto standards, that apply. With such standards, porting software
from one system to another, often an expensive proposition, becomes a far simpler task.
Perhaps the most notable OS standard is POSIX, standard number 1003 [IEEE, 1990], sponsored by the
IEEE Computer Society’s Technical Committee on Operating Systems. POSIX is a family of standards based
on the UNIX operating system that includes the system call interface, user-level commands, real-time extensions,
and networking extensions. The POSIX system call interface, 1003.1, was adopted by the U.S. government
verbatim as a Federal Information Processing Standard, FIPS 151. Many vendors conform to POSIX; thus, a
program that conforms to this standard can be ported to many system platforms without change.
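For instance, a program that restricts itself to 1003.1 interfaces, such as the sketch below (the file name is hypothetical), should compile and run unchanged on any conforming system:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[256];
        int fd;
        ssize_t n;

        /* Only POSIX 1003.1 calls are used: open, read, write, close. */
        fd = open("report.txt", O_RDONLY);
        if (fd >= 0) {
            n = read(fd, buf, sizeof(buf));
            if (n > 0)
                write(STDOUT_FILENO, buf, (size_t)n);
            close(fd);
        }
        return 0;
    }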
An example of a de facto standard is the X/Open Portability Guide (XPG) [X/Open, 1989]. X/Open is not
a standards-setting body but is a joint initiative by members of the business community to adopt and adapt
existing standards into a consistent environment. The X/Open system interface and headers are based on POSIX
1003.1 but also include extensions to POSIX-defined interfaces as well as additional interfaces.
The importance of such standards is evidenced by the strong support of such organizations as the Open
Software Foundation. OSF's OSF/1 operating system conforms to various POSIX standards. Where not
superseded by POSIX, it also conforms to XPG and AT&T's System V Interface Definition (SVID) [AT&T, 1985].
Conforming to these standards is considered critical for the success of OSF/1.
Some might consider an operating system such as MS-DOS to be a de facto standard. While MS-DOS is in
common use, however, it is proprietary software subject to change without notice. Defining a standard implies
an open system on which vendors and users agree.
96.9 Conclusions
I have been hearing for the past 15 years about the demise of the operating system. It has been said over and
over that the role of the OS will go away. So far, the only change has been to expand on the role the operating
system plays. One must remember that the operating system is not the user interface it portrays or the
applications that run on it. It is, as it always has been, the manager of all resources on a computer system.
While the interface to computers has changed and the use to which we apply computer technology has
changed, there will always be the need for an operating system. Without question, the OS will change as well.
We have already seen micro-kernel architectures begin to emerge from the research labs into commercial
operating systems. Distributed computing will become more widespread and force additional changes to the
operating system. Regardless of the changes that come, it will always be the operating system on which all other
programs rely.
Defining Terms
Distributed computing: An environment in which multiple computers are networked together and the
resources from more than one computer are available to a user. Those resources are accessed in a manner
identical to accessing resources on a local computer system.
Fault-tolerant systems: A computer system whose hardware and software are capable of continuous
operation even in the event that hardware components fail.
File system: The logical organization of files on a storage device, typically a disk drive. The file system may
support a hierarchical structure with directories and subdirectories (sometimes called folders).
Interprocess communication: The transfer of information between two cooperating programs. Communi-
cation may take the form of a signal (the arrival of an event) or the transfer of data.
Parallel processing: A parallel computer is one that contains more than one CPU. Parallel processing is when
a program is divided into multiple threads of control, each of which is capable of running simultaneously.
On a parallel computer, multiple threads could be running at the same time, thus resulting in better
performance than on a uniprocessor system.
Process: A single executable program. A process is the context in which an operating system places a running
program. It contains the program itself as well as allocated memory, open files, network connections, etc.
Real-time computing: Support for environments in which response time to an event must occur within a
predetermined amount of time. Real-time systems may be categorized into hard and soft real-time.
Related Topics
90.3 Programming Methodology • 95.2 Classifications
References
AT&T, System V Interface Definition, Spring 1985, Issue 1, AT&T Customer Information Center, Indianapolis,
Indiana.
J. Boykin, D. Kirschen, A. Langerman, and S. LoVerso, Programming Under Mach, Reading, Mass.: Addison-
Wesley, 1993.
J. Boykin and A. Langerman, “Mach/4.3BSD: Parallelization without reimplementation,” Computing Systems
Journal, vol. 3, no. 1, 1990.
J. Boykin and S. LoVerso, “Recent developments in operating systems,” Computer, vol. 23, no. 5, 1990.
H.M. Deitel, Operating Systems, 2nd ed., Reading, Mass.: Addison-Wesley, 1990.
IEEE, Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer
Specifications, American National Standard ANSI/IEEE Std. 802.3, 1985.
IEEE, Information Technology—Portable Operating System Interface (POSIX) Part 1: System Application Program
Interface (API) [C Language], New York: IEEE, 1990.
A. Silberschatz, J.L. Peterson, and P.B. Galvin, Operating System Concepts, 3rd ed., Reading, Mass.: Addison-
Wesley, 1991.
A.S. Tanenbaum, Modern Operating Systems, Englewood Cliffs, N.J.: Prentice-Hall, 1992.
X/Open Portability Guide, X/Open Company Ltd., Englewood Cliffs, N.J.: Prentice-Hall, 1989.
Further Information
Many textbooks describe operating system concepts. The three cited in the reference section [Deitel, 1990;
Silberschatz et al., 1991; and Tanenbaum, 1992] are excellent. The IEEE Computer Society has a number of
tutorials on operating system related topics such as fault tolerance, real-time, local area networks and distributed
processing. Readers should contact the Computer Society Press office at 10662 Los Vaqueros Circle, Los
Alamitos, Calif. 90720. Phone: 714-821-8380.
For those interested in learning more about the implementation of specific operating systems, M.J. Bach,
The Design of the UNIX Operating System, Prentice-Hall, 1986, describes the implementation of AT&T System V.
The 4.3BSD operating system is described in Leffler et al., The Design and Implementation of the 4.3BSD UNIX
Operating System, Addison-Wesley, 1990.
© 2000 by CRC Press LLC