Tinder, R.F., Oklobdzija, V.G., Hamacher, V.C., Vranesic, Z.G., Zaky, S.G., Raymond,
J. “Organization”
The Electrical Engineering Handbook
Ed. Richard C. Dorf
Boca Raton: CRC Press LLC, 2000
86
Organization
86.1 Number Systems
Positional and Polynomial Representations • Unsigned Binary Number System • Unsigned Binary-Coded Decimal, Hexadecimal, and Octal Systems • Conversion between Number Systems • Signed Binary Numbers • Floating-Point Number Systems
86.2 Computer Arithmetic
Number Representation • Algorithms for Basic Arithmetic Operations • Implementation of Addition • Implementation of the Multiplication Algorithm • Floating-Point Representation
86.3 Architecture
Functional Units • Basic Operational Concepts • Performance • Multiprocessors
86.4 Microprogramming
Levels of Programming • Microinstruction Structure • Microprogram Development • High-Level Languages for Microprogramming • Emulation • Applications of Microprogramming
86.1 Number Systems
Richard F. Tinder
Number systems provide the basis for conveying and quantifying information. Weather data, stocks, pagination
of books, weights and measures—these are just a few examples of the use of numbers that affect our daily lives.
For this purpose we find the decimal (or Arabic) number system to be reliable and easy to use. This system
evolved presumably because early humans were equipped with a crude type of calculator, their ten fingers. A
number system that is appropriate for humans, however, may be intractable for use by a machine such as a
computer. Likewise, a number system appropriate for a machine may not be suitable for human use.
Before concentrating on those number systems that are useful in computers, it will be helpful to review the
characteristics that are desirable in any number system. There are four important characteristics in all:
• Distinguishability of symbols
• Arithmetic operations capability
• Error control capability
• Tractability and speed
To one degree or another the decimal system of numbers satisfies these characteristics for hard-copy transfer
of information between humans. Roman numerals and binary are examples of number systems that do not
satisfy all four characteristics for human use. On the other hand, the binary number system is preferable for
use in digital computers. The reason is simply put: current digital electronic machines recognize only two
identifiable states physically represented by a high voltage level and a low voltage level. These two physical states
are logically interpreted as the binary symbols 1 and 0.
Richard F. Tinder
Washington State University
Vojin G. Oklobdzija
University of California, Davis
V. Carl Hamacher
Queen's University, Canada
Zvonko G. Vranesic
University of Toronto
Safwat G. Zaky
University of Toronto
Jacques Raymond
University of Ottawa
A fifth desirable characteristic of a number system to be used in a computer should be that it have a minimum
number of easily identifiable states. The binary number system satisfies this condition. However, the digital
computer must still interface with humankind. This is done by converting the binary data to a decimal and
character-based form that can be readily understood by humans. A minimum number of identifiable characters
(say 1 and 0, or true and false) is not practical or desirable for direct human use. If this is difficult to understand,
imagine trying to complete a tax form in binary or in any number system other than decimal. On the other
hand, use of a computer for this purpose would not only be practical but, in many cases, highly desirable.
Positional and Polynomial Representations
The positional form of a number is a set of side-by-side (juxtaposed) digits given generally in fixed-point form as

N_r = (a_{n–1} … a_3 a_2 a_1 a_0 . a_{–1} a_{–2} a_{–3} … a_{–m})_r        (86.1)

where the radix (or base) r is the total number of digits in the number system and a is a digit in the set defined for radix r. Here, the radix point separates the n integer digits on the left from the m fraction digits on the right. Notice that a_{n–1} is the most significant (highest-order) digit, called the MSD, and that a_{–m} is the least significant (lowest-order) digit, denoted by the LSD.
The value of the number in Eq. (86.1) is given in polynomial form by

N_r = Σ_{i=–m}^{n–1} a_i r^i = (a_{n–1}r^{n–1} + ⋯ + a_1r^1 + a_0r^0 + a_{–1}r^{–1} + ⋯ + a_{–m}r^{–m})_r        (86.2)

where a_i is the digit in the ith position with a weight r^i.
Application of Eqs. (86.1) and (86.2) follows directly. For the decimal system r = 10, indicating that there are 10 distinguishable characters recognized as decimal numerals 0, 1, 2, …, r – 1 (= 9). Examples of the positional and polynomial representations for the decimal system are

N_10 = (d_3 d_2 d_1 d_0 . d_{–1} d_{–2} d_{–3})_10 = 3017.528

and

N_10 = Σ_{i=–3}^{3} d_i × 10^i = 3×10^3 + 0×10^2 + 1×10^1 + 7×10^0 + 5×10^{–1} + 2×10^{–2} + 8×10^{–3}
     = 3000 + 10 + 7 + 0.5 + 0.02 + 0.008 = 3017.528

where d_i is the decimal digit in the ith position. Exclusive of possible leading and trailing zeros, the MSD and LSD for this number are 3 and 8, respectively. This number could have been written in a form such as N_10 = 03017.52800 without altering its value but implying greater accuracy of the fraction portion.
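The polynomial evaluation of Eq. (86.2) translates directly into a few lines of code. The sketch below is illustrative only; the function name and the digit-list convention (most significant digit first, with the last m entries lying to the right of the radix point) are assumptions, not part of the chapter.

```python
def polynomial_value(digits, radix, m=0):
    """Evaluate Eq. (86.2): the sum of a_i * r**i for a fixed-point digit string.

    `digits` lists the digits most-significant first; the last m entries are
    taken to lie to the right of the radix point.
    """
    n = len(digits) - m                      # number of integer digits
    return sum(d * radix**e for d, e in zip(digits, range(n - 1, -m - 1, -1)))

# The decimal example 3017.528 and the binary example 101101.101 from the text:
print(polynomial_value([3, 0, 1, 7, 5, 2, 8], 10, m=3))      # ≈ 3017.528 (up to float rounding)
print(polynomial_value([1, 0, 1, 1, 0, 1, 1, 0, 1], 2, m=3))  # 45.625
```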
Unsigned Binary Number System
Applying Eqs. (86.1) and (86.2) to the binary system requires that r = 2, indicating that there are two distin-
guishable characters, typically 0 and (r – 1) = 1, that are used. In positional representation these characters
(numbers) are called binary digits or bits. Examples of the positional and polynomial notations for a binary
number are
N_2 = (b_{n–1} … b_3 b_2 b_1 b_0 . b_{–1} b_{–2} b_{–3} … b_{–m})_2 = 101101.101_2

and

N_10 = Σ_{i=–3}^{5} b_i 2^i = 1×2^5 + 0×2^4 + 1×2^3 + 1×2^2 + 0×2^1 + 1×2^0 + 1×2^{–1} + 0×2^{–2} + 1×2^{–3}
     = 32 + 8 + 4 + 1 + 0.5 + 0.125 = 45.625_10

where b_i is the bit in the ith position. Thus, the bit positions are weighted …, 16, 8, 4, 2, 1, 1/2, 1/4, 1/8, … for any number consisting of integer and fraction portions. Binary numbers so represented are sometimes referred to as natural binary. In positional representation the bits on the extreme left and extreme right are called the MSB (most significant bit) and LSB (least significant bit), respectively. Notice that by obtaining the value of a binary number a conversion from binary to decimal has been performed. The subject of radix (base) conversion will be dealt with more extensively later.
For reference purposes Table 86.1 provides the binary-to-decimal conversion for two-, three-, four-, five-,
and six-bit binary. The six-bit binary column is only halfway completed for brevity.
TABLE 86.1 Binary-to-Decimal Conversion
Two-Bit Decimal Three-Bit Decimal Four-Bit Decimal Five-Bit Decimal Six-Bit Decimal
Binary Value Binary Value Binary Value Binary Value Binary Value
00 0 000 0 0000 0 10000 16 100000 32
01 1 001 1 0001 1 10001 17 100001 33
10 2 010 2 0010 2 10010 18 100010 34
11 3 011 3 0011 3 10011 19 100011 35
100 4 0100 4 10100 20 100100 36
101 5 0101 5 10101 21 100101 37
110 6 0110 6 10110 22 100110 38
111 7 0111 7 10111 23 100111 39
1000 8 11000 24 101000 40
1001 9 11001 25 101001 41
1010 10 11010 26 101010 42
1011 11 11011 27 101011 43
1100 12 11100 28 101100 44
1101 13 11101 29 101101 45
1110 14 11110 30 101110 46
1111 15 11111 31 101111 47
In the natural binary system the number of bits in a unit of data is commonly assigned a name. Examples are:
• 4-data-bit unit: nibble (or half-byte)
• 8-data-bit unit: byte
• 16-data-bit unit: two bytes (or half-word)
• 32-data-bit unit: word (or four bytes)
• 64-data-bit unit: double-word
etc.
The word size for a computer is determined by the number of bits that can be manipulated and stored in
registers. The foregoing list of names would be applicable to a 32-bit computer.
Unsigned Binary-Coded Decimal, Hexadecimal,
and Octal Systems
While the binary system of numbers is most appropriate for use in computers, it has several disadvantages
when used by humans who have become accustomed to the decimal system. For example, binary machine code
is long, difficult to assimilate, and tedious to convert to decimal. However, there exist simpler ways to represent
binary numbers for conversion to decimal representation. Three examples, commonly used, are natural binary-
coded decimal (NBCD), binary-coded hexadecimal (BCH), and binary-coded octal (BCO). These number
systems are useful in applications where a digital device, such as a computer, must interface with humans. The
NBCD code representation is also useful in carrying out computer arithmetic.
The NBCD Representation
The BCD system as used here is actually an 8, 4, 2, 1 weighted code called natural BCD or NBCD. This system
uses patterns of four bits to represent each decimal position of a number and is one of several such weighted
BCD code systems. The NBCD code is converted to its decimal equivalent by polynomials of the form
N_10 = b_3 × 2^3 + b_2 × 2^2 + b_1 × 2^1 + b_0 × 2^0 = b_3 × 8 + b_2 × 4 + b_1 × 2 + b_0 × 1

for any b_3b_2b_1b_0 code integer. Thus, decimal 6 is represented as (0×8) + (1×4) + (1×2) + (0×1), or 0110 in NBCD code. Like natural binary, NBCD code is also called "natural" because its bit positional weights are derived from integer powers of 2. Table 86.2 shows the NBCD bit patterns for decimal integers 0 through 9.
The NBCD code is currently the most widely used of the BCD codes. There are many excellent sources of
information on BCD codes. One, in particular, provides a fairly extensive coverage of both weighted and
unweighted BCD codes [Tinder, 1991].
TABLE 86.2 NBCD Bit Patterns and Decimal Equivalent
NBCD NBCD
Bit Pattern Decimal Bit Pattern Decimal
0000 0 1000 8
0001 1 1001 9
0010 2 1010 NA
0011 3 1011 NA
0100 4 1100 NA
0101 5 1101 NA
0110 6 1110 NA
0111 7 1111 NA
NA = not allowed.
Decimal numbers greater than 9 or less than 1 can be represented by the NBCD code if each digit is given
in that code and if the results are combined. For example, the number 63.98 is represented by (or converted
to) NBCD code as
63.98_10 = 01100011.10011000)_NBCD = 1100011.10011)_NBCD
Here, the code weights are 80, 40, 20, 10; 8, 4, 2, 1; 0.8, 0.4, 0.2, 0.1; and 0.08, 0.04, 0.02, 0.01 for the tens,
units, tenths, and hundredths digits, respectively, representing four decades. Conversion between binary and
NBCD requires conversion to decimal as an intermediate step. For example, to convert from NBCD to binary
requires that groups of four bits be selected in both directions from the radix point to form the decimal number.
If necessary, zeros are added to the leftmost or rightmost ends to complete the groups of four bits as in the
above example. Negative NBCD numbers can be represented either in sign-magnitude notation or 1’s or 2’s
complement notation as discussed later.
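As a concrete illustration of the per-digit packing just described, the short sketch below converts a decimal string to NBCD and back, four bits per decimal digit on each side of the radix point. The function names and the string-based interface are assumptions made for the example.

```python
def to_nbcd(decimal_str):
    """Encode each decimal digit as its 4-bit NBCD (8421) pattern."""
    return ''.join('.' if ch == '.' else format(int(ch), '04b') for ch in decimal_str)

def from_nbcd(nbcd_str):
    """Decode groups of four bits back to decimal digits."""
    out = []
    for field in nbcd_str.split('.'):
        out.append(''.join(str(int(field[i:i + 4], 2)) for i in range(0, len(field), 4)))
    return '.'.join(out)

print(to_nbcd('63.98'))                  # 01100011.10011000, as in the text
print(from_nbcd('01100011.10011000'))    # 63.98
```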
Another BCD code that is used for number representation and manipulation is called excess 3 BCD (or XS3 NBCD, or simply XS3). XS3 is an example of a biased-weighted code (a bias of 3). This code is formed by adding 0011_2 (= 3_10) to the NBCD bit patterns in Table 86.2. Thus, to convert XS3 to NBCD code, 0011 must be subtracted from the XS3 code. In four-bit quantities the XS3 code has the useful feature that when adding two numbers together in XS3 notation, a carry will result and yield the correct value any time a carry results in decimal (i.e., when 9 is exceeded). This feature is not shared by either natural binary or NBCD addition.
The Hexadecimal and Octal Systems
The hexadecimal number system requires that r = 16 in Eqs. (86.1) and (86.2), indicating that there are 16
distinguishable characters in the system. By convention, the permissible hexadecimal digits are 0, 1, 2, 3, 4, 5,
6, 7, 8, 9, A, B, C, D, E, and F for decimals 0 through 15, respectively. Examples of the positional and polynomial
representations for a hexadecimal number are
N_16 = (h_{n–1} … h_3 h_2 h_1 h_0 . h_{–1} h_{–2} h_{–3} … h_{–m})_16 = (AF3.C8)_16

with a decimal value of

N_10 = Σ_{i=–m}^{n–1} h_i × 16^i = 10 × 16^2 + 15 × 16^1 + 3 × 16^0 + 12 × 16^{–1} + 8 × 16^{–2} = 2803.78125_10

Here, it is seen that a hexadecimal number has been converted to decimal by using Eq. (86.2).
The octal number system requires that r = 8 in Eqs. (86.1) and (86.2), indicating that there are eight distinguishable characters in this system. The permissible octal digits are 0, 1, 2, 3, 4, 5, 6, and 7, as one might expect. Examples of the application of Eqs. (86.1) and (86.2) are

N_8 = (o_{n–1} … o_3 o_2 o_1 o_0 . o_{–1} o_{–2} o_{–3} … o_{–m})_8 = 501.74_8

with a decimal value of

N_10 = 5 × 8^2 + 0 × 8^1 + 1 × 8^0 + 7 × 8^{–1} + 4 × 8^{–2} = 321.9375_10
When the hexadecimal and octal number systems are used to represent bit patterns in binary, they are called
binary-coded hexadecimal (BCH) and binary-coded octal (BCO), respectively. These two number systems are
examples of binary-derived radices. Table 86.3 lists several selected examples showing the relationships between
BCH, BCO, binary, and decimal.
What emerges on close inspection of Table 86.3 is that each hexadecimal digit corresponds to four binary
digits and that each octal digit corresponds to three binary digits. The following example illustrates the
relationships between these number systems:
10110111111.11011_2 = 0101 1011 1111.1101 1000_2 = 5BF.D8_16
                    = 010 110 111 111.110 110_2 = 2677.66_8
                    = 1471.84375_10
To separate the binary digits into groups of four (for BCH) or groups of three (for BCO), counting must begin from the radix point and continue outward in both directions. Then, where needed, zeros are added to the leading and trailing ends of the binary representation to complete the MSDs and LSDs for the BCH and BCO forms.

TABLE 86.3 The BCH and BCO Number Systems

Binary   BCH   BCO   Decimal      Binary     BCH   BCO   Decimal
0000     0     0     0            1010       A     12    10
0001     1     1     1            1011       B     13    11
0010     2     2     2            1100       C     14    12
0011     3     3     3            1101       D     15    13
0100     4     4     4            1110       E     16    14
0101     5     5     5            1111       F     17    15
0110     6     6     6            10000      10    20    16
0111     7     7     7            11011      1B    33    27
1000     8     10    8            110001     31    61    49
1001     9     11    9            1001110    4E    116   78
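The grouping rule just described (groups of four bits for BCH, three for BCO, counted outward from the radix point with zero padding) can be sketched as follows; the helper name and its string interface are assumptions for illustration.

```python
def binary_to_groups(bits, group):
    """Regroup a binary string about its radix point into BCH (group=4) or BCO (group=3) digits."""
    intpart, _, frac = bits.partition('.')
    intpart = intpart.zfill(-(-len(intpart) // group) * group)                # pad leading zeros
    frac = frac.ljust(-(-len(frac) // group) * group, '0') if frac else ''    # pad trailing zeros
    left = ''.join(format(int(intpart[i:i + group], 2), 'X')
                   for i in range(0, len(intpart), group))
    right = ''.join(format(int(frac[i:i + group], 2), 'X')
                    for i in range(0, len(frac), group))
    return left + ('.' + right if right else '')

print(binary_to_groups('10110111111.11011', 4))  # 5BF.D8
print(binary_to_groups('10110111111.11011', 3))  # 2677.66
```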
Conversion between Number Systems
It is not the intent of this section to cover all methods for radix (base) conversion. Rather, the plan is to provide
general approaches, separately applicable to the integer and fraction portions, followed by specific examples.
Conversion of Integers
Since the polynomial form of Eq. (86.2) is a geometrical progression, the integer portion can be represented in nested radix form. In source radix s, the nested representation is

N_s = a_{n–1}s^{n–1} + a_{n–2}s^{n–2} + ⋯ + a_1s + a_0
    = (a_0 + s(a_1 + s(a_2 + ⋯ + s(a_{n–1})⋯)))_s        (86.3)

for digits a_i having integer values from 0 to s – 1. The nested radix form not only suggests a conversion process but also forms the basis for computerized conversion.
Consider that the number in Eq. (86.3) is to be represented in nested radix r form,

N_r = (b_0 + r(b_1 + r(b_2 + ⋯ + r(b_{m–1})⋯)))_r        (86.4)

where, in general, m ≠ n. Then, if N_s is divided by r, the results are of the form

N_s / r = Q + R/r        (86.5)

where Q is the integer quotient, rearranged as Q_0 = (b_1 + r(b_2 + ⋯ + r(b_{m–1})⋯))_r, and R is the remainder, R_0 = b_0. A second division by r yields Q_0/r = Q_1 + R_1/r, where Q_1 is arranged as Q_1 = (b_2 + r(b_3 + ⋯ + r(b_{m–1})⋯))_r and R_1 = b_1. Thus, by repeated division of the integer result Q_i by r, the remainders yield (b_0, b_1, b_2, …, b_{m–1})_r in that order.
The conversion method just described, called the radix divide method, can be used to convert between any two integers of different radices. However, the requirement is that the arithmetic required by N_s/r must be carried out in source radix, s. Except for source radices 10 and 2, this poses a severe problem for humans. Table 86.4 provides the recommended procedures for integer conversion. The radix divide method is suitable for computer conversion providing, of course, that the computer is programmed to carry out the arithmetic in different radices.
TABLE 86.4 Summary of Recommended Methods for Integer Conversion by Noncomputer Means

Integer Conversion                  Conversion Method
N_10 → N_r                          Radix division by radix r using Eq. (86.5)
N_s → N_10                          Eq. (86.2) or Eq. (86.3)
N_s)_{s≠10} → N_r)_{r≠10}           N_s → N_10 by Eq. (86.2) or (86.3); then N_10 → N_r by radix division using Eq. (86.5)

Special Cases for Binary Forms
N_2 → N_10                          Positional weighting
N_2 → N_BCH                         Partition N_2 into groups of four bits starting from radix point, then apply Table 86.3
N_2 → N_BCO                         Partition N_2 into groups of three bits starting from radix point, then apply Table 86.3
N_BCH → N_2                         Reverse of N_2 → N_BCH
N_BCO → N_2                         Reverse of N_2 → N_BCO
N_BCH → N_BCO                       N_BCH → N_2 → N_BCO
N_BCO → N_BCH                       N_BCO → N_2 → N_BCH
N_NBCD → N_XS3                      Add 0011_2 (= 3_10) to N_NBCD
N_XS3 → N_NBCD                      Subtract 0011_2 (= 3_10) from N_XS3
The integer conversion methods of Table 86.4 can be illustrated by the following simple examples:

Example 1. 139_10 → N_2

N/r          Q     R
139/2   =    69    1
69/2    =    34    1
34/2    =    17    0
17/2    =    8     1
8/2     =    4     0
4/2     =    2     0
2/2     =    1     0
1/2     =    0     1        139_10 = 10001011_2

Example 2. 10001011_2 → N_10. By positional weights,

N_10 = 128 + 8 + 2 + 1 = 139_10

Example 3. 139_10 → N_8

N/r          Q     R
139/8   =    17    3
17/8    =    2     1
2/8     =    0     2        139_10 = 213_8

Example 4. 10001011_2 → N_BCO

010 001 011 = 213_BCO

Example 5. 213_BCO → N_BCH

213_BCO = 010 001 011 = 10001011_2 = 1000 1011 = 8B_16

Example 6. 213_8 → N_5

213_8 = 2 × 8^2 + 1 × 8^1 + 3 × 8^0 = 139_10

N/r          Q     R
139/5   =    27    4
27/5    =    5     2
5/5     =    1     0
1/5     =    0     1        213_8 = 1024_5

Check: 1 × 5^3 + 2 × 5^1 + 4 × 5^0 = 125 + 10 + 4 = 139_10
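A direct rendering of the radix divide method (Eq. (86.5)) is sketched below, collecting remainders until the quotient reaches zero. The function name and the choice of returning a digit string are assumptions; digits above 9 are shown with the hexadecimal letters used in this chapter.

```python
DIGITS = "0123456789ABCDEF"

def radix_divide(n, r):
    """Convert a nonnegative decimal integer n to radix r by repeated division (Eq. 86.5)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, r)
        out.append(DIGITS[remainder])    # remainders appear LSD first
    return ''.join(reversed(out))

print(radix_divide(139, 2))   # 10001011, as in Example 1
print(radix_divide(139, 8))   # 213, as in Example 3
print(radix_divide(139, 5))   # 1024, as in Example 6
```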
Conversion of Fractions
By extracting the fraction portion from Eq. (86.2), one can write

.N_s = a_{–1}s^{–1} + a_{–2}s^{–2} + ⋯ + a_{–m}s^{–m}
     = s^{–1}(a_{–1} + s^{–1}(a_{–2} + ⋯ + s^{–1}(a_{–m})⋯))_s        (86.6)

in radix s. This is called the nested inverse radix form, which provides the basis for computerized conversion.
If the fraction in Eq. (86.6) is represented in nested inverse radix r form, then

.N_r = r^{–1}(b_{–1} + r^{–1}(b_{–2} + ⋯ + r^{–1}(b_{–p})⋯))_r        (86.7)

for any fraction represented in radix r. Now, if .N_s is multiplied by r, the result is of the form

.N_s × r = I + F        (86.8)

where I is the product integer, I_1 = b_{–1}, and F is the product fraction, arranged as F_1 = r^{–1}(b_{–2} + r^{–1}(b_{–3} + ⋯ + r^{–1}(b_{–p})⋯))_r. By repeated multiplication by r of the remaining fractions F_i, the resulting integers yield (b_{–1}, b_{–2}, b_{–3}, …, b_{–m})_r, in that order.
The conversion just described is called the radix multiply method and is perfectly general for converting between fractions of different radices. However, as in the case of integer conversion, the requirement is that the arithmetic required by .N_s × r must be carried out in source radix, s. For noncomputer use by humans, this procedure is usually limited to fraction conversions .N_10 → .N_r, where the source radix is 10 (decimal). The recommended methods for converting between fractions of different radices are given in Table 86.5. The radix multiply method is well suited to computer use.
For any integer of radix s, there exists an exact representation in radix r. This is not the case for a fraction whose conversion is a geometrical progression that never converges. Terminating a fraction conversion at n digits (to the right of the radix point) results in an error or uncertainty. In decimal, this error is given by

ε_10 = a_{–n}r^{–n} + a_{–(n+1)}r^{–(n+1)} + a_{–(n+2)}r^{–(n+2)} + ⋯
     = r^{–n}[a_{–n} + a_{–(n+1)}r^{–1} + a_{–(n+2)}r^{–2} + ⋯]

where the quantity in brackets approaches the value of a_{–n} + 1. Therefore, terminating a fraction conversion at n digits from the radix point results in an error with bounds
TABLE 86.5 Summary of Recommended Methods for Fraction Conversion by Noncomputer Means

Fraction Conversion                   Conversion Method
.N_10 → .N_r                          Radix multiplication by using Eq. (86.8)
.N_s → .N_10                          Eq. (86.2) or Eq. (86.6)
.N_s)_{s≠10} → .N_r)_{r≠10}           .N_s → .N_10 by Eq. (86.2) or Eq. (86.6); then .N_10 → .N_r by radix multiplication using Eq. (86.8)

Special Cases for Binary Forms
.N_2 → .N_BCH                         Partition .N_2 into groups of four bits from radix point, then apply Table 86.3
.N_2 → .N_BCO                         Partition .N_2 into groups of three bits from radix point, then apply Table 86.3
.N_BCH → .N_2                         Reverse of .N_2 → .N_BCH
.N_BCO → .N_2                         Reverse of .N_2 → .N_BCO
.N_BCH → .N_BCO                       .N_BCH → .N_2 → .N_BCO
.N_BCO → .N_BCH                       .N_BCO → .N_2 → .N_BCH
0 < ε_10 ≤ r^{–n}(a_{–n} + 1)        (86.9)
in decimal. Equation (86.9) is useful in deciding when to terminate a fraction conversion.
Often, it is desirable to terminate a fraction conversion at (n + 1) digits and then round off to n digits from the radix point. A suitable method for rounding to n digits in radix r is: perform the fraction conversion to (n + 1) digits from the radix point, then drop the (n + 1)th digit if a_{–(n+1)} < r/2, or add r^{–n} to the result if a_{–(n+1)} ≥ r/2.
After rounding off to n digits, the maximum error becomes the difference between the rounded result and the smallest value possible. By using Eq. (86.9), this difference is

ε_max = r^{–n}(a_{–n} + 1) – r^{–n}(a_{–n} + a_{–(n+1)}/r) = r^{–n}(1 – a_{–(n+1)}/r)

Then, by rounding to n digits, there results an error with bounds

0 < ε_10 ≤ r^{–n}(1 – a_{–(n+1)}/r)        (86.10)

in decimal. If a_{–(n+1)} < r/2 and the (n + 1)th digit is dropped, the maximum error is r^{–n}. Note that for N_s → N_10 → N_r type conversions, the bounds of errors aggregate.
The following examples illustrate the fraction conversion methods of Table 86.5.
Example 7. 0.654_10 → N_2 rounded to eight bits

.N_s × r          F        I
0.654 × 2        0.308     1
0.308 × 2        0.616     0
0.616 × 2        0.232     1
0.232 × 2        0.464     0
0.464 × 2        0.928     0
0.928 × 2        0.856     1        0.654_10 = 0.10100111_2
0.856 × 2        0.712     1
0.712 × 2        0.424     1
0.424 × 2        0.848     0        ε_max = 2^{–8}

Example 8. 0.654_10 → N_8 terminated at four digits

.N_s × r          F        I
0.654 × 8        0.232     5
0.232 × 8        0.856     1        0.654_10 = 0.5166_8
0.856 × 8        0.848     6        with error bounds
0.848 × 8        0.784     6        0 < ε_10 ≤ 7 × 8^{–4} = 1.71 × 10^{–3}

Example 9. 0.5166_8 → N_2 rounded to eight bits, and let 0.5166_8 → N_10 be rounded to four decimal places.

0.5166_8 = 5 × 8^{–1} + 1 × 8^{–2} + 6 × 8^{–3} + 6 × 8^{–4}
         = 0.625000 + 0.015625 + 0.011718 + 0.001465
         = 0.6538 rounded to four decimal places;  ε_10 ≤ 10^{–4}

.N_s × r          F        I
0.6538 × 2       0.3076    1
0.3076 × 2       0.6152    0
0.6152 × 2       0.2304    1
0.2304 × 2       0.4608    0
0.4608 × 2       0.9216    0
0.9216 × 2       0.8432    1
0.8432 × 2       0.6864    1        0.5166_8 = 0.10100111_2 (compare with Example 7)
0.6864 × 2       0.3728    1
0.3728 × 2       0.7456    0        ε_10 ≤ 10^{–4} + 2^{–8} = 0.0040

Example 10. 0.10100111_2 → N_BCH

0.10100111_2 = 0.1010 0111 = 0.A7_BCH
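The radix multiply method of Eq. (86.8) is equally short in code. The sketch below carries the fraction as a Python float purely for illustration (a careful implementation would keep the source-radix arithmetic exact, as the text requires); the function name and string result are assumptions.

```python
DIGITS = "0123456789ABCDEF"

def radix_multiply(frac, r, n):
    """Develop n radix-r digits of a decimal fraction 0 <= frac < 1 by repeated multiplication (Eq. 86.8)."""
    digits = []
    for _ in range(n):
        frac *= r
        i = int(frac)            # the product integer I becomes the next digit
        digits.append(DIGITS[i])
        frac -= i                # the product fraction F is carried to the next step
    return '.' + ''.join(digits)

print(radix_multiply(0.654, 2, 8))   # .10100111, as in Example 7
print(radix_multiply(0.654, 8, 4))   # .5166, as in Example 8
```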
Signed Binary Numbers
To this point only unsigned numbers (assumed to be positive) have been considered. However, both positive
and negative numbers must be used in computers. Several schemes have been devised for dealing with negative
numbers in computers, but only four are commonly used:
• Signed-magnitude representation
• Radix complement representation
• Diminished radix complement representation
• Excess (offset) code representation
Of these, the radix 2 complement representation, called 2’s complement, is the most widely used system in
computers.
Signed-Magnitude Representation
A signed-magnitude number consists of a magnitude together with a symbol indicating its sign (positive or negative). Such a number lies in the decimal range of –(r^{n–1} – 1) through +(r^{n–1} – 1) for n integer digits in radix r. A fraction portion, if present, would consist of m digits to the right of the radix point.
The most common examples of signed-magnitude numbers are those in the decimal and binary systems.
The sign symbols for decimal (+ or –) are well known. In binary it is established practice to use 0 = plus and
1 = minus for the sign symbols and to place one of them in the MSB position for each number. Examples in
eight-bit binary are
+45.5_10 = 0 101101.1_2        +0_10 = 00000000_2
–123_10  = 1 1111011_2         –0_10 = 10000000_2

where the leftmost bit of each pattern is the sign bit and the remaining bits give the magnitude.
Although the sign-magnitude system is used in computers, it has two drawbacks. There is no unique zero,
as indicated by the examples, and addition and subtraction calculations require time-consuming decisions
regarding operation and sign as, for example, (–7) minus (–4). Even so, the sign-magnitude representation is
commonly used in floating-point number systems.
Radix Complement Representation
The radix complement of an n-digit number N_r is obtained by subtracting it from r^n, that is, r^n – N_r. The operation r^n – N_r is equivalent to complementing the number and adding 1 to the LSD. Thus, the radix complement is N̄_r + 1_LSD, where N̄_r = r^n – 1 – N_r is the complement of a number in radix r. Therefore, one may write

Radix complement of N_r = r^n – N_r = N̄_r + 1        (86.11)

The complements N̄_r for digits in three commonly used number systems are given in Table 86.6. Notice that the complement of a binary number is formed simply by replacing the 1's with 0's and 0's with 1's as required by 2^n – 1 – N_2.
With reference to Table 86.6 and Eq. (86.11), the following examples of radix complement representation are offered.

Example 11. The 10's complement of 47.83 is N̄_10 + 1_LSD = 52.16 + 0.01 = 52.17

Example 12. The 2's complement of 0101101.101 is N̄_2 + 1_LSB = 1010010.010 + 0.001 = 1010010.011

Example 13. The 16's complement of A3D is N̄_16 + 1_LSD = 5C2 + 1 = 5C3

The decimal value of Eq. (86.11) can be found from the polynomial expression

N_{radix compl.})_10 = –a_{n–1}r^{n–1} + Σ_{i=–m}^{n–2} a_i r^i        (86.12)

for any n-digit number of radix r. In Eqs. (86.11) and (86.12) the MSD is taken to be the position of the sign symbol.
2's Complement Representation. The radix complement for binary is the 2's complement representation. In 2's complement the MSB is the sign bit, 1 indicating a negative number or 0 a positive one. The decimal range of representation for n integer bits in 2's complement is from –(2^{n–1}) through +(2^{n–1} – 1). From Eq. (86.11), the 2's complement is formed by

N_2)_{2's compl.} = 2^n – N_2 = N̄_2 + 1        (86.13)

A few examples in eight-bit binary are shown in Table 86.7. Notice that application of Eq. (86.13) changes the sign of the decimal value of a binary number (+ to –, and vice versa) and that only one zero representation exists.
Application of Eq. (86.12) gives the decimal value of any 2's complement number, including those containing a radix point. For example, the pattern N_{2's compl.} = 11010010.011 has a decimal value

N_{2's compl.})_10 = –1×2^7 + 1×2^6 + 1×2^4 + 1×2^1 + 1×2^{–2} + 1×2^{–3}
                   = –128 + 64 + 16 + 2 + 0.25 + 0.125 = –45.625_10

The same result could have easily been obtained by first applying Eq. (86.13) to N_{2's compl.} followed by the use of positional weighting to obtain the decimal value. Thus,
N_{2's compl.} = 00101101.101 = 32 + 8 + 4 + 1 + 0.5 + 0.125 = 45.625_10

which is known to be a negative number, –45.625_10.
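Eqs. (86.12) and (86.13) are easy to check mechanically. The sketch below forms the 2's complement of an n-bit pattern and evaluates a pattern's signed decimal value; the function names and the bit-string interface are assumptions made for the example.

```python
def twos_complement(bits):
    """Return the 2's complement of a bit string: 2**n - N, i.e., invert every bit and add 1 (Eq. 86.13)."""
    n = len(bits)
    value = (1 << n) - int(bits, 2)
    return format(value & ((1 << n) - 1), '0{}b'.format(n))

def signed_value(bits):
    """Decimal value of a 2's complement pattern (Eq. 86.12): subtracting 2**n when the MSB is set
    gives the MSB its negative weight -2**(n-1)."""
    n = len(bits)
    return int(bits, 2) - (1 << n) * (bits[0] == '1')

print(twos_complement('00101101'))   # 11010011 (= -45 in 8 bits)
print(signed_value('11010011'))      # -45
```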
Negative NBCD numbers can be represented in 2’s complement. The foregoing discussion on 2’s complement
applies to NBCD with consideration of how NBCD is formed from binary. As an example, –59.24_10 is represented by

0101 1001.0010 0100)_NBCD = 10100110.11011100)_{2's compl. NBCD}
In a similar fashion, negative NBCD numbers can also be represented in 1’s complement following the procedure
given in the next paragraph. Sign-magnitude representation of a negative NBCD number simply requires the
addition of a sign bit to the NBCD magnitude.
Diminished Radix Complement Representation
The diminished radix complement of a number is obtained by

N_r)_{dim. rad. compl.} = r^n – N_r – 1 = N̄_r        (86.14)

Thus, the complement of a number is its diminished radix complement. It also follows that the radix complement of a number is the diminished radix complement with 1 added to the LSD, as in Eq. (86.13). The range of representable numbers is –(r^{n–1} – 1) through +(r^{n–1} – 1) for radix r.
In the binary and decimal number systems, the diminished radix complement representations are the 1’s
complement and 9’s complement, respectively. Examples of 1’s complement are shown in Table 86.7 for
comparison with those of 2’s complement. Notice that in 1’s complement there are two representations for
zero, one for +0 and the other for –0. This fact limits the usefulness of the 1’s complement representation for
computer arithmetic.
TABLE 86.6 Complements N̄_r for Three Commonly Used Number Systems

Digit   Binary   Decimal   Hexadecimal
0       1        9         F
1       0        8         E
2       —        7         D
3       —        6         C
4       —        5         B
5       —        4         A
6       —        3         9
7       —        2         8
8       —        1         7
9       —        0         6
A       —        —         5
B       —        —         4
C       —        —         3
D       —        —         2
E       —        —         1
F       —        —         0

TABLE 86.7 Examples of Eight-Bit 2's and 1's Complement Representations (MSB = Sign Bit)

Decimal Value   2's Complement   1's Complement
–128            10000000         —
–127            10000001         10000000
–31             11100001         11100000
–16             11110000         11101111
–15             11110001         11110000
–3              11111101         11111100
–0              00000000         11111111
+0              00000000         00000000
+3              00000011         00000011
+15             00001111         00001111
+16             00010000         00010000
+31             00011111         00011111
+127            01111111         01111111
+128            —                —
Excess (Offset) Representations
Other systems for representing negative numbers use excess or offset codes. Here, a bias B is added to the true value N_r of the number to produce an excess number N_xs given by

N_xs = N_r + B        (86.15)

When B = r^{n–1} exceeds the usable bounds of negative numbers, N_xs remains positive. Perhaps the most common use of the excess representation is in floating-point number systems, the subject of the next section.
Two examples are given below in eight-bit excess 128 code.
Two examples are given below in eight-bit excess 128 code.
Example 14.

  –43_10      11010101     N_{2's compl.}
 +128_10      10000000     B
   85_10      01010101     N_xs = –43_10 in excess 128 code

Example 15.

   27_10      00011011     N_{2's compl.}
 +128_10      10000000     B
  155_10      10011011     N_xs = 27_10 in excess 128 code
The representable decimal range for an excess 2^{n–1} number system is –2^{n–1} through +(2^{n–1} – 1) for an n-bit binary number. However, if N_2 + B > 2^{n–1} – 1, overflow occurs and 2^{n–1} must be subtracted from (N_2 + B) to give the correct result in excess 2^{n–1} code.
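The excess (offset) mapping of Eq. (86.15) amounts to one addition. A minimal sketch for eight-bit excess 128 code, with an assumed function name:

```python
BIAS = 128  # B = 2**(n-1) for n = 8

def to_excess128(value):
    """Encode a signed integer in eight-bit excess 128 code (Eq. 86.15): N_xs = N + B."""
    assert -128 <= value <= 127
    return format(value + BIAS, '08b')

print(to_excess128(-43))   # 01010101, as in Example 14
print(to_excess128(27))    # 10011011, as in Example 15
```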
Floating-Point Number Systems
In fixed-point representation [Eq. (86.1)], the radix point is assumed to lie immediately to the right of the
integer field and at the left end of the fraction field. The fixed-point system is the most commonly used system
for representing bounded orders of magnitude. For example, with 32 bits a binary number could represent
decimal numbers with upper and lower bounds of the order of ±10^10 and ±10^{–10}. However, for greatly expanded bounds of representation, as in scientific notation, the floating-point representation is needed.
A floating-point number (FPN) in radix r has the general form

FPN)_r = F × r^E        (86.16)

where F is the fraction (or mantissa) and E is the exponent. Only fraction digits are used for the mantissa! Take, for example, Planck's constant h = 6.625 × 10^{–34} J·s. This number can be represented many different ways in floating-point notation:

Planck's constant h = 0.6625 × 10^{–33} = 0.06625 × 10^{–32} = 0.006625 × 10^{–31}
All three adhere to the form of Eq. (86.16) and are, therefore, legitimate floating-point numbers in radix 10.
Thus, as the radix point floats to the left, the exponent is scaled accordingly. The first form for h is said to be
normalized because the MSD of F is nonzero, a means of standardizing the radix point position. Notice that
the sign for F is positive while that for E is negative.
In computers the FPN is represented in binary, where the normalized representation requires that the MSB for F always be 1. Thus, the range of F in decimal is

0.5 ≤ F < 1

Also, the mantissa F is represented in sign-magnitude form.
number in binary, which agrees with the IEEE standard [IEEE, 1985], is shown in Fig. 86.1. Here, the sign bit
(1 if negative or 0 if positive) is placed at bit position 0 to indicate the sign of the fraction. Notice that the
radix point is assumed to lie between bit positions 8 and 9 to separate the E bit-field from the F bit-field.
Before two FPNs can be added or subtracted in a computer, the E fields must be compared and equalized
and the F fields adjusted. The decision-making process can be simplified if all exponents are converted to
positive numbers by using the excess representation given by Eq. (86.15). For a q-digit number in radix r, the
exponent in Eq. (86.16) becomes
E_xs = E_r + r^{q–1}        (86.17)

where E_r is the actual exponent, augmented by a bias of B = r^{q–1}. The range of the actual exponent E_r is usually taken to be

–(r^{q–1} – 1) ≤ E_r ≤ +(r^{q–1} – 1)
In the binary system, required for computer calculations, Eq. (86.17) becomes
E_xs = E_2 + 2^{q–1}        (86.18)

with a range in actual exponent of –(2^{q–1} – 1) ≤ E_2 ≤ +(2^{q–1} – 1). In 32-bit normalized floating-point form, the exponent is stored in excess 128 code, while the number is stored in sign-magnitude form.
There still remains the question of how the number 0 is to be represented. If the F field is zero, then the
exponent can be anything and the number will be zero. However, in computers the normalized FPN_2 limits F to 0.5 ≤ F < 1 since the MSB for F is always 1. The solution to this problem is to assume that the number is zero if the exponent bits are all zero regardless of the value of the mantissa. This leads, however, to a discontinuity in normalized FPN_2 representation at the low end.
The IEEE standard for normalized FPN_2 representation attempts to remove the problem just described. The IEEE system stores the exponent in excess (2^{q–1} – 1) code and limits the decimal range of the actual exponent to

–(2^{q–1} – 2) ≤ E_2 ≤ +(2^{q–1} – 1)

For 32-bit FPN representation, the exponent is stored in excess 127 code as indicated in Fig. 86.1. Thus, the allowable range of representable exponents is from

–126_10 = 00000001_2   through   +127_10 = 11111110_2
FIGURE 86.1 IEEE standard bit format for normalized floating-point representation.
This system reserves the use of all 0’s or all 1’s in the exponent for special conditions [IEEE, 1985; Pollard,
1990]. So that the F field magnitude can diminish linearly to zero when E = –126, the MSB = 1 for F is not
specifically represented in the IEEE system but is implied.
The following example attempts to illustrate the somewhat confusing aspects of the IEEE normalized
representation:
The number 101101.11001_2 is to be represented in IEEE normalized FPN_2 notation.

101101.11001_2 = .10110111001 × 2^6
Sign bit = 0 (positive)
E_xs = 6 + 127 = 133_10 = 10000101_2
F = 0110111001…00 (the MSB = 1 is not shown)

Therefore, the IEEE normalized FPN is

FPN_2 = 0 10000101 0110111001…0
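The bit-field assembly in this example can be mimicked with a few lines of code. The sketch below follows the chapter's convention (fraction normalized to 0.5 ≤ F < 1, excess 127 exponent, hidden MSB of F) rather than any particular library routine; the function name and the 23-bit fraction width are assumptions.

```python
def pack_fpn(x, frac_bits=23):
    """Pack a positive value into the sign / excess-127 exponent / fraction fields of Fig. 86.1,
    using the chapter's convention that the fraction F is normalized to 0.5 <= F < 1."""
    sign, e = 0, 0
    while x >= 1.0:            # scale into [0.5, 1) and count the exponent
        x /= 2.0
        e += 1
    while x < 0.5:
        x *= 2.0
        e -= 1
    exponent = format(e + 127, '08b')                          # excess 127 code
    frac = int(x * 2**(frac_bits + 1)) & (2**frac_bits - 1)    # drop the implied leading 1
    return '{} {} {:0{}b}'.format(sign, exponent, frac, frac_bits)

print(pack_fpn(0b10110111001 / 2**5))   # 0 10000101 01101110010000000000000
```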
Still other forms of FPNs are in use. In addition to the IEEE system, there are the IBM, Cray, and DEC
systems of representation, each with its own single- and double-precision forms.
Defining Terms
Binary: Representation of quantities in base 2.
Complement: Opposite form of a number system.
Floating point: Similar to “scientific notation” except used to represent binary operations in a computer.
Hexadecimal: Base 16 number system.
Mantissa: Fraction portion of a floating-point number.
Octal: Base 8 number system.
Radix: Base to which numbers are represented.
Related Topic
86.2 Computer Arithmetic
References
H.L. Garner, “Number systems and arithmetic,” in Advances in Computers, vol. 6, F.L. Alt et al., Eds., New York:
Academic, 1965, pp. 131–194.
IEEE, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std. 754–1985.
D.E. Knuth, The Art of Computer Programming: Seminumerical Algorithms, vol. 2, Reading, Mass.: Addison-Wesley, 1969.
C. Tung, “Arithmetic,” in Computer Science, A.F. Cardenas et al., Eds., New York: Wiley-Interscience, 1972, chap. 3.
Further Information
K. Hwang, Computer Arithmetic, New York: Wiley, 1978.
L.H. Pollard, Computer Design and Architecture, Englewood Cliffs, N.J.: Prentice-Hall, 1990.
R.F. Tinder, Digital Engineering Design: A Modern Approach, Englewood Cliffs, N.J.: Prentice-Hall, 1991.
86.2 Computer Arithmetic
Vojin G. Oklobdzija
As the ability to perform computation increased from the early days of computers up to the present, so has
the knowledge of how to utilize the hardware and software to perform computation. Digital computer arithmetic
emerged from that period in two ways: as an aspect of logic design and as a development of efficient algorithms
to utilize the available hardware.
Given that numbers in a digital computer are represented as a string of zeros and ones and that hardware
can perform only a relatively simple and primitive set of Boolean operations, all the arithmetic operations
performed are based on a hierarchy of operations that are built upon the very simple ones.
What distinguishes computer arithmetic is its intrinsic relation to technology and the ways things are designed
and implemented in a digital computer. This comes from the fact that the value of a particular way to compute,
or a particular algorithm, is directly evaluated from the actual speed with which this computation is performed.
Therefore, there is a very direct and strong relationship between the technology in which digital logic is
implemented to compute and the way the computation is structured. This relationship is one of the guiding
principles in the development of computer arithmetic.
The subject of computer arithmetic can be, for simpler treatment, divided into number representation; basic arithmetic operations (such as addition, multiplication, and division); and the evaluation of functions.
Number Representation
The only way to represent information in a digital computer is via a string of bits, i.e., zeros and ones. The
number of bits being used depends on the length of the computer word, which is a quantity of bits on which
hardware is capable of operating (sometimes also a quantity that is brought to the CPU from memory in a
single access). The first question is what relationship to use in establishing correspondence between those bits
and a number. Second, we need to make sure that certain properties that exist in the corresponding number
representation system are satisfied and that they directly correspond to the operations being performed in
hardware over the taken string of bits.
This relationship is defined by the rule that associates one numerical value designated as X (in the text we
will use capital X for the numerical value) with the corresponding bit string designated as x.
x = {x_{n–1}, x_{n–2}, …, x_0}

where

x_i ∈ {0, 1}
In this case the associated word (the string of bits) is n bits long.
When for every value X there exists one and only one corresponding bit string x, we define the number
system as nonredundant. If however, we could have more than one bit string x that represents the same value
X, the number system is redundant.
Most commonly we are using numbers represented in a weighted number system, where a numerical value is associated with the bit string x according to the equation

X = Σ_{i=0}^{n–1} x_i w_i

where w_0 = 1 and w_i = w_{i–1} r_{i–1}.
The value r_i is an integer designated as the radix, and in a nonredundant number system it is an integer equal to the number of allowed values for x_i. In general x_i could consist of more than one bit. The numerical value associated with x is designated as the explicit value x_e. In conventional number systems the radix r_i is the same positive integer for all the digit positions x_i, with the canonical set of digit values

x_i ∈ {0, 1, 2, 3, …, r_i – 1}   for 0 ≤ i ≤ n – 1
An example of a weighted number system with a mixed radix would be the representation of time in weeks,
days, hours, minutes, and seconds with a range for representing 100 weeks:
r = 10, 10, 7, 24, 60, 60
In digital computers the radices encountered are 2, 4, 10, and 16, with 2 being the most commonly used one.
The digit set for x_i can be redundant or nonredundant. If the number of different values x_i can assume is n_x ≤ r, then we have a nonredundant digit set. Otherwise, if n_x > r, we have a redundant digit set. Use of a redundant digit set has its advantages in efficient implementation of algorithms (multiplication and division in particular).
Other number representations of interest are nonweighted number systems, where the relative position of
the digit does not affect the weight so that the appropriate interchange of any two digits will not change the
value x. The best example of such a number system is the residue number system (RNS).
We also define the explicit value x_e and the implicit value X_i of a number represented by a bit string x. The implicit
value is the only value of interest to the user, while the explicit value provides the most direct interpretation of
the bit string x. Mapping of the explicit value to the implicit value is obtained by an arithmetic function that
defines the number representation used. It is a task of the arithmetic designer to devise algorithms that result
in the correct implicit value of the result for the operations on the operand digits representing the explicit
values. In other words, the arithmetic algorithm needs to satisfy the closure property.
The relationship between the implicit value and the explicit value is best illustrated by Table 86.8.
Representation of Signed Integers
The two most common representations of signed integers are sign and magnitude (SM) representation and
true and complement (TC) representation. While SM representation might be easier to understand and convert
to and from, it has its own problems. Therefore, we will find TC representation to be more commonly used.
Sign and Magnitude Representation (SM). In SM representation a signed integer X_i is represented by a sign bit x_s and a magnitude x_m, i.e., by the pair (x_s, x_m). Usually 0 represents the positive sign (+) and 1 represents the negative sign (–). The magnitude x_m can be represented in any way chosen for the representation of positive integers. A disadvantage of SM representation is that two representations of zero exist, positive and negative zero: x_s = 0, x_m = 0 and x_s = 1, x_m = 0.
True and Complement Representation (TC). In TC representation there is no separate bit used to represent the sign. Mapping between the explicit and implicit value is defined as

X_i = x_e         if x_e < C/2
X_i = x_e – C     if x_e > C/2

where C is the complementation constant.
TABLE 86.8 The Relationship between the Implicit Value and the Explicit Value

Implied Attributes: Radix Point,     Expression for Implicit Value X_i       Numerical Implicit
Negative Number Representation       as a Function of Explicit Value x_e     Value X_i (in Decimal)
Integer, magnitude                   X_i = x_e                               27
Integer, two's complement            X_i = –2^5 + x_e                        –5
Integer, one's complement            X_i = –(2^5 – 1) + x_e                  –4
Fraction, magnitude                  X_i = 2^{–5} x_e                        27/32
Fraction, two's complement           X_i = 2^{–4}(–2^5 + x_e)                –5/16
Fraction, one's complement           X_i = 2^{–4}(–(2^5 – 1) + x_e)          –4/16

Source: A. Avizienis, "Digital computer arithmetic: A unified algorithmic specification," in Symp. Computers and Automata, Polytechnic Institute of Brooklyn, April 13–15, 1971.
The illustration of TC mapping is given in Table 86.9. In this representation positive integers are represented
in the true form, while negative integers are represented in the complement form.
With respect to how the complementation constant C is chosen, we can further distinguish two representations within the TC system. If the complementation constant is chosen to be equal to the range of possible values taken by x_e, C = r^n in a conventional number system where 0 ≤ x_e ≤ r^n – 1, then we have defined the range complement (RC) system. If, on the other hand, the complementation constant is chosen to be C = r^n – 1, we have defined the diminished radix complement (DRC) (also known as the digit complement, DC) number system. Representations of the RC and DRC number systems are shown in Table 86.10.
As can be seen from Table 86.10, the RC system provides one unique representation of zero because the complementation constant C = r^n falls outside the range. There are two representations of zero in the DRC system, x_e = 0 and x_e = r^n – 1. The RC representation is not symmetrical, and it is not a closed system under the change of sign operation. The range for RC is [–½r^n, ½r^n – 1]. The DC is symmetrical and has the range [–(½r^n – 1), ½r^n – 1].
For the radix r = 2, the RC and DRC number representations are commonly known as the two's complement and one's complement number representation systems, respectively. Those two representations are illustrated by an example in Table 86.11 for the range of values –4 ≤ X_i ≤ 3.
Algorithms for Basic Arithmetic Operations
The algorithms for the arithmetic operation are dependent on the number representation system used. There-
fore, their implementation should be examined for each number representation system separately, given that
the complexity of the algorithm, as well as its hardware implementation, is dependent on it.
Addition and Subtraction in Sign and Magnitude System
In the SM number system addition/subtraction is performed on pairs (u_s, u_m) and (w_s, w_m) resulting in a sum (s_s, s_m), where u_s and w_s are sign bits and u_m and w_m are magnitudes. The algorithm is relatively complex because
it requires comparisons of the signs and magnitudes as well. Extending the addition algorithm in order to
perform subtraction is relatively easy because it only involves change of the sign of the operand being subtracted.
Therefore, we will consider only the addition algorithm.
The algorithm can be described as
if u_s = w_s (signs are equal) then
    s_s = u_s and s_m = u_m + w_m (operation includes checking for the overflow)
if u_s ≠ w_s then
    if u_m > w_m:  s_m = u_m – w_m  and  s_s = u_s
    else:          s_m = w_m – u_m  and  s_s = w_s
TABLE 86.10 Mapping of the Explicit Value x_e into RC and DRC Number Representations

x_e            X_i (RC)        X_i (DRC)
0              0               0
1              1               1
2              2               2
⋮              ⋮               ⋮
½r^n – 1       ½r^n – 1        ½r^n – 1
½r^n           –½r^n           –(½r^n – 1)
⋮              ⋮               ⋮
r^n – 2        –2              –1
r^n – 1        –1              0

TABLE 86.9 True and Complement Mapping

x_e            X_i
0              0
1              1
2              2
⋮              ⋮
C/2 – 1        C/2 – 1
C/2 + 1        –(C/2 – 1)
⋮              ⋮
C – 2          –2
C – 1          –1
C              0
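A direct transcription of the sign-and-magnitude addition rule above, with subtraction handled by flipping the sign of the second operand; the tuple-based (sign, magnitude) interface is an assumption of this sketch.

```python
def sm_add(u, w):
    """Add two sign-magnitude numbers given as (sign, magnitude) with sign 0 = +, 1 = -."""
    us, um = u
    ws, wm = w
    if us == ws:                  # equal signs: add magnitudes, keep the sign
        return (us, um + wm)      # a hardware version would also check for overflow
    if um > wm:                   # unequal signs: subtract the smaller magnitude
        return (us, um - wm)
    return (ws, wm - um)

def sm_sub(u, w):
    """Subtract by changing the sign of the operand being subtracted."""
    return sm_add(u, (1 - w[0], w[1]))

print(sm_add((1, 7), (0, 4)))   # (1, 3), i.e. (-7) + (+4) = -3
print(sm_sub((1, 7), (1, 4)))   # (1, 3), i.e. (-7) - (-4) = -3
```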
Addition and Subtraction in True and Complement System
Addition in the TC system is relatively simple. It is sufficient to perform modulo addition of the explicit values;
therefore,
s_e = (u_e + w_e) mod C
Proof will be omitted.
In the RC number system this is equivalent to passing the operands through an adder and discarding the
carry-out of the most significant position of the adder which is equivalent to performing the modulo addition
(given that C = r^n).
In the DRC number system the complementation constant is C = r^n – 1. Modulo addition in this case is performed by subtracting r^n and adding 1. It turns out that this operation can be performed by simply passing
the operands through an adder and feeding the carry-out from the most significant digit position into the
carry-in at the least significant digit position. This is also called addition with end-around carry.
Subtracting two numbers is performed by simply changing the sign of the operand to be subtracted preceding
the addition operation.
Change of Sign Operation. The change of sign operation W_i = –Z_i involves the following operation on the explicit values:

w_e = (–z_e) mod C = C – z_e

which means that the change of sign operation consists of subtracting the operand z_e from the complementation constant C.
In the DRC system complementation is performed by simply complementing each digit of the operand Z_i with respect to r – 1. In the case of r = 2, this results in a simple inversion of bits.
In the RC system the complementation is performed by complementing each digit of the operand Z_i with respect to r – 1 and adding 1 to the resulting z_e.
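The modulo additions described above reduce to a few bit operations when modeled at the word level: in the RC (two's complement) system the carry out of the MSB is simply discarded, while in the DRC (one's complement) system it is added back at the LSB as an end-around carry. A minimal sketch for n-bit binary operands, with assumed function names:

```python
def add_rc(u, w, n=8):
    """Two's complement (RC) addition: add and discard the carry out of bit n-1 (mod 2**n)."""
    return (u + w) & ((1 << n) - 1)

def add_drc(u, w, n=8):
    """One's complement (DRC) addition: the carry out of bit n-1 is added back at the LSB."""
    s = u + w
    carry_out = s >> n
    return (s + carry_out) & ((1 << n) - 1)

# (-3) + (+5) in 8 bits: the two's complement of 3 is 253, the one's complement is 252.
print(format(add_rc(253, 5), '08b'))    # 00000010 = +2
print(format(add_drc(252, 5), '08b'))   # 00000010 = +2 (end-around carry applied)
```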
Implementation of Addition
Carry Look-Ahead Adder (CLA)
The first significant speed improvement in the implementation of a parallel adder was a carry-look-ahead adder
(CLA) developed by Weinberger and Smith [1958] in 1958. The CLA is one of the fastest schemes used for the
addition of two numbers even today given that the delay incurred to add two numbers is logarithmically
TABLE 86.11 Two’s Complement and One’s Complement Representation
Two’s Complement, C = 8 One’s Complement, C = 7
X
i
x
e
X
i
(2’s complement) x
e
X
i
(1’s complement)
3 3 011 3 011
2 2 010 2 010
1 1 001 1 001
0 0 000 0 000
–0 0 000 7 111
–1 7 111 6 110
–2 6 110 5 101
–3 5 101 4 100
–4 4 100 3 —
dependent on the size of the operands (delay = log[N]). The concept of CLA is illustrated in Fig. 86.2a and b.
For each bit position of the adder, a pair of signals (p_i, g_i) is generated in parallel. It is possible to generate local carries using (p_i, g_i) as seen in the equations. Those signals are designated as p_i = carry-propagate and g_i = carry-generate because they take part in the propagation and generation of the carry signal C_i. However, each bit position requires an incoming carry signal C_{i–1} in order to generate the outgoing carry C_i. This makes the addition slow because the carry signal has to ripple from stage to stage as shown in Fig. 86.2a. The adder can be divided into groups, and the carry-generate and carry-propagate signals can be calculated for the entire group (G, P). This takes an additional time equivalent to an AND-OR delay of the logic. However, we can then calculate each group's carry signals in an additional AND-OR delay: for the generation of the carry signal out of the group, only the incoming carry signal into the group is now required. Therefore, the rippling of the carry is limited only to the groups. In the next step we may calculate generate and propagate signals for the group of groups (G*, P*) and continue in that fashion until we have only one group left generating the C_out signal from the adder. This process terminates in a logarithmic number of steps given that we are generating a tree structure for generation of carries. The computation of carries within the groups is done individually as illustrated in Fig. 86.2a, and this process requires only the incoming carry into the group.
The logarithmic dependence on the delay (delay = log[N]) is only valid under the assumption that the gate
delay is constant, independent of the fan-out and fan-in of the gate. In practice this is not true, and even when bipolar technology (which does not exhibit a strong dependence on fan-out) is used to implement the CLA structure, further expansion of the carry block is not possible given the practical limitations on the fan-in of the gate.
In CMOS technology the situation is quite different, given that a CMOS gate has a strong dependency not only on fan-in but on fan-out as well. This limitation takes away much of the advantage gained by using the CLA scheme. However, by clever optimization of the critical path and appropriate use of dynamic logic, the CLA scheme can still be advantageous, especially for adders of larger size.
FIGURE 86.2 Carry Look-Ahead adder structure. (a) Generation of carry generate and propagate signals and (b) generation of group signals G, P and intermediate carries.
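The per-bit and group signals described above can be written out directly. The sketch below computes p_i, g_i, the group (G, P), and the individual carries for one group, following the usual CLA recurrence c_{i+1} = g_i OR (p_i AND c_i); the names and list-of-bits interface are assumptions.

```python
def cla_group(a_bits, b_bits, c_in):
    """Carry look-ahead for one group: bitwise generate/propagate, group (G, P), carries and sums.

    Bit lists are given LSB first."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]      # carry-propagate p_i
    g = [a & b for a, b in zip(a_bits, b_bits)]      # carry-generate g_i
    carries = [c_in]
    for pi, gi in zip(p, g):
        carries.append(gi | (pi & carries[-1]))      # c_(i+1) = g_i + p_i * c_i
    G = 0
    for gi, pi in zip(g, p):                         # group generate: G = g3 + p3 g2 + p3 p2 g1 + ...
        G = gi | (pi & G)
    P = all(p)                                       # group propagate: P = p3 p2 p1 p0
    sums = [pi ^ ci for pi, ci in zip(p, carries)]
    return sums, carries[-1], G, int(P)

# 4-bit example, LSB first: 0111 + 0101 with no carry in (7 + 5 = 12 -> sum 1100, carry out 0)
print(cla_group([1, 1, 1, 0], [1, 0, 1, 0], 0))   # ([0, 0, 1, 1], 0, 0, 0)
```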
Conditional-Sum Addition
Another one of the fast schemes for addition of two numbers that predates CLA is conditional-sum addition
(CSA) proposed by Sklansky [1960] in 1960. The essence of the CSA scheme is the realization that we can add
two numbers without waiting for the carry signal to be available. Simply, the number were added in two
instances: one assuming C
in
= 0 and the other assuming C
in
= 1. The results: Sum
0
, Sum
1
and Carry
0
, Carry
1
are presented at the input of a multiplexer. The final values are being selected at the time C
in
arrives at the
“select” input of a multiplexer. As in CLA the input bits are divided into groups which are added “conditionally”.
It is apparent that starting from the least significant bit (LSB) position the hardware complexity starts to
grow rapidly. Therefore, in practice, the full-blown implementation of the CSA is not often seen.
However, the idea of adding the most significant bit (MSB) portion conditionally and selecting the results
once the carry-in signal is computed in the LSB portion is attractive. Such a scheme (which is a subset of CSA)
is known as “carry-select adder”. A 26-b carry-select adder consisting of two 13-bit portions is shown in Fig. 86.3.
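The carry-select idea, computing the upper half twice (once per assumed carry) and picking the result with the real carry, is easy to model at the word level. The sketch below splits an addition into two 13-bit halves as in Fig. 86.3; the function name and the integer-level modeling (rather than gate-level) are assumptions.

```python
def carry_select_add(a, b, half=13):
    """26-bit addition split into two 13-bit halves; the upper half is precomputed for carry 0 and 1."""
    mask = (1 << half) - 1
    lo = (a & mask) + (b & mask)
    carry = lo >> half                     # real carry out of the lower half
    hi0 = (a >> half) + (b >> half)        # upper half assuming carry-in = 0
    hi1 = hi0 + 1                          # upper half assuming carry-in = 1
    hi = hi1 if carry else hi0             # select once the carry is known
    return ((hi << half) | (lo & mask)) & ((1 << 2 * half) - 1)

a, b = 0x2ABCDE1, 0x155432F
print(hex(carry_select_add(a, b)), hex((a + b) & ((1 << 26) - 1)))   # both match
```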
Multiplication Algorithm
The multiplication operation is performed in a variety of forms in hardware and software. In the beginning of
computer development any complex operation was usually programmed in software or coded in the microcode
of the machine. Some limited hardware assistance was provided. Today it is more likely to find full hardware
implementation of the multiplication for reasons of speed and reduced cost of hardware. However, in all of
them, multiplication shares the basic algorithm with some adaptations and modifications to the particular
implementation and number system used. For simplicity we will describe a basic multiplication algorithm that
operates on positive n-bit-long integers X and Y resulting in the product P, which is 2n bits long:
P = XY = X Σ_{i=0}^{n–1} y_i r^i = Σ_{i=0}^{n–1} (X y_i) r^i

This expression indicates that the multiplication process is performed by summing n terms of a partial product, X y_i r^i. This product indicates that the ith term is obtained by a simple arithmetic left shift of X for i positions and multiplication by the single digit y_i. For the binary radix r = 2, y_i is 0 or 1 and multiplication by the digit y_i is very simple to perform. The addition of the n terms can be performed at once, by passing the partial products through a network of adders (which is the case of a full hardware multiplier), or sequentially, by passing the partial product through an adder n times. The algorithm to perform multiplication of X and Y can be described as [Ercegovac, 1985]

p^{(0)} = 0
p^{(j+1)} = (1/r)(p^{(j)} + r^n X y_j)   for j = 0, …, n – 1

It can be easily proved that this recurrence results in p^{(n)} = XY.
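The recurrence above can be exercised directly: each step adds the partial product X·y_j pre-scaled by r^n and then shifts right by one digit. The sketch below uses r = 2 and assumed names; n must cover the number of multiplier digits.

```python
def multiply(x, y, n, r=2):
    """Sequential multiplication by the recurrence p(j+1) = (p(j) + r**n * x * y_j) / r."""
    p = 0
    for j in range(n):
        y_j = (y // r**j) % r            # j-th digit of the multiplier
        p = (p + r**n * x * y_j) // r    # add the partial product, then shift one digit right
    return p                              # p(n) = x * y

print(multiply(13, 11, n=4))   # 143
```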
FIGURE 86.3 26-bit carry-select adder.
Various modifications of the multiplication algorithm exist; one of the most famous is the modified Booth
recoding algorithm described by Booth in 1951. This algorithm allows for the reduction of the number of partial
products, thus speeding up the multiplication process. Generally speaking, the Booth algorithm is a case of
using the redundant number system with the radix higher than 2.
Implementation of the Multiplication Algorithm
Speed of multiply operation is of utmost importance in digital signal processors (DSP) today as well as in the
general purpose processors. Therefore, research in building a fast parallel multiplier has been going on since
such a paper was published by Wallace [1964] in 1964. In his historic paper, Wallace introduced a way of
summing the partial product bits in parallel using a tree of carry-save adders which became generally known
as the “Wallace Tree” (Fig. 86.4).
A suggestion for speed improvement of such a process of adding partial product bits in parallel followed in
the paper published by Dadda [1965]. In his 1965 paper, Dadda introduced a notion of a counter structure
that will take a number of bits p in the same bit position (of the same “weight”) and output a number q that
represents the count of ones in the input. Dadda has introduced a number of ways to compress the partial
product bits using such a counter, which later became known as Dadda’s counter.
The quest for making the parallel multiplier even faster continued for almost 30 years. The search for the fastest "counter" did not result in a general structure that yielded a faster partial product summation than one using a full-adder (FA) cell, or 3:2 counter. Therefore, the use of the Wallace Tree was almost
prevalent in the implementation of the parallel multipliers. In 1981, Weinberger disclosed a structure he called
“4-2 carry-save module”. This structure contained a combination of FA cells in an intricate interconnection
structure that was yielding faster partial product compression than the use of 3:2 counters.
FIGURE 86.4 Wallace Tree.
The 4:2 compressor (Fig. 86.5) actually compresses five partial product bits into three; however, it is connected in such a way that four of the inputs come from the same bit position of weight j while one bit is fed from the neighboring position j–1 (known as carry-in). The output of such a 4-2 module consists of one bit in position j and two bits in position j+1. This structure does not represent a counter (though it became erroneously known as a "4-2 counter") but a "compressor", which compresses four partial product bits into two (while using one bit laterally connected between adjacent 4-2 compressors). The efficiency of such a structure is higher because it reduces the number of partial product bits by one half. The speed of such a 4-2 compressor is determined by the speed of three XOR gates in series (in the redesigned version of the 4-2 compressor), making this scheme more efficient than one using 3:2 counters in a regular Wallace Tree. The other equally important feature of the 4-2 compressor is that the interconnections between such cells follow more regular patterns than in the case of the Wallace Tree.

FIGURE 86.5 4:2 compressor.
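The cell's behavior can be sketched at the bit level (a model of ours that chains two full-adder cells; it is not a gate-level design). Note that the lateral carry out does not depend on the lateral carry in, which is what keeps a row of 4:2 compressors from rippling:

from itertools import product

def full_adder(a, b, c):
    # 3:2 counter: three bits of weight j in, (sum, carry) out
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(x1, x2, x3, x4, cin):
    # Four partial product bits of weight j plus a lateral carry-in from column j-1.
    # Outputs: sum (weight j), carry (weight j+1), cout (lateral carry to column j+1).
    s1, cout = full_adder(x1, x2, x3)      # cout is independent of cin: no ripple
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

for bits in product((0, 1), repeat=5):     # the outputs preserve the arithmetic value
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)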
Booth Encoding
Booth's algorithm [Booth, 1951] is widely used in the implementation of hardware and software multipliers because its application makes it possible to reduce the number of partial products. It can be used for both sign-magnitude and 2's complement numbers, with no need for a correction term or a correction step. A modification of the Booth algorithm was proposed by MacSorley [1961] in which a triplet of bits is scanned instead of two bits. This technique has the advantage of reducing the number of partial products by one half regardless of the inputs. It is summarized in Table 86.12.

TABLE 86.12 Modified Booth Recoding

x_{i+2}  x_{i+1}  x_i     Add to Partial Product
  0        0       0             +0Y
  0        0       1             +1Y
  0        1       0             +1Y
  0        1       1             +2Y
  1        0       0             –2Y
  1        0       1             –1Y
  1        1       0             –1Y
  1        1       1             –0Y

The recoding is performed in two steps: encoding and selection. The purpose of the encoding is to scan the triplet of bits of the multiplier and define the operation to be performed on the multiplicand, as shown in Table 86.12. This method is actually an application of a signed-digit representation in radix 4. The Booth-MacSorley algorithm, usually called the modified Booth algorithm or simply the Booth algorithm, can be generalized to any radix.

Booth recoding necessitates the internal use of 2's complement representation in order to efficiently perform subtraction of the partial products as well as additions. However, the floating-point standard specifies sign-magnitude representation, which is also followed by most of the nonstandard floating-point formats in use today. The advantage of Booth recoding is that it generates only half as many partial products as a multiplier implementation that does not use Booth recoding. However, this benefit comes at the expense of increased hardware complexity. Indeed, this implementation requires hardware for the encoding and for the selection of the partial products (0, ±Y, ±2Y). An optimized encoding is shown in Fig. 86.6.

FIGURE 86.6 Booth encoder.
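To make the recoding concrete, the following Python sketch (ours; a behavioral model, not the encoder of Fig. 86.6) scans overlapping bit triplets of an n-bit 2's complement multiplier and produces the radix-4 digits of Table 86.12, which are then applied to the multiplicand:

BOOTH_TABLE = {                       # (high, middle, low bit of the window) -> multiple of Y
    (0, 0, 0): 0,  (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_digits(y, n):
    # Radix-4 (modified Booth) recoding of an n-bit 2's complement multiplier y.
    bits = [(y >> i) & 1 for i in range(n)]
    bits.append(bits[-1])                       # sign extension
    digits, lower = [], 0                       # lower = the bit just below the window
    for i in range(0, n, 2):
        digits.append(BOOTH_TABLE[(bits[i + 1], bits[i], lower)])
        lower = bits[i + 1]
    return digits                               # digit k has weight 4**k

def booth_multiply(x, y, n):
    # Only about n/2 partial products (x times -2, -1, 0, +1, or +2) are summed.
    return sum(d * 4**k * x for k, d in enumerate(booth_digits(y, n)))

assert booth_multiply(7, 0b0110, n=4) == 42
assert booth_multiply(7, 0b1011, n=4) == -35    # 1011 is -5 in 4-bit 2's complement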
Division Algorithm
Division is a more complex process to implement because, unlike multiplication, it involves guessing the digits
of the quotient. Here, we will consider an algorithm for division of two positive integers designated as dividend
Y and divisor X resulting in a quotient Q and an integer remainder Z according to the relation given by
Y = XQ + Z
In this case the dividend contains 2n digits and the divisor has n digits, in order to produce a quotient with n digits.
The algorithm for division is given by the following recurrence relation [Ercegovac, 1985]:

z^(0) = Y
z^(j+1) = r z^(j) – X r^n q_{n–1–j}    for j = 0, …, n – 1

This recurrence relation yields

z^(n) = r^n (Y – XQ)
Y = XQ + z^(n) r^(–n)

which defines the division process with remainder Z = z^(n) r^(–n).
The selection of the quotient digit is done by ensuring that 0 ≤ Z < X at each step in the division process. This selection is a crucial part of the algorithm, and the best-known approaches are the restoring and nonrestoring division algorithms. In the former, the value of the tentative partial remainder z^(j) is restored after a wrong guess is made for the quotient digit q_j. In the latter, this correction is not done in a separate step but rather in the step following. The best-known division algorithm is the so-called SRT algorithm, independently developed by Sweeney, Robertson, and Tocher. Algorithms for higher radices were further developed by Robertson and his students, most notably Ercegovac.
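As a rough behavioral sketch (ours) of the restoring approach, the recurrence above can be followed directly, with the quotient digit guessed and decremented whenever the guess would drive the partial remainder negative:

def restoring_divide(y, x, n, r=2):
    # z(0) = y;  z(j+1) = r*z(j) - x * r**n * q_{n-1-j},  j = 0, ..., n-1.
    # Requires y < x * r**n so that the quotient fits in n digits.
    z, q = y, 0
    for j in range(n):
        z = r * z
        digit = r - 1
        while digit * x * r**n > z:        # wrong guess: "restore" by trying a smaller digit
            digit -= 1
        z -= digit * x * r**n
        q = q * r + digit                  # quotient digits arrive most significant first
    return q, z // r**n                    # z(n) = r**n * (y - x*q), so remainder = z(n) / r**n

assert restoring_divide(143, 11, n=4) == (13, 0)
assert restoring_divide(100, 7, n=4) == (14, 2)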
Defining Terms
4:2 Compressor: A structure used in the partial product reduction tree of a parallel multiplier for achieving faster and more efficient reduction of the partial product bits.
Algorithm: Decomposition of the computation into subcomputations with an associated precedence relation that determines the order in which these subcomputations are performed [Ercegovac, 1985].
Booth-MacSorley algorithm: An algorithm used for recoding of the multiplier such that the number of partial products is roughly reduced by a factor of two. It is a special case of the application of the redundant number system to represent the multiplier.
Carry look-ahead adder: An implementation technique of addition that accelerates the propagation of the carry signal, thus increasing the speed of the addition operation.
Dadda's counter: A generalized structure used to produce a number (count) representing the number of bits that are "one". It is used for efficient reduction of the partial products in a parallel multiplier.
Explicit value x_e: A value associated with the bit string according to the rule defined by the number representation system being used.
Implicit value X_i: The value obtained by applying the arithmetic function defined for the interpretation of the explicit value x_e.
Nonredundant number system: The system where for each bit string there is one and only one corresponding numerical value x_e.
Number representation system: A defined rule that associates one numerical value x_e with every valid bit string x.
Redundant number system: The system in which the numerical value x_e could be represented by more than one bit string.
SRT algorithm: An algorithm for division of binary numbers which uses redundant number representation.
Wallace tree: A technique for summing the partial product bits of a parallel multiplier in a carry-save fashion
using full-adder cells.
Related Topic
86.1 Number Systems
References
A. Avizienis, “Digital computer arithmetic: A unified algorithmic specification,” in Symposium on Computers
and Automata, Polytechnic Institute of Brooklyn, April 13–15, 1971.
A. D. Booth, “A signed binary multiplication technique”, Quarterly J. Mechan. Appl. Math., vol. IV, 1951.
L. Dadda, “Some schemes for parallel multipliers,” Alta Frequenza, 34, 349–356, 1965.
M. Ercegovac, Digital Systems and Hardware/Firmware Algorithms, New York: Wiley, 1985, chap. 12.
O. L. MacSorley, “High speed arithmetic in binary computers”, Proc. IRE, 49(1), 1961.
V. G. Oklobdzija and E. R. Barnes, “Some optimal schemes for ALU implementation in VLSI technology”,
Proceedings of 7th Symposium on Computer Arithmetic, Urbana, Ill.: University of Illinois, June 4–6,
1985.
J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electron. Computers, EC-9, 226–231, 1960.
C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. Electron. Computers, EC-13, 14–17, 1964.
S. Waser and M. Flynn, Introduction to Arithmetic for Digital Systems Designers, New York: Holt, 1982.
A. Weinberger and J. L. Smith, "A logic for high-speed addition," National Bureau of Standards, Circular 591, pp. 3–12, 1958.
Further Information
For more information about specific arithmetic algorithms and their implementation see K. Hwang, Computer
Arithmetic: Principles, Architecture and Design, New York: Wiley, 1979 and also E. Swartzlander, Computer
Arithmetic, vols. I and II, Los Alamitos, Calif.: IEEE Computer Society Press, 1980.
Publications in IEEE Transactions on Electronic Computers and Proceedings of the Computer Arithmetic
Symposia by various authors are very good sources for detailed information on a particular algorithm or
implementation.
86.3 Architecture
V. Carl Hamacher, Zvonko G. Vranesic, and Safwat G. Zaky
(Adapted from V.C. Hamacher, Z.G. Vranesic, and S.G. Zaky, Computer Organization, 4th ed., New York: McGraw-Hill, 1996. With permission.)
Computer architecture can be defined here to mean the functional operation of the individual hardware units
in a computer system and the flow of information and control among them. This is a somewhat more general
definition than is sometimes used. For example, some articles and books refer to instruction set architecture
or the system bus architecture.
The main functional units of a single-processor system, a basic way to interconnect them, and features that
are used to increase the speed with which the computer executes programs will be described. Following this,
a brief introduction to systems that have more than one processor will be provided.
Functional Units
A digital computer, or simply a computer, accepts digitized input information, processes it according to a list
of internally stored machine instructions, and produces the resultant output information. The list of instructions
is called a program, and internal storage is called computer memory.
A computer has five functionally independent main parts: input, memory, arithmetic and logic, output, and
control. The input unit accepts digitally encoded information from human operators, through electromechan-
ical devices such as a keyboard, or from other computers over digital communication lines. The information
received is usually stored in the memory and then operated on by the arithmetic and logic unit circuitry under
the control of a program. Results are sent back to the outside world through the output unit. All these actions
are coordinated by the control unit. The arithmetic and logic unit, in conjunction with the main control unit,
is referred to as the processor.
Input and output equipment is usually combined under the term input-output unit (I/O unit). This is
reasonable because some standard equipment provides both input and output functions. The simplest example
of this is the video terminal consisting of a keyboard for input and a cathode-ray tube for output. The control
circuits of the computer recognize them as two distinct devices, even though the human operator may regard them as part of the same physical unit.
The memory unit stores programs and data. There are two main classes of memory devices called primary
and secondary memory. Primary storage, or main memory, is an electronic storage device, constructed from
integrated circuits that consist of millions of semiconductor storage cells, each capable of storing one bit of
information. These cells are accessed in groups of fixed size called words. The main memory is organized so
that the contents of one word can be stored or retrieved in one basic operation called a memory cycle.
To provide direct access to any word in the main memory in a short and fixed amount of time, a distinct
address number is associated with each word location. A given word is accessed by specifying its address and
issuing a control command that starts the storage or retrieval process. The number of bits in a word is called
the word length of the computer. Word lengths vary from 16 to 64 bits. Small machines, such as personal computers or workstations, may have only a few million words in the main memory, while larger machines
have tens of millions of words. The time required to access a word for reading or writing is less than 100 ns.
Although primary memory is essential, it tends to be expensive and volatile. Thus cheaper, more permanent,
magnetic media secondary storage is used for files of information that contain programs and data. A wide
selection of suitable devices is available, including magnetic disks, drums, diskettes, and tapes.
Execution of most operations within a computer takes place in the arithmetic and logic unit (ALU) of a
processor. Consider a typical example. Suppose that two numbers located in the main memory are to be added,
and the sum is to be stored back into the memory. Using a few instructions, each consisting of a few basic
steps, determined by the control unit, the operands are first fetched from the memory into the processor. They
are then added in the ALU, and the result is stored back in memory. Processors contain a number of high-speed
storage elements called registers, which are used for temporary storage of operands. Each register contains one word of data, and its access time is about 10 times faster than the main memory access time. Large-scale microelectronic fabrication techniques allow whole processors to be implemented on a single semiconductor chip containing a few million transistors.

THE FIRST DIGITAL COMPUTERS

Of all the new technologies to emerge from World War II, none was to have such profound and pervasive impacts as the digital computer. As early as the 1830s, the Englishman Charles Babbage conceived of an "Analytical Engine" that would perform mathematical operations using punched cards, hundreds of gears, and steam power. Babbage's idea was beyond the capabilities of 19th-century technology, but his vision represented a goal that many were to pursue in the next century and a half.

In the mid-1920s, MIT electrical engineer Vannevar Bush devised the "product integraph", a semi-automatic machine for solving problems in determining the characteristics of complex electrical systems. This was followed a few years later by the "differential analyzer", the first general equation-solving machine. These machines were mechanical, analog devices, but at the same time that they were being built and copied, the principles of electrical, digital machines were being laid out.

In 1937, Claude Shannon published in the Transactions of the AIEE the circuit principles for an "electric adder to the base of two", and George Stibitz of Bell Labs built such an adding device on his own kitchen table. In that same year, Howard Aiken, then a student at Harvard, proposed a gigantic calculating machine that could be used for everything from vacuum tube design to problems in relativistic physics.

With support from Thomas J. Watson, president of IBM, Aiken was able to build his machine, the "Automatic Sequence Controlled Calculator", or "Mark I". When it was finished in 1944, the Mark I was quickly pressed into war service, calculating ballistics problems for the Navy.

In 1943, the government contracted with John W. Mauchly and J. Presper Eckert of the University of Pennsylvania to build the "Electronic Numerical Integrator and Computer"—the first true electronic digital computer. When the ENIAC was finally dedicated in February 1946, it was both a marvel and a monster—weighing 30 tons, consuming 150 kW of power, and using 18,000 vacuum tubes. With all of this, it could perform 5,000 additions or 400 multiplications per second, which was about one thousand times faster than any other machine of the day. The ENIAC showed the immense possibilities of digital electronic computers.

These possibilities occupied engineers and mathematicians for the coming decades. For electrical engineers, the computer represented a challenge and responsibility for the most powerful new machine of the twentieth century. (Courtesy of the IEEE Center for the History of Electrical Engineering.)

The ENIAC was the first true electronic digital computer. Early programmers set up problems by plugging in cables and setting switches; ENIAC could perform calculations about one thousand times faster than any other machine of its day. (Photo courtesy of the IEEE Center for the History of Electrical Engineering.)
Basic Operational Concepts
To perform a given computational task, an appropriate program consisting of a set of machine instructions is
stored in the main memory. Individual instructions are brought from the memory into the processor for
execution. Data used as operands are also stored in the memory. A typical instruction may be
MOVE MEMLOC, Ri
This instruction loads a copy of the operand at memory location MEMLOC into the processor register Ri. The
instruction requires a few basic steps to be performed. First, the instruction must be transferred from the
memory into the processor, where it is decoded. Then the operand at location MEMLOC must be fetched into
the processor. Finally, the operand is placed into register Ri. After operands are loaded into the processor
registers in this way, instructions such as ADD Ri, Rj, Rk can be used to add the contents of registers Ri and
Rj, and then place the result into register Rk.
Instruction set design has been intensively studied in recent years to determine the effectiveness of the various
alternatives. See Patterson and Hennessy [1994] for a thorough discussion.
The connection between the main memory and the processor that allows for the transfer of instructions and
operands is called the bus, as shown in Fig. 86.7. A bus consists of a set of address, data, and control lines. The
bus also allows program and data files to be transferred from their long-term location on magnetic disk storage
to the main memory. Long distance digital communication with other computers is also enabled by transfers
over the bus to the Communication Line Interface, as shown in the figure. The bus interconnects a number of
devices, but only two devices (a sender and a receiver) can use it at any one time. Therefore, some control
circuitry is needed to manage the orderly use of the bus when a number of devices wish to use it.
Normal execution of programs may sometimes be preempted if some I/O device requires urgent control
action or servicing. For example, a monitoring device in a computer-controlled industrial process may detect
a dangerous condition that requires the execution of a special service program dedicated to the device. To cause
this service program to be executed, the device sends an interrupt signal to the processor. The processor
temporarily suspends the program that is being executed and executes the special interrupt service routine. After
providing the required service, the processor switches back to the interrupted program. To appreciate the
complexity of the computer system software programs needed to control such switching from one program
task to another and to manage the general movement of programs and data between primary and secondary
storage, consult Tanenbaum [1990].
The need often arises during program loading and execution to transfer blocks of data between the main
memory and a disk or other secondary storage I/O devices. Special control circuits are provided to manage
these transfers without detailed control actions from the main processor. Such transfers are referred to as direct
memory access (DMA). Assuming that accesses to the main memory from both I/O devices (such as disks) and
the main processor can be appropriately interwoven over the bus, I/O-memory transfers and computation in
the main processor can proceed in parallel, and performance of the overall system is improved.
Performance
A major performance measure for computer systems is the time, T, that it takes to execute a complete program
for some task. Suppose N machine instructions need to be executed to perform the task. A program is typically
written in some high-level language, translated by a compiler program into machine language, and stored on
a disk. An operating system software routine then loads the machine language program into the main memory,
ready for execution. Assume that, on average, each machine language instruction requires S basic steps for its
execution. If basic steps are executed at the rate of R steps per second, then the time to execute the program is
T = (N × S)/R
The main goal in computer architecture is to develop features that minimize T.
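For a feel of the formula, the following figures are purely hypothetical (though in line with the clock rates quoted later in this section): a program of 50 million instructions averaging 3 basic steps each, on a processor executing 150 million steps per second, runs in one second; halving S or doubling R halves T.

N = 50_000_000       # machine instructions executed (hypothetical)
S = 3                # average basic steps per instruction (hypothetical)
R = 150_000_000      # basic steps per second (hypothetical clock rate)
T = (N * S) / R
print(T, "seconds")  # 1.0 seconds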
We will now give an outline of main memory and processor design features that help to achieve this goal.
The first concept is that of a memory hierarchy. We have already noted that access to operands in processor
registers is significantly faster than access to the main memory. Suppose that when instructions and data are
first loaded into the processor, they are stored in a small, fast, cache memory on the processor chip itself. If
instructions and data in the cache are accessed repeatedly within a short period of time, as happens often with
program loops, then program execution will be speeded up. The cache can only hold small parts of the executing
program. When the cache is full, its contents are replaced by new instructions and data as they are fetched
from the main memory. A variety of cache replacement algorithms are in use. The objective of these algorithms
is to maximize the probability that the instructions and data needed for program execution are found in the
cache. This probability is known as the cache hit ratio. A higher hit ratio means that a larger percentage of the
instructions and data are being found in the cache, and do not require access to the slower main memory. This
leads to a reduction in the memory access basic step time components of S, and hence to a smaller value of T.
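A simple first-order model (ours, with hypothetical timings consistent with the access times quoted earlier) shows how the hit ratio pulls the average memory access time toward the cache access time:

def average_access_ns(hit_ratio, t_cache_ns=5, t_main_ns=80):
    # Hits are served from the cache; misses fall through to the slower main memory.
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * t_main_ns

for h in (0.80, 0.95, 0.99):
    print(h, average_access_ns(h))   # 20.0, 8.75, 5.75 ns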
The basic idea of a cache can be applied at different points in a computer system, resulting in a hierarchy
of storage units. A typical memory hierarchy is shown in Fig. 86.8. Some systems have two levels of cache to
take the best advantage of size/speed/cost tradeoffs. The main memory is usually not large enough to contain
all of the programs and their data. Therefore, the highest level in the memory hierarchy is usually magnetic
disk storage. As the figure indicates, it has the largest capacity, but the slowest access time. Segments of a
program, often called pages, are transferred from the disk to the main memory for execution. As other pages
are needed, they may replace the pages already in the main memory if the main memory is full. The orderly,
FIGURE 86.7 Interconnection of major components in a computer system.
automatic movement of large program and data segments between the main memory and the disk, as programs
execute, is managed by a combination of operating system software and control hardware. This is referred to
as memory management.
We have implicitly assumed that instructions are executed one after another. Most modern processors are
designed to allow the execution of successive instructions to overlap, using a technique known as pipelining.
In the example in Fig. 86.9, each instruction is broken down into 4 basic steps—fetch, decode, operate, and
write—and a separate hardware unit is provided to perform each of these steps. As a result, the execution of
FIGURE 86.8 Memory hierarchy.
FIGURE 86.9 Pipelining of instruction execution.
successive instructions can be overlapped as shown, resulting in an instruction completion rate of one per basic
time step. If the execution overlap pattern shown in the figure can be maintained for long periods of time, the
effective value of S tends toward 1.
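Counting cycles for an idealized 4-stage pipeline with no stalls (a sketch of ours) shows why the effective value of S approaches 1 for long instruction sequences:

def cycles(n_instructions, stages=4, pipelined=True):
    if pipelined:
        return stages + (n_instructions - 1)   # one completion per step once the pipe is full
    return stages * n_instructions             # strictly one instruction after another

print(cycles(1000, pipelined=False))   # 4000 basic steps
print(cycles(1000, pipelined=True))    # 1003 basic steps: effective S is close to 1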
When the execution of some instruction I depends on the results of a previous instruction, J, which is not
yet completed, instruction I must be delayed. The pipeline is said to be stalled, waiting for the execution of
instruction J to be completed. While it is not possible to eliminate such situations altogether, it is important
to minimize the probability of their occurrence. This is a key consideration in the design of the instruction set
of modern processors and the design of the compilers that translate high-level language programs into machine
language.
Now, imagine that multiple functional units are provided in the processor so that more than one instruction
can be in the operate stage. This parallel execution capability, when added to pipelining of the individual
instructions, means that execution rates of more than one instruction completion per basic step time can be
achieved. This mode of enhanced processor performance is called superscalar processing.
The rate, R, of performing basic steps in the processor is usually referred to as the processor clock rate, and
it is of the order of 100 to 200 million steps per second in current high-performance VLSI processors. This rate
is determined by the technology used in fabricating the processors, and is strongly related to the size or area
occupied by individual transistors. This size feature, which is currently in the submicron range, has been steadily
decreasing as fabrication techniques improve, allowing increases in R to be achieved.
Multiprocessors
Physical limits on electronic speeds prevent single processors from being speeded up indefinitely. A major
design trend has seen the development of systems that consist of a large number of processors. Such multipro-
cessors can be used to speed up the execution of large programs by executing subtasks in parallel. The main
difficulty in achieving this type of speedup is in being able to decompose a given task into its parallel subtasks
and assign these subtasks to the individual processors in such a way that communication among the subtasks
can be done efficiently. Fig. 86.10 shows a block diagram of a multiprocessor system, with the interconnection
network needed for data sharing among the processors Pi. Parallel paths are needed in this network in order
for parallel activity to proceed in the processors as they access the global memory space represented by the
multiple memory units Mi.
Defining Terms
Arithmetic and logic unit: The logic gates and register storage elements used to perform the basic operations of addition, subtraction, multiplication, and division of numeric operands, and the comparison, shifting, and alignment operations on more general forms of numeric and nonnumeric data.
Bus: The collection of data, address, and control lines that enables exchange of information, usually in word-size quantities, among the various computer system units. In practice, a large number of units can be connected to a single bus. These units contend in an orderly way for the use of the bus for individual transfers.
Cache memory: A high-speed memory for temporary storage of copies of the sections of program and data from the main memory that are currently active during program execution.
FIGURE 86.10 A multiprocessor system.
Computer architecture: The functional operation of the individual hardware units in a computer system and
the flow of information and control among them.
Control unit: The circuits required for sequencing the basic steps needed to execute machine instructions.
Input-output unit (I/O): The equipment and controls necessary for a computer to interact with a human
operator, to access mass storage devices such as disks and tapes, or to communicate with other computer
systems over communication networks.
Memory hierarchy: The collection of cache, primary, and secondary memory units that comprise the total
storage capability in the computer system.
Memory management: The combination of operating system software and hardware controls that is needed
to access and move program and data segments up and down the memory hierarchy during program
execution.
Memory unit: The unit responsible for storing programs and data. There are two main types of units: primary
memory, consisting of millions of bit storage cells fabricated from electronic semiconductor integrated
circuits, used to hold programs and data during program execution; and secondary memory, based on
magnetic disks, diskettes, and tapes, used to store permanent copies of programs and data.
Multiprocessor: A computer system comprising multiple processors and main memory unit modules, con-
nected by a network that allows parallel activity to proceed efficiently among these units in executing
program tasks that have been sectioned into subtasks and assigned to the processors.
Pipelining: The overlapped execution of the multiple steps of successive instructions of a machine language
program, leading to a higher rate of instruction completion than can be attained by executing instructions
strictly one after another.
Processor: The arithmetic and logic unit combined with the control unit needed to sequence the execution
of instructions. Some cache memory is also included in the processor.
Superscalar processing: The ability to execute instructions at a completion rate that is faster than the normal
pipelined rate, by providing multiple functional units in the pipeline to allow a small number of instruc-
tions to proceed through the pipeline process in parallel.
Related Topics
86.2 Computer Arithmetic ? 86.4 Microprogramming
References
V.C. Hamacher, Z.G. Vranesic, and S.G. Zaky, Computer Organization, 4th ed., New York: McGraw-Hill, 1996.
D. A. Patterson and J. L. Hennessy, Computer Organization and Design—The Hardware/Software Interface, San Mateo, Calif.: Morgan Kaufmann, 1994.
A.S. Tanenbaum, Structured Computer Organization, 3rd ed., Englewood Cliffs, N.J.: Prentice-Hall, 1990.
Further Information
The IEEE magazines Computer, Micro, and Software all have interesting articles on subjects related to computer
architecture, including software aspects. Also, articles on computer architecture occasionally appear in Scientific
American.
86.4 Microprogramming
Jacques Raymond
Since the 1950s, when Wilkes et al. [1958] defined the term and the concept, microprogramming has been used
as a clean and systematic way to define the instruction set of a computer. It has also been used to define a
virtual architecture out of a real hardwired one.
Levels of Programming
In Fig. 86.12, we see that a computer application is usually realized by programming a given algorithm in a
high-level language. A system offering a high-level language capability is implemented at the system level via
a compiler. The operating system is (usually) implemented in a lower-level language. The machine instruction
set can be hardwired (in a hardware implementation) or implemented via microprogramming (Fig. 86.11).
Therefore, microprogramming is simply an extra level in the general structure. Since it is used to define the
machine instruction set, it can be considered at the hardware level. Since this definition is done via a program
at a low level, but still eventually modifiable, it can also be considered to be at the software level. For these
reasons, the term firmware has been coined to name sets of microprograms. In short, microinstructions that
specify hardware functions (microoperations such as Open a path, Select operation) are used to form a more
complex instruction (Convert to binary, Add decimal). The machine instruction set is defined via a set of
microprogram routines and a microprogrammed instruction decoder.
In a microprogrammed machine, the hardware is designed in terms of its capabilities (ALU, data paths, I/O,
processing units) with little concern for how these capabilities will have to be accessed by the programmers.
The microoperations are decoded from microinstructions. The way programmers view the machine is defined
at the microprogramming level.
This approach offers some differences over the hardwired approach. The advantages are that it is more
systematic in implementation, modifiable, economical on most designs, and easier to debug. The disadvantages
are that it is uneconomical on simple machines, slower, and needs support software. Like all programs,
microprograms reside in memory. The term "control memory" is commonly used for the memory that holds microprograms.
Microinstruction Structure
On given hardware, many processing functions are available. In general, a subset of these functions can be performed in parallel, for example, carrying out an addition between two registers while copying a register onto an I/O bus. These functions are called microcommands.
Horizontal Microinstructions
Each field of a microinstruction specifies a microcommand. If the format of the microinstruction is such that all possible microcommands can be specified, the instruction is called horizontal. Most of the time this is wasteful of memory, since in a microprogram not every possible microcommand is specified in each microinstruction. However, it permits the microprogrammer to take full advantage of all possible parallelism and to build faster machines.
FIGURE 86.11 Levels of programming in a computer system.

For example, the horizontal specification of an ALU operation,

ALUOperation   SourcePathA   SourcePathB   ResultPath   CvtDecimal
specifies both operands, which register will contain the result, whether or not the result is to be converted to
decimal, and the operation to be performed. If this instruction is, for example, part of a microprogram defining
a 32-bit addition instruction, assuming a 16-bit path, it is wasteful to specify the source and result operands twice.
In some cases, it is possible to design a microinstruction that specifies more microcommands than can be
executed in parallel. In that case, the execution of the microinstruction is carried out in more than one clock cycle. For this reason such microinstructions are called polyphase (as opposed to monophase).
Vertical Microinstructions
At the other extreme, if the microinstruction allows only the specification of a single microcommand at a time, the instruction is then called vertical. In that case, only the necessary commands for a particular program are specified, resulting in smaller control memory requirements. However, it is not possible to take advantage of possible parallelism offered by the hardware, since only one microcommand is executed at a time. For example, the vertical specification of an ALU operation is as follows:

SourceA   Reg#        1st Operand
SourceB   Reg#        2nd Operand
Result    Reg#        Result
ALU Op                Operation

Diagonal Microinstructions
Most cases fit in between these two extremes (see Fig. 86.13). Some parallelism is possible; however, microcommands pertaining to a given processing unit are regrouped. This results in shorter microprograms than in the vertical case and may still allow some optimization. For example, a diagonal specification of an ALU operation is as follows:

SelectSources   RegA# RegB#    Select Operands
SelectResult    Reg# Dec/Bin   Result place and format
Select ALU      Operation      Perform the operation

Optimization
Time and space optimization studies can be performed before designing the microinstruction format. The reader is referred to Das et al. [1973] and Agerwala [1976] for details and more references.

FIGURE 86.12 A view of computer system levels.
Microprogram Development
Microassemblers
The first level of specification of microinstructions is, just like its counterpart at the machine level, the assembler.
Although the process and philosophy are exactly the same, it is traditionally called a microassembler. A microassembler is a software program (it is not relevant to know in which language it is written) whose function is to
translate a source program into the binary code equivalent. Obviously, to write a source program, a language
has to be designed. At assembly level, languages are usually very close to the hardware structure, and the objects
defined are microregisters, gate level controls, and paths. Operations are the microoperations (sometimes
slightly more sophisticated with a microassembler with macrofacilities).
This level provides an easily readable microprogram and does much to help avoid syntax errors. In binary,
only the programmer can catch a faulty 1 or 0; the microassembler can catch syntax errors or some faulty
register specifications. No microassembler exists that can catch all logic errors in the implementation of a given
instruction algorithm. It is still very easy to make mistakes. It should be noted that this level is a good
compromise between convenience and cost.
The following is a typical example of a microprogram in the microassembler (it implements a 16-bit add
on an 8-bit path and ALU):
CLC              Clear carry
Lod A            Get low byte of first operand
Add B            Add to low byte of second operand
Sto C            Give low byte of final result
Lod a            Get high byte of first operand
Adc b            Add to high byte of second operand and to carry bit
Sto c            Give high byte of final result
JCS Error        Jump to error routine if result >= 65536 (carry set)
Jmp FetchNext    Continue with the next machine instruction
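Behaviorally, the microprogram performs the double-precision addition sketched below in Python (ours; the operand values are illustrative):

def add16_on_8bit_alu(x, y):
    lo = (x & 0xFF) + (y & 0xFF)          # CLC; Lod A; Add B
    carry = lo >> 8
    hi = (x >> 8) + (y >> 8) + carry      # Lod a; Adc b
    if hi >> 8:                           # JCS Error: result does not fit in 16 bits
        raise OverflowError("result >= 65536")
    return (hi << 8) | (lo & 0xFF)        # Sto c / Sto C

assert add16_on_8bit_alu(0x12F0, 0x0345) == 0x1635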
FIGURE 86.13 Microinstruction fields versus microcommands.
High-Level Languages for Microprogramming
Many higher-level languages have been designed and implemented; see a discussion of some design philosophies
in Malik and Lewis [1978]. In principle, a high-level language program is oriented toward the application it
supports and is farther away from the hardware-detailed implementation of the machine it runs on. The appli-
cations supported are mostly other machine definitions (emulators) and implementations of some algorithms.
The objects defined and manipulated by high-level languages for microprogramming are therefore the virtual
components of virtual machines. They are usually much the same as their real counterparts: registers, paths,
ALUs, microoperations, etc. Furthermore, writing a microprogram is usually defining a machine architecture.
It involves a lot of intricate and complicated details, but the algorithms implemented are mostly quite simple.
The advantages offered by high-level languages to write better algorithms without getting lost in implementation
details are therefore not exploited.
Firmware Implementation
Microprogramming usually requires the regular phases of design, coding, test, conformance acceptance, doc-
umentation, etc. It differs from other programming activities when it comes to the deliverable. The usual final
product takes the form of hardware, namely, a control memory, PROM, ROM, or other media, containing the
bit patterns equivalent to the firmware. These implementation steps require special hardware and software
support. They include a linker, loader, PAL programmer, or ROM burner; a test program in a control memory
test bench is also necessary.
Supporting Software
It is advisable to test the microprogram before its actual hard implantation, if the implantation process is
irreversible or too costly to repeat. Software simulators have been implemented to allow thorough testing of
the newly developed microprogram. Needless to say these tools are very specialized to a given environment
and therefore costly to develop, as their development cost cannot be distributed over many applications.
Emulation
Concept
In a microprogrammed environment, a computer architecture is softly (or firmly) defined by the microprogram
designed to implement its operations, its data paths, and its machine-level instructions. It is easy to see that if
one changes the microprogram for another one, then a new computer is defined. In this environment, the
desired operation is simulated by the execution of the firmware, instead of being the result of action on real
hardwired components.
Since the word simulation was already in use for simulation of some system by software, the word emulation
was chosen to mean simulation of an instruction set by firmware. Of course, “simulation” by hardware is not
a simulation but the real thing.
The general structure of an emulator consists of the following pseudocode algorithm:
BEGIN
Initialize Machine Components
Repeat
Fetch Instruction
Emulate Operation of the current instruction
Process interrupts
Update instruction counter
Until MachineIsOff
Perform shutdown procedure
END
Many variations exist, in particular to process interrupts within the emulation of a lengthy operation or to
optimize throughput, but the general principle and structure are fairly constant.
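A toy rendering of this loop in Python (ours; the three-instruction machine below is invented purely for illustration) makes the structure concrete:

def emulate(program, memory):
    regs = {"ACC": 0}                  # initialize machine components
    pc, running = 0, True
    while running:                     # Repeat ... Until MachineIsOff
        op, arg = program[pc]          # fetch the current instruction
        if op == "LOAD":               # emulate the operation of the instruction
            regs["ACC"] = memory[arg]
        elif op == "ADD":
            regs["ACC"] += memory[arg]
        elif op == "HALT":
            running = False
        # (interrupts would be polled and serviced here)
        pc += 1                        # update instruction counter
    return regs

print(emulate([("LOAD", 0), ("ADD", 1), ("HALT", 0)], memory=[40, 2]))   # {'ACC': 42}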
Emulation of CPU Operation
One of the advantages of microprogramming is that the designer can implement his or her dream instructions
simply by emulating their operation. We have already seen the code for a typical 16-bit add, but it is not difficult
to code a parity code generator, a cyclic redundancy check calculator, or an instruction that returns the
eigenvalues of an n × n matrix. This part is straight programming. One consideration is to make sure that the machine is still listening to the outside world (interrupts) or actively monitoring it (I/O flags) in order not to lose asynchronous data while looking for a particular pattern in a 1-megabyte string. Another consideration
is to optimize memory usage by combining common processes for different operations. For example, emulating
a 32-bit add instruction and emulating a 16-bit add instruction have common parts. This is, however, a
programming concern not specific to emulation.
I/O System and Interrupts
Programming support for I/Os and interrupts is more complicated than for straight machine instructions. This
is due to the considerable speed differences between I/O devices and a CPU, the need for synchronization, the
need for not losing any external event, and the concerns for optimizing processing time. Microprogramming
offers considerable design flexibility, as these problems are more easily handled by programming than with
hardware components.
Applications of Microprogramming
The main application of microprogramming is the emulation of a virtual machine architecture on a different
host hardware machine. It is, however, easy to see that the concept of emulation can be broadened to other
functions than the traditional hardware operation.
It is mainly a matter of point of view. Emulation and simulation are essentially the same process but viewed
from different levels. Realizing a 64-bit addition and implementing a communication controller are qualitatively
the same type of task. Once this is considered, there are theoretically no limits to the uses of microprogramming.
From the programmer’s point of view, programming is the activity of producing, in some language, an
implementation of some algorithm. If the language is at the very lowest level, as is the case with micropro-
gramming, and at the same time the algorithm is filled with intricate data structures and complex decisions,
the task might be enormous, but nothing says it cannot be done (except, maybe, experience). With this
perspective of the field, we now look at some existing applications of microprogramming.
Operating System Support
One of the first applications, besides emulation, was to support some operating system functions. Since
microprograms are closer to the hardware and programming directly in microcode removes the overhead of
decoding machine-level instructions, it was thought that directly coding operating system (OS) functions would
improve their performance. Success was achieved in some areas, such as virtual memory. In general, people
write most OS functions in assembly language, probably because the cost is not offset by the benefits, especially
with rapidly changing OS versions. The problems raised by the human side of programming have changed the
question “Should it be in microcode or in assembler?” to the question “Should it be in assembler or in C?”
This parallels the CISC/RISC debate.
High-Level Languages Support
Early research was done also in the area of support for high-level languages. Support can be in the form of
microprogrammed implementations of some language primitive (for example, the trigonometric functions) or
support for the definition and processing of data structures (for example, trees and lists primitives). Many
interesting research projects have led to esoteric laboratory machines. More common examples include the
translate instructions, string searches and compares, or indexing multidimensional arrays.
Paging, Virtual Memory
An early and typical application of microprogramming is the implementation of the paging algorithm for a
virtual memory system. It is a typical application since it is a low-level function that must be time optimized
and is highly hardware dependent. Furthermore, the various maintenance functions which are required by the
paging algorithms and the disk I/Os can be done during the idle time of the processing of other functions or
during part of that processing in order to avoid I/O delays.
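A minimal sketch of the address translation step (ours; the page size and table contents are invented) illustrates the kind of routine such firmware implements:

PAGE_SIZE = 4096
page_table = {0: 7, 1: 3}                  # resident virtual pages -> physical frames

def translate(virtual_address):
    vpage, offset = divmod(virtual_address, PAGE_SIZE)
    if vpage not in page_table:            # page fault: the page must be fetched from disk
        raise LookupError("page fault on virtual page %d" % vpage)
    return page_table[vpage] * PAGE_SIZE + offset

print(hex(translate(0x1234)))              # virtual page 1 maps to frame 3 -> 0x3234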
Diagnostics
Diagnostic functions have also been an early application of microprogramming. A firmware implementation
is ideally suited to test the various components of a computer system, since the gates, paths, and units can be
exercised in an isolated manner, therefore allowing one to precisely pinpoint the trouble area.
Controllers
Real-time controllers benefit from a microprogrammed implementation, due to the speed gained by program-
ming only the required functions, therefore avoiding the overhead of general-purpose instructions. Since the
microprogrammer can better make use of the available parallelism in the machine, long processes can still
support the asynchronous arrival of data by incorporating the interrupt polling at intervals in these processes.
High-Level Machines
Machines that directly implement the constructs of high-level languages can be easily implemented via micro-
programming. For example, Prolog machines and Lisp machines have been tried. It is also possible to conceive of an application that is directly microcoded. Although this could provide high-performance hardware, human errors
and software engineering practice seem to make such a machine more of a curiosity than a maintainable system.
Defining Terms
Control memory: A memory containing a set of microinstructions (a microprogram) that defines the instruc-
tion set and operations of a CPU.
Emulator: The firmware that simulates a given machine architecture.
Firmware: Meant as an intermediate between software, which can be modified very easily, and hardware,
which is practically unchangeable (once built); the word firmware was coined to represent the micropro-
gram in control memory, i.e., the modifiable representation of the CPU instruction set.
High-level language for microprogramming: A high-level language more or less oriented toward the descrip-
tion of a machine. Emulators can more easily be written in a high-level language; the source code is
compiled into the microinstructions for actual implementation.
Horizontal microinstruction: Theoretically, a completely horizontal microinstruction is made up of all the
possible microcommands available in a given CPU. In practice, some encoding is provided to reduce the
length of the instruction.
Microcommand: A small bit field indicating if a gate is open or closed, if a function is enabled or not, if a
control path is active or not, etc. A microcommand is therefore the specification of some action within
the control structure of a CPU.
Microinstruction: The set of microcommands to be executed or not, enabled or not. Each field of a micro-
instruction is a microcommand. The instruction specifies the new state of the CPU.
Vertical microinstruction: A completely vertical microinstruction would contain one field and therefore
would specify one microcommand. An Op code is used to specify which microcommand is specified. In
practice, microinstructions that typically contain three or four fields are called vertical.
Related Topic
86.3 Architecture
References
T. Agerwala, “Microprogram optimization: a survey,” IEEE Trans. Comput., vol. C25, no. 10, pp. 862–873, 1976.
J.D. Bagley, “Microprogrammable virtual machines,” Computer, pp. 38–42, 1976.
D.K. Banerji and J. Raymond, Elements of Microprogramming, Englewood Cliffs, N.J.: Prentice-Hall, 1982.
G.F. Casaglia, “Nanoprogramming vs. microprogramming,” Computer, pp. 54–58, 1976.
S.R. Das, D. K. Banerji, and A. Chattopadhyay, “On control memory minimization in microprogrammed digital
computers,” IEEE Trans. Comput., vol. C22, no. 9, pp. 845–848, 1973.
L.H. Jones, “An annotated bibliography on microprogramming,” SIGMICRO Newsletter, vol. 6, no. 2, pp. 8–31,
1975.
L.H. Jones, “Instruction sequencing in microprogrammed computers,” AFIPS Conf. Proc., vol. 44, pp. 91–98, 1975.
K. Malik and T.J. Lewis, “Design objectives for high level microprogramming languages,” in Proceedings of the
11th Annual Microprogramming Workshop, Englewood Cliffs, N.J.: Prentice-Hall, 1978, pp. 154–160.
J. Raymond and D.K. Banerji, “Using a microprocessor in an intelligent graphics terminal,” Computer, pp. 18–25,
1976.
M.V. Wilkes, W. Renwick, and D.J. Wheeler, “The design of the control unit of an electronic digital computer,”
Proc. IEE, pp. 121–128, 1958.
Further Information
J. Carter, Microprocessor Architecture and Microprogramming, a State Machine Approach, Englewood Cliffs, NJ:
Prentice-Hall, 1996.
S. Habib, Ed., Microprogramming and Firmware Engineering Methods, New York: Van Nostrand Reinhold, 1988.
H. Okuno, N. Osato, and I. Takeuchi, "Firmware approach to fast Lisp interpreter," in Proceedings of the 20th Annual Workshop on Microprogramming (MICRO-20), ACM, 1987.
A. J. Van der Hoeven, P. Van Prooijen, E. F. Deprettere, and P. M. Dewilde, “A hardware design system based
on object-oriented principles”, IEEE, pp. 459–463, 1993.