Feldman, J.M., Czeck, E.W., Lewis, T.G., Martin, J.J. "Programming" The Electrical Engineering Handbook, Ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000

87 Programming

87.1 Assembly Language
     NumberCount( ) • Comparisons Down on the Factory Floor • Compiler Optimization and Assembly Language
87.2 High-Level Languages
     What Is a HLL? • How High Is a HLL? • HLLs and Paradigms
87.3 Data Types and Data Structures
     Abstract Data Types • Fundamental Data Types • Type Constructors • Dynamic Types • More Dynamic Data Types • Object-Oriented Programming

James M. Feldman, Northeastern University
Edward W. Czeck, Northeastern University
Ted G. Lewis, Naval Postgraduate School
Johannes J. Martin, University of New Orleans

87.1 Assembly Language

James M. Feldman and Edward W. Czeck

The true language of computers is a stream of 1s and 0s—bits. Everything in the computer, be it numbers or text or program, spreadsheet or database or 3-D rendering, is nothing but an array of bits. The meaning of the bits is in the "eye of the beholder"; it is determined entirely by context. Bits are not a useful medium for human consumption. Instead, we insist that what we read be formatted spatially and presented in a modest range of visually distinguishable characters. 0 and 1 arranged in a dense, page-filling array do not fulfill these requirements in any way. The several languages that are presented in this handbook are all intended to make something readable to two quite different readers. On the one hand, they serve the human reader with his/her requirements on symbols and layout; on the other, they provide a grammatically regular language for interpretation by a compiler. A compiler, of course, is normally a program running on a computer, but human beings can and sometimes do play both sides of this game. They want to play with both the input and output. Such accessibility requires that not only the input but the output of the compilation process be comfortably readable by humans. The language of the input is called a high-level language (HLL). Examples are C, Pascal, Ada and Modula II. They are designed to express both regularly and concisely the kinds of operations and the kinds of constructs that programmers manipulate. The output end of the compiler generates object code—a generally unreadable, binary representation of machine language, lacking only the services of a linker to turn it into true machine language. The language that has been constructed to represent object code for human consumption is assembly language. That is the subject of this section.

Some might object to our statement of purpose for assembly language. While few will contest the concept of assembly language as the readable form of object code, some see writing assembly code as the way to "get their hands on the inner workings of the machine." They see it as a "control" issue. Since most HLLs today give the user reasonably direct ways to access hardware, where does the "control" issue arise? What assembly proponents see as the essential reason for having an assembly language is the option to optimize the "important" sections of a program by doing a better job of machine code generation than the compiler does. This perspective was valid enough when compilers were mediocre optimizers. It was not unlike the old days when a car came with a complete set of tools because you needed them. The same thing that has happened to cars has happened to compilers. They are engineered to be "fuel efficient" and perform their assigned functions with remarkable ability.
When the cars or compilers get good enough and complex enough, the tinkerer may do more harm than good. IBM's superscalar RISC computer—the RS6000—comes with superb compilers and no assembler at all. The Pentagon took a long look at their costs of programming their immense array of computers. Contrary to popular legend, they decided to save money. The first amendment notwithstanding, their conclusion was: "Thou shalt not assemble." The four principal reasons for not writing assembly language are

• Any sizable programming job gets done at least four times faster in a HLL.
• Most modern compilers are good optimizers of code; some are superb.
• Almost all important code goes through revisions—maintenance. Reworking old assembly code is similar to breaking good encryption; it takes forever.
• Most important of all is portability. To move any program to a different computer, you must generate machine code for that new platform. With a program in a HLL, a new platform is almost free; all it requires is another pass through the compiler for the target platform. With assembly code, you are back to square one. Assembly code is unique to the platform.

Given all of that, the question naturally arises: Why have an article on assembly language? We respond with two reasons, both of which we employ in our work as teachers and programmers:

• An essential ingredient in understanding computer hardware and in designing new computer systems and compilers is a detailed appreciation of the operations of central processing units (CPUs). These are best expressed in assembly language. Our undergraduate Computer Engineering courses include a healthy dose of assembly language programming for this specific reason.
• If you are concerned about either CPU design or compiler effectiveness, you have to be able to look in great detail at the interface between them—machine language. As we have said, the easiest way to read machine language is by translating it to assembly language. This is one way to get assembly language, not by writing in it as a source of code but by running the object code itself through a backward translator called a disassembler.

While many compilers will oblige you by providing an assembly listing if asked, often that listing does not include optimizations that occur only when the several modules are linked together, providing opportunities for truly global optimization. Some compilers "help" the reader by using macros (names for predefined blocks of code) in place of the real machine instructions and register assignments. The absence of the optimizations and the inclusion of unexpected macros can make the assembly listing almost useless for obtaining insight into the program's fine detail. The compilers that we have used on the DECstations and SPARC machines do macro inclusion. To see what is really going on in these machines, you must disassemble the machine code. That is precisely what the Think C compiler on the Macintosh does when you ask for machine code. It disassembles what it just did in compiling and linking the whole program. What you see is what is really there. The code we present for the 68000 was obtained in that way.

These are important applications. Even if most or all other programming needs can be better met in HLLs, these applications are sufficient reason for many engineers to want to know something about assembly language. There are other applications of assembly language, but they tend to be specific to rather specialized and infrequent tasks.
For example, the back end of most HLL compilers is a machine code generator. To write one of those, you certainly must know something about assembly language. On rare occasions, you may find some necessary machine-specific transaction which is not supported by the HLL of choice or which requires some special micro optimization. A "patch" of assembly code is a way to fit this inexpressible thought into the program's vocabulary. These are rare events. The reason why we recommend to you this section on assembly code is that it improves your understanding of HLLs and of computer architecture. We will take a single subroutine which we express in C and look at the machine code that is generated on three representative machines. The machines include two widely used complex instruction set computers (CISCs) and one reduced instruction set computer (RISC). These are the 68000, the VAX, and a SPARC. We will have two objectives:

• To see how a variety of paradigms in HLLs are translated (or, in other words, to see what is really going on when you ask for a particular HLL operation)
• To compare the several architectures to see how they are the same and how they differ

The routine attempts to get a count of the number of numbers which occur in a block of text. Since we are seeking numbers and not digits, the task is more complex than you might first assume. This is why we say "attempts." The function that we present below handles all of the normal text forms:

• Integers, such as 123 or –17
• Numbers written in a fixed-point format, such as 12.3 or 0.1738
• Numbers written in a floating-point format, such as –12.7e+19 or 6.781E2

If our program were to scan the indented block of code above, it would report finding six numbers. The symbols that the program recognizes as potentially part of a number include the digits 0 to 9 and the symbols 'e', 'E', '.', '–' and '+'. Now it is certainly possible to include other symbols in legitimate numbers, such as HEX numbers or the like, but this little routine will not properly deal with them. Our purpose was not to handle all comers but to provide a routine with some variety of expression and possible application. Let us begin.

NumberCount( )

We enter the program at the top with one pointer passed from the calling routine and a set of local variables comprising two integers and eight Boolean variables. Most of the Boolean variables will be used in pairs. The first element of a pair, for instance, ees of ees and latche, indicates that the current character is one of a particular class of non-numeric characters which might be found inside a number. If you consider that the number begins at the first digit, then these characters can occur legally only once within a given number. ees will be set true if the current character is the first instance of either 'e' or 'E'. The paired variable, latche, is set true if there has ever been one of those characters in the current number. The other pairs are period and latchp and sign and latchs. There is also a pair of Booleans which indicate if the current character is a digit and if the scanner is currently inside a number. Were you to limit your numbers to integers, these two are the only Booleans which would be needed. At the top of the program, all Booleans are reset (made FALSE). Then we step through the block looking for numbers. The search stops when we encounter the first null [char(0)] marking the end of the block. Try running through the routine with text containing the three forms of number.
You will quickly convince yourself that the routine works with all normal numbers. If someone writes "3..14" or "3.14ee6", the program will count 2 numbers. That is probably right in the first two cases. Who knows in the third? Let us look at this short routine in C.

    #define blk_length 20001

    int NumberCount(char block[])
    {
        int count=0, inside=0, digit;
        int ees=0, latche=0, latchp=0, period=0, latchs=0, sign=0;
        char *source;
        source = block;
        do {
            digit = (*source >= '0') && (*source <= '9');
            period = (*source=='.') && inside && !latchp && !latche;
            latchp = (latchp || period);
            ees = ((*source=='E') || (*source=='e')) && inside && !latche;
            latche = (latche || ees);
            sign = ((*source=='+') || (*source=='-')) && inside && latche && !latchs;
            latchs = (latchs || sign);
            if (inside) {
                if (!(digit || ees || period || sign))
                    inside=latchp=latche=latchs=0;
            }
            else if (digit) {
                count++;
                inside = 1;
            }
            source++;
        } while ((*source != '\0') && ((source-block)<blk_length+1));
        return count;
    }

To access values within the character array, the normal C paradigm is to step a pointer along the array. Source points at the current character in the array; *source is the character ("what source points at"). source is initialized at the top of the program before the loop (source = block;) and incremented (source++;) at the bottom of the loop. Note the many repetitions of *source. Each one means the same current character. If you read that expression as the character which source is pointing to, it looks like an invitation to fetch the same character from memory eight times. A compiler that optimizes by removing common subexpressions should eliminate all but the first such fetch. This optimization is one of the things that we want to look for.

For those less familiar with C, the meanings of the less familiar symbols are:

    ==        equal (in the logical sense)
    !         not
    !=        not equal
    &&        and
    ||        or
    count++   increment count by 1 unit (after using it)

C uses 0 as FALSE and anything else as TRUE.

Comparisons Down on the Factory Floor

Now let us see what we can learn by running this program through compilers on several quite different hosts. The items that we wish to examine include:

    I.   Subroutine operations comprising:
         A. Building the call block
         B. The call itself
         C. Obtaining memory space for local variables
         D. Accessing the call block
         E. Returning the function value
         F. Returning to the calling routine
    II.  Data operations
         A. Load and store
         B. Arithmetic
         C. Logical
         D. Text
    III. Program control
         A. Looping
         B. if and the issue of multiple tests

Our objectives are to build three quite different pictures:

• An appreciation for the operations underlying the HLL statements
• An overview of the architectures of several important examples of CISC and RISC processors
• An appreciation for what a HLL optimizer should be doing for you

We will attempt to do all three all of the time.

Let us begin with the calling operations. Our first machine will be the MC68000, one of the classical and widely available CISC processors. It or one of its progeny is found in many machines and forms the heart of the Macintosh (not the PowerMac) and the early Sun workstations. Programmatically, the 68000 family shares a great deal with the very popular VAX family of processors. Both of these CISC designs derive in rather linear fashion from DEC's PDP-11 machines that were so widely used in the 1970s.
Comparisons to that style of machine will be done with the SPARC, a RISC processor found in Sun, Solbourne, and other workstations.

Memory and Registers

All computers will have data stored in memory and some space in the CPU for manipulating data. Memory can be considered to be a long list of bytes (8-bit data blocks) with addresses (locations in the list) spanning some large range of numbers from 0 to typically 4 billion (4 GB). The memory is constructed physically by grouping chips so that they appear to form enormously deep columns of bytes, as shown in Fig. 87.1. Since each column can deliver one byte on each request, the number of adjacent columns determines the number of bytes which may be obtained from a single request. Machines today have 1, 2, 4, or 8 such columns. (Some machines, the 68000 being our current example, have only 2 columns but arrange to have the CPU ask for two successive transfers to get a total of 4 bytes.) In general, the CPU may manipulate in a single step a datum as wide as the memory. For all of the machines which we will consider, that maximum datum size is 32 bits or 4 bytes. While convention would have us call this biggest datum a word, historical reason has led both the VAX and MC68000 to call it a longword. Then, 2 bytes is either a halfword or a word. We will use the VAX/68000 notation (longword, word, and byte) wherever possible to simplify the reading.

To load data from memory, the CPU sends the address and the datum size to the memory and gets the datum as the reply. To store data, the address is sent and then the datum and datum size. Some machines require that the datum be properly aligned with the stacking order of the memory columns in Fig. 87.1. Thus, on the SPARC, a longword must have an address ending in 00 (xxx00 in Fig. 87.1), and a word address must end in 0. The programmer who arranges to violate this rule will be greeted with an address error. Since the MC68000 has only two columns, it complains only if you ask for words or longwords with odd addresses. Successor models of that chip (68020, 30, and 40), like the VAX, accept any address and have the CPU read two longwords and do the proper repacking. The instruction explicitly specifies the size and indicates how the CPU should calculate the address. An instruction to load a byte, for example, is LDSB, MOVE.B, or MOVB on the SPARC, MC68000, and VAX, respectively. These are followed immediately by an expression which specifies an address. We will discuss how to specify an address later. First, we must introduce the concept of a register.

The space for holding data and working on it in the CPU is the register set. Registers are a very important resource. Bringing data in from memory is quite separate from any operations on that data. Data in memory must first be fetched, then acted upon. Data in registers can be acted on immediately. Thus, the availability of registers to store very active variables and intermediate results makes a processor inherently faster. In some machines, most or all of the registers are tied to specific uses. The most prevalent example would be Intel's 80x86 processors, which power the ubiquitous PC. Such architectures, however, are considered quite old-fashioned. All of the machines that we are considering are of a type called general register machines in that they have a large group of registers which may be used for any purpose. The machines that we include have either 16 or 32 registers, with only a few tied to specific machine operations.
FIGURE 87.1 Memory arranged as 4 columns of bytes. The binary addresses are shown in the two formats widely used in computers. The illustration shows only 32 bytes in a 4 × 8 array, but a more realistic span would be 4 × 1,000,000 or 4 × 4,000,000 (4 MB to 16 MB).

Table 87.1 shows the general register resources in the three machines. The SPARC is a little strange. The machine provides eight global registers and then a window blind of 128 registers which sits behind a frame which exposes 24 of the 128. A program can ask the machine to raise or lower the blind by 16 registers. That leaves an overlap of eight between successive yanks or rewinds. This arrangement is called a multiple overlapping register set (MORS). If you think of starting with register r8 at the bottom and r31 at the top, a yank of 16 on the blind will now have r47 at the top and r24 at the bottom. r24 to r31 are shared between the old set and the new. To avoid having to keep track of which registers are showing, the set of 24 are divided into what came in from the last set, those that are only local, and those that will go out to the next set. These names apply to going toward increasing numbers. In going the other direction, the ins of the current set will become the outs of the next set. Almost all other machines keep their registers screwed down to the local masonry, but you will see in a moment how useful a MORS can be. (Like other useful but expensive accessories, the debate is always on whether it is worth it [Patterson and Hennessy, 1989].)

Stack. Most subroutines define a number of local variables. NumberCount in C, for example, defines 10 local variables. While these local variables will often be created and kept in register, there is always some need for a bit of memory for each invocation of (call to) a subroutine. In the "good old days," this local storage was often tied to the block of code comprising the subroutine. However, such a fixed block means that a subroutine could never call itself or be called by something that it called. To avoid that problem (and for other purposes) a memory structure called a stack was invented which got its name because it behaved like the spring-loaded plate stack in a restaurant. Basically, it is a last-in-first-out (LIFO) structure whose top is defined by a pointer (address) which resides in a register commonly called the stack pointer or SP.

Heap. When a subroutine needs space to store local variables, it acquires that space on the stack. When the subroutine finishes, it returns that stack space for use by other routines. Thus, local variable allocations live and die with their subroutines. It is often necessary to create a data structure which is passed to other routines whose lives are independent of the creating routine. This kind of storage must be independent of the creator. To meet this need, the heap was invented. This is an expandable storage area managed by the system. You get an allocation by asking for it [malloc (structure_size) in C]. You get back a pointer to the allocation and the routine can pass that pointer to any other routine and then go away. When it comes time to dispose of the allocation—that is, return the space for other uses—the program must do that actively by a deallocation call [free(pointer) in C]. Thus, one function can create a structure, several may use it, and another one can return the memory for other uses, all by passing the pointer to the structure from one to another.
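To make that discipline concrete, the following C fragment is a minimal sketch of the heap usage just described; the structure and function names are our own, invented only for illustration. One routine obtains a structure with malloc, any routine that receives the pointer may use it, and a final routine returns the space with free.

    #include <stdlib.h>

    struct record { int id; double value; };

    struct record *make_record(int id)       /* creator: the allocation outlives this routine */
    {
        struct record *r = malloc(sizeof(struct record));
        if (r != NULL)
            r->id = id;
        return r;                            /* only the pointer is handed on                 */
    }

    void use_record(struct record *r)        /* any routine holding the pointer may use it    */
    {
        r->value = 2.0 * r->id;
    }

    void discard_record(struct record *r)    /* and another one returns the space             */
    {
        free(r);
    }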
Both heap and stack provide a mechanism to obtain large (or small) amounts of storage dynamically. Thus, large structures which are created only at run time need not have static space stored for them in programs that are stored on disk nor need they occupy great chunks of memory when the program does not need them. Dynamic allocation is very useful and all modern HLLs provide for it. Since there are two types of dynamic storage, there must be some way to lay out memory so that unpredictable needs in either stack or heap can be met at all times. The mechanism is simplicity itself. The program is stuffed into low addresses in memory along with any static storage (e.g., globals) which are declared in the program. The entire remaining space is then devoted to dynamic storage. The heap starts right after the program and grows toward higher addresses; the stack goes in at the top of memory and grows down. The system is responsible to see that they never collide (a stack crash). When it all goes together, it looks like Fig. 87.2 [Aho et al., 1986].

FIGURE 87.2 Layout of a program, static storage, and dynamic storage in memory.

TABLE 87.1  General Registers in the Three Machines

Machine   Reg        Special   Names
MC68000   16         1         D0..D7, A0..A7
          A(ddress) register operations are 32 bits wide. Address generation uses A registers as
          bases. D (data) registers allow byte, word, and longword operations. A7 is SP.
VAX       16         4         r0..r11, AP, FP, SP, PC
          AP, FP, SP and PC hold the addresses of the argument block, the frame, the stack and the
          current place in the program, respectively. All data instructions can use any register.
SPARC     32 (136)   4         zero, g1..g7, i0..i5, FP, RA, l0..l7, o0..o5, SP, o7
          The 4 groups of eight registers comprise: global (g), incoming parameters (i), local (l)
          and outgoing parameters (o). g0 is a hardwired 0 as a data source and a wastebasket as a
          destination. The registers are arranged as a window blind (see text) with the g's always
          visible and the others moveable in multiple overlapping frames of 24.

Note: The special registers are within the set of general registers. Where a PC is not listed, it exists as a special register and can be used as an address when the program uses program-relative addressing.

There is one last tidbit that an assembly programmer must be aware of in looking at memory. Just as some human alphabets are written left to right and some right to left (not to mention top to bottom), computer manufacturers have chosen to disagree on how to arrange words in memory. The two schemes are called big-endian and little-endian (after which end of a number goes in the lowest-numbered byte and also after a marvelous episode in Gulliver's Travels). The easiest way to perceive how it is done in the two systems is to think of all numbers as being written in conventional order (left to right), but for big-endian you start counting on the upper left of the page and on little-endian you start counting on the upper right (see Fig. 87.1). Since each character in a text block is a number of length 1 byte, this easy description makes big-endian text read in normal order (left to right) but little-endian text reads from right to left. Figure 87.3 shows the sentence "This is a sentence" followed by the two hexadecimal (HEX) numbers 01020304 and 0A0B0C0D written to consecutive bytes in the two systems. Why must we bring this up? Because anyone working in assembly language must know how the bytes are arranged. Furthermore, two of the systems we are considering are big-endian and one (the VAX) is little-endian. Which is the better system? Either one.
It is having both of them that is a nuisance. As you look at Fig. 87.3, undoubtedly you will prefer big-endian, but that is only because it appeals to your prejudices. In truth, either works well. What is important is that you be able to direct your program to go fetch the item of choice. In both systems, you use the lowest-numbered byte to indicate the item of choice. Thus, for the number 01020304, the address will be 13. For the big-endian system, 13 will point to the byte containing 01 and for the little-endian system, it will point at the byte containing 04.

Figure 87.3 contains a problem for some computers which we alluded to in the discussion of Fig. 87.1. We have arranged the bytes to be four in a row as in Fig. 87.1. That is the way that the memory is arranged in two of our three machines. (In the 68000, there are only two columns.) A good way to look at the fetch operation is that the memory always delivers a whole row and then the processor must acquire the parts that it wants and then properly arrange them. (This is the effect if not always the method.) Some processors—the VAX being a conspicuous example—are willing to undertake getting a longword by fetching two longwords and then piecing together the parts that it wants. Others (in our case, the 68000 and the SPARC) are not so accommodating. Those machines opt for simplicity and speed and require that the program keep its data aligned. To use one of those machines, you (or the compiler or assembler) must rearrange Fig. 87.3 by inserting a null byte. This modification is shown in Fig. 87.4. With this modification, all three machines could fetch the two numbers in one operation without rearrangement.

Look closely at the numbers 01020304 and 0A0B0C0D in Fig. 87.4. Notice that for both configurations, the numbers read from left to right and that (visually) they appear to be in the same place. Furthermore, as pointed out in the discussion of Fig. 87.3, the "beginning" or address of each of the numbers is identical. However, the byte that is pointed at by the address is not the same and the internal bytes do not have the same addresses. Getting big-endian and little-endian machines in a conversation is not easy. It proves to be even more muddled than these figures suggest. A delightful and cogent discussion of the whole issue is found in Cohen [1981]. The principal objective in this whole section has been accomplished if looking at Fig. 87.4 and given the command to load a byte from location 0000 0019, you get the number 0B in the big-endian machine and 0C in the little-endian machine.

If you are not already familiar with storing structures in memory, look at the string (sentence) and ask how those letters get in memory. To begin with, every typeable symbol and all of the unprintable actions such as tabbing and carriage returns have been assigned a numerical value from the ASCII code. Each assignment is a byte-long number. What "This" really looks like (HEX, left to right) is 54 68 69 73. The spaces are HEX 20; the period 2E. With the alignment null byte at the end, this list of characters forms a proper C string. It is a structure of 20 bytes. A structure of any number of bytes can be stored, but from the assembly point of view, it is all just a list of bytes.
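Because the interpretation of a group of bytes is entirely up to the program, a few lines of C are enough to ask which convention a host follows. The sketch below is our own illustration, not part of NumberCount: it stores the longword 01020304 (HEX) and then loads the byte at the longword's own address, which is 01 on a big-endian machine and 04 on a little-endian machine, matching the discussion of Fig. 87.3.

    #include <stdio.h>

    int main(void)
    {
        unsigned int n = 0x01020304;              /* assumed to be the 32-bit longword of the text  */
        unsigned char *p = (unsigned char *) &n;  /* address of the lowest-numbered byte            */

        if (p[0] == 0x04)
            printf("little-endian\n");            /* bytes 04 03 02 01 in ascending addresses */
        else
            printf("big-endian\n");               /* bytes 01 02 03 04 in ascending addresses */
        return 0;
    }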
Unlike the HLL which requires that you tell it what each named variable is, assembly language knows only bytes and groups of bytes. In assembly language, the "T" can be thought of as a letter or the number 54 (HEX). Your choice. Or, more importantly, your program's choice.

FIGURE 87.3 Byte numbering and number placement for big- and little-endian systems. Hexadecimal numbers are used for the memory addresses.

FIGURE 87.4 The same items as in Fig. 87.3, but with justification of the long integers to begin on a longword boundary.

Addressing

Now that we have both memory and addresses, we should next consider how these processors require that programmers specify the data that is to be acted upon by the instructions. All of these machines have multiple modes of address. The VAX has the biggest vocabulary; the SPARC the smallest. Yet all can accomplish the same tasks. Four general types of address specification are quite common among assembly languages. These are shown in Table 87.2. They are spelled out in words in the table, but their usage is really developed in the examples which follow in this and the succeeding sections.

TABLE 87.2  Addressing Modes

1. Explicit addresses
   1.1. Absolute addressing (example: 765). The actual address written into the instruction.
   1.2. Register indirect (example: (r3)). Meaning "the address is in register 3."
   1.3. Base-displacement (example: –12(r3)). Meaning "12 bytes before the address in register 3."
   1.4. Base-index (example: (r3,r4)). Meaning make an address by adding the contents of r3 and r4. This mode has many variations which are discussed below.
   1.5. Double indirect (example: @5(r4)). Very uncommon! Means calculate an address as in 1.3, then fetch the longword there, and then use it as the address of what you really want.
2. Direct data specification
   2.1. Immediate/literal (example: #6 or 6). Meaning "use 6 as the datum." In machines which use #6, 6 without # means address 6. This is called "absolute addressing."
3. Program-relative
   3.1. Labels (example: loop:). The label (typically an alphanumeric ending in a colon) is a marker in the program which the assembler and linker keep track of. The common uses are to jump to a labeled spot or to load labeled constants stored with the program.
4. Address-modifying forms (CISC only)
   4.1. Postincrement (example: (sp)+). Same as 1.2 except that, after the address is used, it is incremented by the size of the datum in bytes and returned to the register from which it came.
   4.2. Predecrement (example: –(sp)). The value in SP is decremented by the size of the datum in bytes, used as the address and returned to the register from which it came.

In Table 87.2, formats 1.4 and 1.5 and the entries in 4 require some expansion. The others will be clear in the examples we will present. Base-index addressing is the mechanism for dealing with subscripts. The base points at the starting point of a data structure, such as a string or a vector; the index measures the offset from the start of the structure to the element in question. For most machines, the index is simply a separate register which counts the bytes from the base to the item in question. If the items in the list are 4 bytes long, then to increment the index, you add 4. While that is not hard to remember, the VAX does its multiplication by the item length for you. Furthermore, it allows you to index any form of address that you can write. To show you what that means, consider expanding numbers stored in words into numbers stored in longwords. The extension is to preserve sign. The VAX provides specific instructions for conversions. If we were moving these words in one array to longwords in another array, we would write:

    CVTWL (r4)[r5],(r6)[r5]    ;convert the words starting at (r4) to longwords starting at (r6)

Note that the same index, [r5], is used for both arrays. On the left, the contents of r5 are multiplied by 2 and added to r4 to get the address; on the right, the address is r5*4+r6. You would be saying: "Convert the 4th word to the 4th longword." This is undoubtedly compact and sometimes convenient. It is also unique to the VAX.
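For reference, the C that asks for this word-to-longword conversion is nothing more than an indexed copy loop. The sketch below is ours (the array names and length are invented); the register notes in the comments simply echo the CVTWL example, and a real compiler is free to allocate differently.

    #define N 100

    short src[N];                 /* words,     2 bytes each; base address, r4 in the example  */
    long  dst[N];                 /* longwords, 4 bytes each; base address, r6 in the example  */

    void widen(void)
    {
        int i;                    /* the shared index, r5 in the example                       */
        for (i = 0; i < N; i++)
            dst[i] = src[i];      /* each assignment sign-extends a word into a longword       */
    }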
For the 68000, the designers folded both base-displacement and base-index into one mode and made room for word or longword indices. It looks like:

    add.l 64(A3,D2.w),D3    ;address = (A3+64) + sign-extended(D2)

The 68000 limits the displacement to a signed byte, but other than that, it is indeed a rather general indexing format. If you do not want the displacement, set it to 0. For the powerful but simple SPARC, the simple base-index form shown in 1.4 is all that you have (or need). The double-indirect format, 1.5, is so rarely used that it has been left out of almost all designs but the VAX. What makes it occasionally useful is that subroutines get pointers to "pass by pointer" variables. Thus, if you want to get the variable, first you must load the address and then the variable. The VAX allows you to do this in one instruction. While that sounds compact, it is expensive in memory cycles. If you want to use that pointer again, it pays to have it in register.

The two items under heading 4 are strange at first. Their principal function is adding items to and removing them from a dynamic stack, or for C, to execute the operation *X++ or *(– –X). The action may be viewed with the code below and the illustration of memory in Fig. 87.2:

    movl r4, –(sp)    ;make room on the stack (subtract 4 from SP) and put the contents of r4 in that spot
    movl (sp)+, r4    ;take a longword off the stack, shorten the stack by 4 bytes, and put the longword in r4

RISCs abhor instructions which do two unrelated things. Instead of using a dynamic stack, they use a quasi-static stack. If a subroutine needs 12 bytes of stack space, it explicitly subtracts 12 from SP. Then it works from there with the base-displacement format (1.3) to reference any place in the block of bytes just defined. If you want to use a pointer and then increment the pointer, RISCs will do that as two independent instructions.

Let us consider one short section of MC68000 code from our sample program in C to see how these modes work and to sample some of the flavor of the language:

    ;ees = ((*source=='E') || (*source=='e')) && inside && !latche;
        CMPI.B #$45,(A4)     ; 'E'  "compare immediate" literal hex 45, what A4 points at
        BEQ first            ; "branch if equal" to label first
        CMPI.B #$65,(A4)     ; 'e'  "compare immediate" literal hex 65, what A4 points at
        BNE second           ; "branch if not equal" to label second
    first:  TST.W D6         ; "test word" (subtract 0) D6 ('inside')
        BEQ second           ; "branch if equal" to label second
        TST.W D3             ; "test word" (subtract 0) D3 ('latche')
        BEQ third            ; "branch if equal" to label third
    second: MOVEQ #00,D0     ; "move quick" literal 0 to D0
        BRA fourth           ; "branch always" to label fourth
    third:  MOVEQ #$01,D0    ; "move quick" literal 1 to D0
    fourth: MOVE.W D0,-6(A6) ; "move word" from D0 to -6(FP)

There are all sorts of little details in this short example. For example, a common way to indicate a comment is to start with a ";". The assembler will ignore your comments. The "#" indicates a literal, and the "$" that the literal is written in hexadecimal notation. The VAX would use #^x to express the same idea. "Compare" means "subtract but save only the condition codes of the result" (v or overflow, n or negative, z or zero, and c or carry). Thus, the first two lines do a subtraction of whatever A4 is pointing at (*source) from the ASCII value for 'E' and then, if the two were equal (the result, zero), the program jumps to line 5. If *source is not 'E', then it simply goes to the next line, line 3. The instruction, TST.W D6, is quite equivalent to CMPI.W D6, #0, but the TST instruction is inherently shorter and faster. On a SPARC, where it would be neither shorter nor faster, TST does not exist.
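It may help to see the same assignment written out in C with the branches made explicit. The function below is our own rewriting of the ees line as the chain of tests and jumps that the 68000 fragment performs; it is an equivalent sketch, not compiler output.

    int ees_from(const char *source, int inside, int latche)
    {
        int ees;

        if (*source != 'E' && *source != 'e')   /* CMPI.B #$45 / BEQ, then CMPI.B #$65 / BNE */
            goto second;
        if (!inside)                            /* TST.W D6, BEQ second                      */
            goto second;
        if (!latche)                            /* TST.W D3, BEQ third                       */
            goto third;
    second:
        ees = 0;                                /* MOVEQ #00,D0                              */
        goto fourth;
    third:
        ees = 1;                                /* MOVEQ #$01,D0                             */
    fourth:
        return ees;                             /* MOVE.W D0,-6(A6)                          */
    }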
Exactly what the assembler or linker does to replace the label references with proper addresses, while interesting, is not particularly germane to our current topic. Note that the range of the branch is somewhat limited. In the 68000, the maximum branch is ±32K and in the VAX a mere +127 to –128. If you need to go further, you must combine a branch with a jump. For example, if you were doing BEQ farlabel, you would instead do:

        BNE nearlabel
        jmp farlabel     ; this instruction can go any distance
    nearlabel:

Follow through the example above until the short steps of logic and the addressing modes are clear. Then progress to the next section where we use the addressing modes to introduce the general topic of subroutine calling conventions.

Calling Conventions

Whenever you invoke a subroutine in a HLL, the calling routine (caller) must pass to the called routine (callee) the parameters that the subroutine requires. These parameters are defined at compile time to be either pass-by-value or pass-by-pointer (or pass-by-reference), and they are listed in some particular order. The convention for passing the parameters varies from architecture to architecture and HLL to HLL, but basically it always consists of building a call block which contains all of the parameters and which will be found where the recipient expects to find it. Along with the passing of parameters, for each system, a convention is defined for register and stack use which establishes:

• Which registers must be returned from callee to caller with the same contents that the callee received (such registers are said to be preserved across a call)
• Which registers may be used without worrying about their contents (such registers are called scratch registers)
• Where the return address is to be found
• Where the value returned by a function will be found

The convention may be supported by hardware or simply a gentlemanly rule of the road. However the rules come into being, they define the steps which must be accomplished coming into and out of a subroutine. The whole collection of such rules forms the calling convention for that machine. In this section, we look at our three different machines to see how all accomplish the same tasks but by rather different mechanisms. The two CISCs do almost all of their passing and saving on the stack. The call block will be built on the stack; the return address will be put on the stack; saved registers will be put on the stack. Only a few stack references are passed forward in register; the value returned by the function will be passed back in register. How different is the SPARC! The parameters to be passed are placed in the out registers (six are available for this purpose). Only the overflow, if any, would go on the stack. In general, registers are saved by window-blinding rather than moving them to the stack. On return, data is returned in the in registers and the registers restored by reverse window-blinding.
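Before descending into the machine-specific details, it is worth pinning down in C what the convention has to carry. The two toy routines below are ours, for illustration only: the first parameter travels in the call block as a copy (pass-by-value), the second as an address (pass-by-pointer).

    void by_value(int n)          /* a copy of the caller's n goes into the call block   */
    {
        n = n + 1;                /* changes only the copy                               */
    }

    void by_pointer(int *np)      /* the address of the caller's variable is passed      */
    {
        *np = *np + 1;            /* changes the caller's variable through the pointer   */
    }

NumberCount itself has a single parameter of the second kind: the address of the text block, which is why the 68000 caller below pushes the pointer held in A2.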
MC68000 Call and Return. Let us look at the details for two of the machines. We start with the 68000, because that is the most open and "conventional." We continue with the function NumberCount. Only a single parameter must be passed—the pointer to the text block. The HLL caller sees NumberCount(block) as an integer (i.e., what will be returned), but the assembly program must do a call and then use the returned integer as instructed. A typical assembly routine would be:

    MOVE.L A2,-(SP)     ; move pointer to block onto the stack
    JSR NumberCount     ; save return address on the stack and start
                        ; executing NumberCount
                        ; do something with value returned in D0

The first instruction puts the pointer to the block, which is in A2, on the stack. It first must make room, so the "–" in –(A7) first subtracts 4 from A7 (making room for the longword) and then moves the longword into the space pointed to by the now-modified A7. The one instruction does two things: the decrementing of SP and the storing of the longword in memory.

    MOVE.L A2,-(A7)
        A7 ← A7 - 4      ;A7 = SP
        M(A7) ← A2       ;M(x) = memory(address x)

The next instruction, jump subroutine (JSR), does three things. It decrements SP (i.e., A7) by 4, stores the return address on the top of the stack, and puts the address of NumberCount in the program counter. We have just introduced two items which need specific definition:

Return address (RA): This will always be the address of the instruction which the callee should return to. In the 68000 and the VAX (and all other CISCs), the RA points to the first instruction after the JSR. In the SPARC and almost any RISC, RA will point to the second instruction after JSR. That curious difference will be discussed later.

Program counter (PC): This register (which is a general register on the VAX but a special register on the other machines) points to the place (memory location) in the machine language instruction stream where the program is currently operating. As each instruction is fetched, the PC is automatically incremented. The action of the JSR is to save the last version of the PC—the one for the next fetch—and replace it with the starting address of the routine to be jumped to. Summing up these transactions in algebraic form:

    JSR NumberCount
        SP ← SP - 4                   ;A7 = SP
        M(SP) ← PC                    ;M(x) = memory(address x)
        PC ← address of NumberCount
Only after the setup will it get down to work. The entire section of stack used for local variables and saving registers is called the callee’s frame. It is useful to have a pointer (FP) to the bottom of the frame to provide a static reference to the return address, the passed parameters, and the subroutine’s local variable area on the stack. In the 68000, the convention is to use A6 as FP. When our routine, NumberCount, begins, the address in A6 points to the start of the caller’s frame. The first thing the callee must do is to establish a local frame. It does that with the instruction LINK. Typical of a CISC, each instruction does a large piece of the action. The whole entry operation for the 68000 is contained in two instructions: LINK A6,#$FFF8 MOVEM.L D3-D7/A4,-(SP) The first instruction does the frame making; the second does the saving of registers. There are multiple steps in each. Each double step of decrementing SP and moving a value onto the stack will be called a push. The steps are as follows: LINK A6,#$FFF8 ;push A6 (A7 ü A7-4, M(A7) ü A6) ;move A7 to A6 (SP to FP) ;add FFF8 (-8) to SP (4 words for local variables) FIGURE 87.5The stack area of the 68000’s memory and the register assignments that the called subroutine sees as it is entered at the top. The registers all hold longwords, the size of an address. In typical PC/Macintosh compilers, integers are defined as 16-bit words. Accordingly, the stack area of memory is shown as words, or half the width of a register. ? 2000 by CRC Press LLC MOVEM.L D3-D7/A4,-(A7) ;push 5 data registers (3..7) and 1 address ;register (A4) At this point, the stack looks like Fig. 87.6. The subroutine is prepared to proceed. How it uses those free registers and the working space set aside on the stack is the subject of the section on optimization in this chapter. For the moment, however, we simply assume that it will do its thing in exemplary fashion, get the count of the numbers, and return. We continue in this section by considering the rather simple transaction of getting back. The callee is obliged to put the answer back where the caller expects to find it. Two paradigms for return are common. The one that our compiler uses is to put the answer in D0. The other common paradigm is to put the answer back on the stack. The user will have left enough room for that answer at FP+8, whether or not that space was also used for transferring parameters in. Using our paradigm, the return becomes: MOVE.W $FFFC(A6),D0 ;answer from callee?s stack frame [-4(FP)] to D0 MOVEM.L (A7)+,D3-D7/A4 ;registers restored to former values UNLK A6 ;SP ü FP, FP ü M(SP), SP ü SP+4 RTS ;PC ü M(SP), SP ü SP+4 When all of this has transpired, the machine is back to the caller with SP pointing at block. The registers look like Fig. 87.5 except for two important changes. SP is back where the caller left it and D0 contains the answer that the caller asked for. Transactional Paradigms The final topic in this section is the description of some of the translations of the simple and ordinary phrases of the HLLs into assembly language. We will show some in each of our three machines to show both the similarities and slightly different flavors that each machine architecture gives to the translation. The paradigms that we will discuss comprise: ?Arithmetic ?Replacement ?Testing and branching, particularly multiple Boolean expressions ?Stepping through a structure FIGURE 87.6The stack area of the 68000’s memory and the register situation just after MOVEM has been executed. 
Many studies have shown that most computer arithmetic is concerned with addressing, testing, and indexing. In NumberCount there are several examples of each. For example, near the bottom of the program, there are statements such as:

    count++;

For all three machines, the basic translation is the same: Add an immediate (a constant stored right in the instruction) to a number in register. However, for the CISCs, one may also ask that the number be brought in and even put back in memory. The three translations of this pair comprise:

    MC68000                   VAX        SPARC
    ADDQ.W #$1,$FFFE(A6)      INCL R0    add %o2,1,%o2

Typical of the VAX, it makes a special case out of adding 1. There is no essential difference in asking it to add 1 by saying "1," but if one has a special instruction, it saves a byte of program length. With today's inexpensive memories, a byte is no longer a big deal, but when the VAX first emerged (1978), they were delivered with less memory than a PC or Mac would have today. The VAX, of course, can say ADDL #1, r0 just like the 68000, and for any number other than 1 or 0, it would. Note also that the VAX compiler chose to keep count in register, while Think C decided to put it on the stack (–2(SP)). A RISC has no choice. If you want arithmetic, your numbers must be in register. However, once again, we are really talking about the length of the code, not the speed of the transaction. All transactions take place from registers. The only issues are whether the programmer can see the registers and whether a single instruction can include both moving the operands and doing the operand arithmetic. The RISC separates the address arithmetic (e.g., –2(SP)) from the operand arithmetic, putting each in its own instruction. Both get the job done.

The next items we listed were replacement and testing and branching. We have both within the statement:

    digit = (*source >= '0') && (*source <= '9');

The translation requires several statements:

    MC68000                 VAX                  SPARC
    MOVE.B (A4),D3          clrb r1              add %g0,0,%o1
    CMPI.B #$30,D3          cmpb @4(ap),#48      ldsb [%o2],%o0
    BLT ZERO                blss ZERO            subcc %o0,47,%g0
    CMPI.B #$39,D3          cmpb @4(ap),#57      ble ZERO
    BLE ONE                 bgtr ZERO            nop
                            incb r1              subcc %o0,57,%g0
                                                 bg ZERO
                                                 nop
                                                 add %g0,1,%o1
                                                 add %o1,0,%l3
    ZERO:                   ZERO:                ZERO:
    MOVEQ #$00,D0
    BRA DONE
    ONE:
    MOVEQ #$01,D0
    DONE:
    MOVE.W D0,$FFF6(A6)

To begin with, all three do roughly the same thing. The only noticeable difference in concept is that the SPARC compiler chose to compare the incoming character (*source) to 47 (the character before '0') and then branch if the result showed the letter to be "less than or equal," while the other two compared it to '0' as asked and then branched if the result was "less than." No big deal. But let us walk down the several columns to see the specific details. Prior to beginning, note that all three must bring in the character, run one or two tests, and then set an integer to either zero (false) or not zero (true). Also, let it be said that each snatch of code is purportedly optimized, but at least with the small sample that we have, it looks as if each could be better. We begin with three parallel walkdowns. Notes as needed are provided below.

    MC68000                        VAX                           SPARC
    character from M → D3          Set (byte) DIGIT to 0         Set (byte) DIGIT to 0
    Is (D3-'0') <= 0?              Is (*source-'0') <= 0?        character from M → o0
    If <, branch to label ZERO     If <, branch to ZERO          Is (*source-'/') <= 0?
    Is (D3-'9') <= 0?              Is (*source-'9') <= 0?        If <=, branch to ZERO
    If <=, branch to label ONE     If neither, branch to ZERO    Is (*source-'9') <= 0?
                                   Add (byte) 1 to DIGIT         If neither, branch to ZERO
    ZERO:                                                        Add 1 to DIGIT
    Put a longword 0 in D0                                       ZERO:
    Branch to label DONE
    ONE:
    Put a longword 1 in D0
    DONE:
    Put value in D0 into DIGIT
Notes:

1. Moving the character into register to compare it with '0' and '9':
   a. The first 68000 line moves the next character as a byte into register D3. The other 3 bytes will be ignored in the byte operations. Remember that the program had already moved the pointer to the string into A4.
   b. The SPARC does the same sort of thing with a pointer in %o2, except with the difference that it sign-extends the byte to a longword. Sign extension simply copies the sign-bit into the higher-order bits, effectively making 3E into 0000 003E or C2 into FFFF FFC2. That is what the mnemonic means: "LoaD Signed Byte."
   c. The VAX compiler takes a totally different approach—a rather poor one, actually. It leaves not only the byte in memory but even the pointer to the byte. Thus, every time it wants to refer to the byte—and it does so numerous times—it must first fetch the address from memory and then fetch the byte itself. This double memory reference is what @4(ap) means: "At the address you will find at address 4(ap)." The only thing that makes all this apparent coming and going even remotely acceptable is that the VAX will cache (place in a fast local memory) both the address and the character the first time that it gets them. Then, it can refer to them rapidly again. Cache references, however, are not as fast as register references.

2. Testing the character: The next line (3rd for the SPARC) does a comparison between 48 (or 47) and the character. Compare is an instruction which subtracts one operand from the other, but instead of putting the results somewhere, it stores only the facts on whether the operation delivered a negative number or zero or resulted in either an overflow or a carry. These are stored in flags, single bits associated with the arithmetic unit. The bits can contain only one result at a time. The 68000 and VAX must test immediately after the comparison or they will lose the bits. The SPARC changes the bits only when the instruction says so (the CC on the instruction — "change condition codes"). Thus, the subtraction can be remote from the test-and-branch. The SPARC is explicit about where to store the subtraction—in %g0. %g0 is a pseudo-register. It is always a 0 as a source and is a garbage can as a destination. The availability of the 0 and the garbage can serves all the same functions that the special instructions for zeros and comparisons do on the CISCs.

3. The differences in the algorithm to do the tests: There are two different paradigms expressed in these three examples. One says: "Figure out which thing you want to do and then do that one thing." The other says: "First set the result false and then figure out if you should set it true." While the second version would seem to do a lot of unnecessary settings to zero, the other algorithm will execute one less branch. That would make it roughly equivalent. However, the 68000 algorithm is definitely longer—uses more memory for code. That is not really much of an issue, but why put the result first into a temporary register and then where you really want it?
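Written out in C, the two paradigms of note 3 look like the pair of sketches below (ours, for illustration). Both compute the same digit flag, but they invite exactly the two code shapes just compared.

    int digit_choose(const char *source)   /* decide which value to store, then store it (68000 style)   */
    {
        int digit;
        if (*source >= '0' && *source <= '9')
            digit = 1;
        else
            digit = 0;
        return digit;
    }

    int digit_preset(const char *source)   /* preset the result false, set it true if earned (VAX/SPARC) */
    {
        int digit = 0;
        if (*source >= '0' && *source <= '9')
            digit = 1;
        return digit;
    }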
Compiler Optimization and Assembly Language

Compiler Operations

To understand the optimizing features of compilers and their relation to assembly language, it is best to understand some of the chores for the compiler. This section examines variable allocation and how it can be optimized, and the optimization task of constant expression removal. Examples of how compilers perform these operations are taken from the various systems used in the article.

Variable Allocation. Variables in high-level languages are an abstraction of memory cells or locations. One of the compiler's tasks is to assign the abstract variables into physical locations—either registers within the processor or locations within memory. Assignment strategies vary, but an easy and often-used strategy is to place all variables in memory. Easy, indeed, but wasteful of execution time in that it requires memory fetches for all HLL variables. Another assignment strategy is to assign as many variables to the registers as possible and then assign any remaining variables to memory; this method is typically sufficient, except when there is a limited number of registers, such as in the 68000. In these cases, the best assignment strategy is to assign registers to the variables which have the greatest use and then assign any remaining variables to memory. In examining the compilers and architecture used in this article, we find examples of all these methods.

In the unoptimized mode, VAX and SPARC compilers are among the many which take the easy approach and assign variables only to memory locations. In Figs. 87.6 and 87.7, the variable assignments are presented for the unoptimized and optimized options. Note that only one or two registers are used, both as scratch pads, in the unoptimized option, whereas the optimization assigns registers to all variables. The expected execution time savings is approximately 42 of the 50 memory references per loop iteration. That does not include additional savings caused by compact code. Detailed comparisons are not presented since the interpretation of architectural comparisons is highly subjective.

FIGURE 87.7 The stack area of the 68000's memory and the register assignments that the Think C compiler made with global optimization turned off. The stack is shown just after the MOVEM instruction. The items in bold are as they would be after that instruction. While the registers all hold longwords, in typical PC/Macintosh compilers, integers are defined as words. This figure is the programmatic successor to Fig. 87.6.

Unlike the VAX and SPARC compilers, the 68000 compiler assigns variables to registers in both the unoptimized and optimized options; these assignments are depicted in Figs. 87.7 and 87.8. Since there are only eight general-purpose data registers in the 68000 and two are assigned as scratch pads, only six of the program's ten variables can be assigned to registers. The question is how the 68000 compiler chose which variables to assign to registers and which to leave in memory. As might be expected, the compiler assigned registers based on their usage for the unoptimized option as well as the optimized. The exact counting strategy is unknown. However, a fair estimate, which yields the same assignment as the compiler, is to count only the variable's usage in the loop—the likely place for the program to spend most of its execution time.
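C does give the programmer a blunt way to express the same usage estimate: the register storage class. The stub below is only a sketch (the name NumberCountHinted and the abbreviated body are ours); the keyword is merely a hint that a compiler may ignore, and, as discussed next, a variable whose address is taken cannot be granted a register at all.

    int NumberCountHinted(char block[])
    {
        register char *source = block;    /* hint: keep the heavily used scan pointer in a register */
        register int   count  = 0;        /* and the most-used counter                              */

        while (*source != '\0') {         /* body abbreviated; the full tests would go here         */
            if (*source >= '0' && *source <= '9')
                count++;                  /* note: this stub counts digits, not numbers             */
            source++;
        }
        return count;
    }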
There are other ways to estimate the variable usage such as assigning weights to a reference based on its placement within a loop, its placement within a conditional (if-then-else) statement, etc. These estimates and how they are applied within a compiler can change the variable allocation as well as the efficiency of the code. In the optimized case, a slightly different register assignment is used. This is because the optimizer created another character variable—*source—which it assigned to a register. The motivation for its creation and its assignment to a register is shown in the next section on constant expression removal.

FIGURE 87.8 The stack area of the 68000's memory and the register assignments that the Think C compiler made with global optimization turned on. The stack is shown just after the MOVEM instruction. The items in bold are as they would be after that instruction. This figure should be compared with Fig. 87.7.

Even though the assignment of variables to registers gives an improvement in performance, it is not always possible to assign a variable to a register. In C, one operation is to assign the address of a variable to a pointer-type variable (e.g., ip = &i). If i were assigned to a register, the operation would become invalid, because a register does not have a memory address. Although this example appears limited to the C language, the use of a variable's address is widespread when subroutine parameters are passed by reference. (Variables sent to a subroutine are either passed by reference, where the address of the variable is passed to the subroutine, allowing modifications to the original variable, or they are passed by value, where a copy of the variable is passed to the subroutine.) When a parameter is passed by reference its address must be obtained and passed to the subroutine, an action commonly found in most languages. This action compounds the task of selecting candidate variables for register assignment.

Constant Expression Removal. The task of allocating program variables to physical locations is accomplished by all compilers; we have shown that there are many ways to achieve this goal with varying ease or run-time performance. This section explores a compiler task which is done strictly for optimization—the removal of constant expressions. In exploring this task, we show strategies for the recognition of this optimization and also some caveats in their application. Constant expressions are subexpressions within a statement which are unchanged during its execution. An obvious example is the expression vector[x] in the following conditional statement.

    if ((vector[x] < 100) && (vector[x] > 0)) ...

An astute coder who does not trust compilers would not allow two memory references for vector[x] in the same conditional statement and would rewrite the code by assigning vector[x] to a temporary variable and using that variable in the conditional. An astute compiler would also recognize the constant expression and assign vector[x] to a scratch pad register and use this register for both comparisons. This type of optimization, where small sections of code (typically one source line) are optimized, is called peep-hole optimization. Within the example program, the assignment statement which checks if the character is a digit within the range from '0' to '9' is a statement which can benefit from this type of optimization. The C code lines, with the unoptimized and optimized SPARC assembly code, are listed below.
Note that in addition to the constant expression removal, the optimized code also assigns variables to registers.

digit = (*source >= '0') && (*source <= '9') ;

To follow the translation of this line on the SPARC, Fig. 87.9 shows the 32 registers visible at any moment in the window-blinding SPARC. The top 24 shift by 16 in a normal call. The eight globals remain the same. The shift of the registers is accompanied by copying SP to o6, and the call instruction puts the return address into o7. Accordingly, a call wipes out the caller's o7. Register g0 serves as a 0 (as a source) and as a wastebasket (as a destination). ld loads a longword, and ldsb sign-extends a byte into a longword. The instruction after a branch is executed whether the branch is taken or not (delayed branching). An instruction such as add 47, %g0, %o2 adds a constant to 0 and puts it in the register. This is equivalent to move.l #47, d4 on the 68000. An add or sub with cc appended changes the condition codes. To do a compare, one uses addcc or subcc and puts the result in g0 (the wastebasket).

FIGURE 87.9 SPARC register assignments.

The same type of constant expression can be found and removed with a global perspective, typically from within loops. A simple example is the best way to describe how they can be removed and to offer some caveats when the compiler cannot see the global picture. The following example code updates each element in the vector by adding the first element scaled by a constant y. An observation shows that the subexpression vector[0] * y is constant throughout all executions of the loop. An obvious improvement in code is to calculate the product vector[0] * y outside of the loop and store its value in a temporary variable. This is done in the second version.

Constant expression present:

for( i = 0; i < size; i++)
{
    vector[i] = vector[i] + (vector[0] * y);
}

Constant expression removed:

temp = vector[0] * y;
for( i = 0; i < size; i++)
{
    vector[i] = vector[i] + temp;
}

Ideally, the compiler should find and remove these constant expressions from loops, but this is not as obvious as it may seem. Consider the above example if the following line were inserted in the loop:

y = vector[i-1] + vector[i+1];

If each source line is taken in isolation, y appears constant, but y is dependent on the loop index i. Hence, before removing constant expressions, the compiler must map the dependencies of each variable on the other variables and the loop index. Additionally, other not-so-obvious dependencies—such as when two pointers modify the same structure—are difficult to map and can result in erroneous object code. This is one of the difficulties in optimizing compiler operation and why its extent is limited.

A subtle example of constant expression removal is found in our sample program in the reference to *source. In these statements, the character referenced (addressed) by source is obtained from memory. The pointer (address) source is changed only at the bottom of the loop, and the memory contents addressed by source are static. A global optimization should obtain the character once at the top of each pass of the loop and save on subsequent memory references throughout. The 68000 C compiler with the optimization option determined *source to be constant throughout the loop and assigned register D3 to hold its contents. This saved seven of the eight memory accesses to *source in each loop pass.
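At the source level, the effect of the 68000 optimizer's decision is roughly what a careful programmer would write by hand. The function below is a simplification, not the article's actual sample program: it classifies the character at *source once per pass by hoisting the single load into a local variable, so the pointer is dereferenced once rather than at every test.

/* Simplified sketch (not the article's program): count the digits in a
 * NUL-terminated string, fetching *source once per pass of the loop.     */
int count_digits(const char *source)
{
    int digits = 0;
    while (*source != '\0') {
        char c = *source;             /* single load of *source           */
        if (c >= '0' && c <= '9')     /* all later tests reuse c          */
            digits++;
        source++;                     /* pointer changes only at bottom   */
    }
    return digits;
}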
The unoptimized 68000 option, the SPARC, and the VAX compilers did not use global constant expression removal and must fetch the operand from memory before its use.

The Problems. With optimization yielding more efficient code and thus improved system performance, why would you not use it? Our favorite among the several reasons is the following quote from compiler documentation: "Compiling with optimization may produce incorrect object code." Problems are caused by assumptions made by the compiler which are not held by the programmer. For example, an optimization which assumes that memory contents do not change with time is erroneous for multitasking systems which share memory structures and also for memory-mapped I/O devices, where memory contents are changed by external events. For these cases, the data in a register may not match the newer data in memory. (In C, the volatile type qualifier exists for exactly this situation; it warns the compiler that a location may change outside the program's control and must be reread on every access.) Additionally, HLL debuggers do not always work well with the optimization option, since the one-to-one correspondence between HLL code and the object code may have been removed by optimization. Consider the reassignment of *source to a data register which is performed by the 68000 C compiler. If a debugger were to modify the contents of *source, then it would have to know about the two locations where it is stored: the memory and the register. Other types of optimizations which may cause problems are when unneeded variables are removed or when code is resequenced. If a HLL debugger tries to single-step through HLL code, there may not be corresponding assembly code, and its execution will appear erroneous.

High-Level Language and Assembly Language Relations

In comparing the various assembly languages from the compiler-generated code, we have not presented a full vocabulary of assembly languages or the minutiae of the underlying machines. Exploring only the code generated by the compilers may lead one to believe that all assembly languages and processor architectures are pretty much the same. This is not really the case. What we have shown is that compilers typically use the same assembly language instructions regardless of the underlying machines. The compiler writer's motivation for this apparent similarity is not that all architectures are the same, but that it is difficult—arguably even nonproductive—for the compiler to take advantage of the complex features which some CPU architectures offer. An argument may be made that compilers generate the best code by developing code rather independently of the underlying architecture. Only in the final stages of code generation is the underlying platform's hardware architecture specifically considered [see Aho et al., 1986]. Differences in the architectures and assembly languages are plentiful; compilers typically do not and probably should not take advantage of such features.

The VAX is one of the best examples of an architecture having an almost extraordinary vocabulary, which is why it is often considered the prototypical CISC machine. What were the motivations for having this rich vocabulary if compilers simply ignore it? Early computer programming was accomplished through slightly alphabetized machine language—mnemonics for opcodes and sometimes for labels. Assembly language represented a vast improvement in readability, but even though FORTRAN, COBOL and Algol were extant at the same time as assembly language, their crude or absent optimization abilities led to the popular belief that really good programming was always done in assembly.
This was accepted lore until early studies of optimization began to have an impact on commercial compilers. It is fair to say that this impact did not occur until the early 1980s. The VAX and the 68000 were products of the middle and late 1970s. It is no great surprise, then, to find that CISC computer architectures were designed to enable and assist the assembly language programmer. Such an objective promotes the inclusion of many complex features which the programmer might want to utilize. However, two facts emerged in the late 1970s which suggested that this rich vocabulary was provided at too high a cost for its benefit (for a more complete discussion of the emergence of the RISC concept, which really began with Seymour Cray in the 1960s, see [Feldman and Retter, 1994]):

• It was widely observed that the generation, testing, and maintenance of large programs in assembly code was extremely expensive when compared to doing the same task in a HLL. Furthermore, the compilers were improving to the point where they competed quite well with typical assembly code.

• Although initially not widely accepted, it was observed that the rich vocabulary made it very difficult to set up an efficient processing pipeline for the instruction stream. In essence, the assembly line was forced to handle too many special cases and slowed down under the burden.

When the analysis of compiled programs showed that only a limited span of instructions was being used, these prescient designers decided to include only the heavily used instructions and to restrict even these instructions so that they would flow in unblemished, uniform streams through the production line. Because this focus resulted in noticeably fewer instructions—though that was not the essential objective—the machines were called RISC, a sobriquet that came out of a VLSI design project at Berkeley [Patterson and Hennessy, 1989].

Even though RISC hardware designs have increased performance in essence by reducing the complexity and richness of the assembly language, back at the ranch the unrepentant assembly language programmer still desired complex features. Some of these features were included in the assembly languages not as native machine instructions but essentially as a high-level extension to assembly language. A universal extension is the inclusion of macros. In some sense, a macro looks like a subroutine, but instead of a call to a distant block of code, the macro results in an inline insertion of a predefined block of code. Formally, a macro is a name which identifies a particular sequence of assembly instructions; wherever the name of the macro appears in the text, it is replaced by the lines of code in the macro definition. In some sense, macros make assembly language a little more like a HLL. They make code more readable, make code maintenance a little faster and more reliable (fix the macro definition and you fix all of the invocations of the macro), and speed up the programmer's work.
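Macros are not unique to assemblers; the same inline-expansion idea appears in C as the #define facility (noted under Defining Terms below). As a small illustration of the principle, and hypothetical code rather than anything drawn from the article's programs, the following fragment defines a macro once and lets the preprocessor paste its body into every use, exactly as an assembler macro would.

#include <stdio.h>

/* One definition; every use below expands inline before compilation.    */
#define SWAP(type, a, b) do { type _t = (a); (a) = (b); (b) = _t; } while (0)

int main(void)
{
    int    i = 1, j = 2;
    double x = 3.5, y = 7.25;

    SWAP(int, i, j);        /* expands to the three-statement body       */
    SWAP(double, x, y);     /* same definition, reused for another type  */

    printf("%d %d %.2f %.2f\n", i, j, x, y);
    return 0;
}

Fixing the single #define fixes every expansion, which is the maintenance benefit described above.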
Another extension to some assembly languages is extended mnemonics. Here the coder places a mnemonic in place of specific assembly language instructions; during code assembly the mnemonic is automatically translated to an optimal and correct instruction or instruction sequence. These free the coder from the management of low-level details, leaving the task to a program where it is better suited. Examples of extended mnemonics include get and put, which generate memory transfers by selecting the addressing mode, as well as the specific instructions, based on the operand locations. An increasingly common feature of assembly languages is the inclusion of structured control statements which emulate high-level language control-flow constructs such as if..then..else, for loops, while..do loops, repeat..until loops, break, and next. These features remove the tedium from the programmer's task, allow for more readable code, and reduce the cost of code development. An amusing set of examples is found in the assemblers that we have used on the SPARC. Architecture notwithstanding, the assembly programmers wanted VAX assembly code! In spite of the absence of such constructs in the SPARC architecture, you find expressions such as CMP (compare) and MOV. Since these are easily translated to single lines of real SPARC code, their only raison d'être is to keep the old assembly language programmers happy. Presumably, those who knew not the VAX are not writing SPARC assembly code.

Summary

After all this fuss over compilers and how they generate assembly code, the obvious question is "Why bother to write any assembly code at all?" Three reasons why "some assembly may be required" follow.

• A human writing directly in assembly language is probably better than an optimizing compiler at extracting the last measure of resources (e.g., performance, hardware usage) from the machine. That inner loop—the code where the processor spends most of its execution cycles—may need to be hand-optimized to achieve acceptable performance. Real-time systems, where the expedient delivery of the data is as critical as its correctness, are another area where the required optimization may be greater than that achievable by compilers. The disparity in performance between human optimizers and their machine competitors comes from two special capabilities of the human programmer. These are the ability to know what the program will be doing—forward vision based on the program's intent rather than its structure—and the ability to take advantage of special quirks or tricks that have no general applicability. If you really need to extract this last full measure of performance, assembly language is the route. The cost of doing such hand-optimization is much greater than the hours spent in doing it and getting it debugged. Special quirks and tricks expressible only in assembly language will not translate to another machine and may disappear even in an "upgrade" of the intended processor.

• There is overhead in using HLL conventions, some of which can be eliminated by coding directly in assembly language. A typical embedded processor does not need the full span of HLL conventions and support, such as parameter passing or memory and stack allocation. One can get away with such dangerous things as global variables which do not have to be passed at all. By eliminating these conventions, increased performance is obtained. It should be pointed out that code written without such standard conventions is likely to be very peculiar, bug-prone, and hard to maintain.

• HLLs provide only limited access to certain hardware features of the underlying machine. Assembly language may be required to access these features. Again, this makes the code unportable and hard to maintain, but small stubs of assembly code may be required to invoke hardware actions which have no representation in a HLL.
For example, setting or clearing certain bits in a special register may not be expressible in C. While any address can be explicitly included in C code, how do you reference a register which has no address? An example of such usage is writing or reading into or out of the status register. Some machines map these transactions into special addresses so that C could be used to access them, but for the majority of machines which do not provide this route to the hardware, the only way to accomplish these actions is with assembly code. To this end, some C compilers provide an inline assembler. You can insert a few lines of assembly language right in the C code, get your datum into or out of the special register, and move right back to HLL. Those compilers which provide this nonstandard extension also provide a rational paradigm for using HLL variable names in the assembly statements. Where necessary, the name gets expanded to allow the variable to be fetched and then used.
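For the machines that do map such registers into the address space, the standard C idiom is a pointer to a fixed address qualified with volatile, so the compiler neither caches the value in a register nor removes the apparently redundant accesses. The address and bit layout below are invented for illustration; the real values come from the particular processor's or board's documentation.

#include <stdint.h>

/* Hypothetical memory-mapped status register at a made-up address.       */
#define STATUS_REG   (*(volatile uint32_t *)0x40001000u)
#define STATUS_READY (1u << 0)    /* invented bit assignments              */
#define STATUS_ERROR (1u << 3)

void wait_until_ready(void)
{
    /* volatile forces a fresh read of the hardware on every pass.        */
    while ((STATUS_REG & STATUS_READY) == 0)
        ;                         /* spin until the device raises READY   */
}

void clear_error(void)
{
    STATUS_REG = STATUS_ERROR;    /* assumes a write-one-to-clear device  */
}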
These reasons are special; they are not valid for most applications. Using assembly language loses development speed, loses portability, and increases maintenance costs. While this caveat is well taken and widely accepted, at least for the present, few would deny the existence of situations where assembly language programming provides the best or only solution.

Defining Terms

Address error: An exception (error interrupt) caused by a program's attempt to access unaligned words or longwords on a processor which does not accommodate such requests. The address error is detected within the CPU. This contrasts with problems which arise in accessing the memory itself, where a logic circuit external to the CPU itself must detect and signal the error to cause the CPU to process the exception. Such external problems are called bus errors.

Assembler: A computer program (application) for translating an assembly-code text file to an object file suitable for linking to become an executable image (application) in machine language. Some HLL compilers include an inline assembler, allowing the programmer to drop into and out of assembly language in the midst of a HLL program.

CISC: Complex instruction set computer, a name meaning "not a RISC," but generally one that offers a very rich vocabulary of computer operations at the cost of making the processor which must handle this variety of operations more complex, expensive, and often slower than a RISC designed for the same task. One of the benefits of a CISC is that the code tends to be very compact. When memory was an expensive commodity, this was a substantial benefit. Today, speed of execution rather than compactness of code is the dominant force.

Compiler: A computer program (application) for translating a HLL text file to an object file suitable for linking to become an executable image (application) in machine language. Some compilers do both compilation and linking, so their output is an application.

Condition codes: Many computers provide a mechanism for saving the characteristics of the result of a particular calculation. Characteristics such as sign, zero result, carry or borrow, and overflow are typical of integer operations. The program may reference these flags to determine whether to branch or not.

Disassembler: A computer program which can take an executable image and convert it back into assembly code. Such a reconstruction will be true to the machine language but normally loses much of the convenience factors, such as macros and name equivalencies, that an original assembly language program may contain.

Executable image: A program in pure machine code, including all of the necessary header information that allows an operating system to load it and start running it. Since it can be run directly, it is executable. Since it represents the original HLL or assembly program, it is an image.

Flags: See Condition codes.

High-level language (HLL): A computer programming language generally designed to be efficient and succinct in expressing human programming concepts and paradigms. To be contrasted with low-level programming languages such as assembly language.

Linker: A computer program which takes one or more object files, assembles them into blocks which are to fit into particular blocks in memory, and resolves all external (and possibly internal) references to other segments of a program and to libraries of precompiled subroutines. The output of the linker is a single file, called an executable image, which has all addresses and references resolved and which the operating system can load and run on request.

Macro: A single line of code-like text, defined by the programmer, which the assembler will then recognize and which will result in an inline insertion of a predefined block of code. In most cases, the assembler allows both hidden and visible local variables and local labels to be used within a macro. Macros also appear in some HLLs, such as C (the define paradigm).

Object code: A file comprising an intermediate description of a segment of a program. The object file contains binary data, machine language for the program, tables of offsets with respect to the beginning of the segment for each label in the segment, and data that would be of use to debugger programs.

RISC: Reduced instruction set computer, a name coined by Patterson et al. at the University of California at Berkeley to describe a computer with an instruction set designed for maximum execution speed on a particular class of computer programs. Such designs are characterized by requiring separate instructions for load/store operations and arithmetic operations on data in registers. The earliest computers explicitly designed by these rules were designs by Seymour Cray at CDC in the 1960s. The earliest development of the RISC philosophy of design was given by John Cocke in the late 1970s at IBM. See CISC above for the contrasting approach.

Related Topic

87.2 High-Level Languages

References

A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Reading, Mass.: Addison-Wesley, 1986. A detailed text on the principles of compiler operations and tools to help you write a compiler. This text is good for those wishing to explore the intricacies of compiler operations.

D. Cohen, "On holy wars and a plea for peace," Computer, pp. 11–17, Sept. 1981. A delightful article on the comparisons and motivations of byte ordering in memory.

J. Feldman and C. Retter, Computer Architecture: A Designer's Text Based on a Generic RISC, New York: McGraw-Hill, 1994.

D. Patterson and J. Hennessy, Computer Architecture: A Quantitative Approach, San Mateo, Calif.: Morgan Kaufmann, 1989. An excellent though rather sophisticated treatment of the subject. The appendices present a good summary of several seminal RISC designs.

87.2 High-Level Languages

Ted G. Lewis
High-level languages (HLLs), also known as higher-order languages (HOLs), have a rich history in the annals of computing. From their inception in the 1950s until advances in the 1970s, HLLs were thought of as simple mechanical levers for producing machine-level instructions (see Table 87.3). Removing the details of the underlying machine, and automatically converting from a HLL statement to an equivalent machine-level statement, releases the programmer from the drudgery of the computer, allowing one to concentrate on the solution to the problem at hand.

Over the years, HLLs evolved into a field of study of their own, finding useful applications in all areas of computing. Some HLLs are designed strictly for solving numerical problems, and some for symbolic problems. Other HLLs are designed to control the operation of the computer itself, and yet even more novel languages have been devised to describe the construction of computer hardware. The number of human-crafted languages has multiplied into the hundreds, leading to highly special-purpose HLLs. This evolution is best characterized as a shift away from the mechanical-lever view of a HLL toward HLLs as notations for encoding abstractions. An abstraction is a model of the real world whose purpose is to de-emphasize mundane details and highlight the important parts of a problem, system, or idea. Modern HLLs are best suited to expressing such abstractions with little concern for the underlying computer hardware.

Abstraction releases the HLL designer from the bounds of a physical machine. A HLL can adopt a metaphor or arbitrary model of the world. Such unfettered languages provide a new interface between human and computer, allowing the human to use the machine in novel and powerful ways. Abstractions rooted in logic, symbolic manipulation, database processing, or operating systems, instead of the instruction set of a central processing unit (CPU), open the engineering world to new horizons. Thus, the power of computers depends on the expressiveness of HLLs.

To illustrate the paradigm shifts brought on by HLLs over the past 30 years, consider PROLOG, LISP, SQL, C++, and various operating system command languages. PROLOG is based on first-order logic. Instead of computing a numerical answer, PROLOG programs derive a conclusion. LISP is based on symbolic processing instead of numerical processing and is often used to symbolically solve problems in calculus, robotics, and artificial reasoning. SQL is a database language for manipulating large quantities of data without regard for whether it is numeric or symbolic. C++ is based on the object-oriented paradigm, a model of the world that is particularly powerful for engineering, scientific, and business problem solving. None of these modern languages bear much resemblance to the machines they run on. The idea of a mechanical lever has been pushed aside by the more powerful idea of language as world builder. The kinds of worlds that can be constructed, manipulated, and studied are limited only by the HLL designer's formulation of the world as a paradigm.

In this section we answer some fundamental questions about HLLs: What are they? What do we mean by "high level"? What constitutes a paradigm? What are the advantages and disadvantages of HLLs? Who uses HLLs? What problems can be solved with these languages?
TABLE 87.3 Each Statement of a HLL Translates into More than One Statement in a Machine-Level Language Such as Assembler

Language     Typical Number of Machine-Level Statements
FORTRAN      4–8
COBOL        3–6
Pascal       5–8
APL          12–15
C            3–5

What Is a HLL?

At a rudimentary level, all languages, high and low, must obey a finite set of rules that specify both their syntax and semantics. Syntax specifies legal combinations of symbols that make up statements in the language. Semantics specifies the meanings attached to a syntactically correct statement in the language. To illustrate the difference between these two fundamental traits of all languages, consider the statement, "The corners of the round table were sharp." This is syntactically correct according to the rules of English grammar, but what does it mean? Round tables do not have sharp corners, so this is a meaningless statement. We say it is semantically incorrect.

Statements of a language can be both syntactically and semantically correct and still be unsuitable for computer languages. For example, the phrase ". . . time flies . . ." has two meanings: one as an expression of clock speed, and the other as a reference to a species of insects. Therefore, we add one other requirement for computer languages: there must be only one meaning attached to each syntactically correct statement of the language. That is, the language must be unambiguous.

This definition of a computer language does not separate a HLL from all other computer languages. To understand the features of HLLs that make them different from other computer languages, we must understand the concepts of mechanical translation and abstraction. Furthermore, to understand the differences among HLLs, we must know how abstractions are used to change the computing paradigm. But first, what is a HLL in terms of translation and abstraction?

Defining the syntax of a HLL is easy. We simply write rules that define all legal combinations of the symbols used by the language. Thus, in FORTRAN, we know that arithmetic statements obey the rules of algebra, with some concessions to accommodate keyboards. A metalanguage is sometimes used as a kind of shorthand for defining the syntax of other languages, thus reducing the number of cases to be listed. Defining the semantics of a language is more difficult because there is no universally accepted metalanguage for expressing semantics. Instead, semantics is usually defined by another program that translates from the HLL into some machine-level language. In a way, the semantics of a certain HLL is defined by writing a program that unambiguously maps each statement of the HLL into an equivalent sequence of machine-level statements. For example, the FORTRAN statement below is converted into the equivalent machine-level sequence shown to its right:

X = (B**2 – 4*A*C)        PUSH B
                          PUSH #2
                          POWER     //B**2
                          PUSH #4
                          PUSH A
                          PUSH C
                          MULT      //A*C
                          MULT      //4*(A*C)
                          SUB       //(B**2)–(4*(A*C))
                          POP X     //X=

In this example, we assume the presence of a pushdown stack (see Fig. 87.10). The PUSH and POP operations are machine-level instructions for loading/storing the top element of the stack. The POWER, MULT, and SUB instructions take their arguments from the top of the stack and return the results of exponentiation, multiplication, and subtraction to the top of the stack. The symbolic expression of calculation in FORTRAN becomes a sequence of low-level machine instructions which often bear little resemblance to the HLL program.
The foregoing example illustrates the mechanical advantage provided by FORTRAN, because one FORTRAN statement is implemented by many machine-level statements. Furthermore, it is much easier for a human programmer to read and write FORTRAN than to read and write machine-level instructions. One major advantage of a HLL is the obvious improvement in program creation and, later on, its maintenance. As the size of the program increases, this advantage becomes larger as we consider the total cost to design, code, test, and enhance an application program.

FIGURE 87.10 (a) The stack after PUSH #4 and PUSH B; (b) the stack after POWER; (c) the stack after PUSH #4, PUSH A, and PUSH C; (d) the stack after MULT; and (e) the stack after MULT a second time.

The FORTRAN program containing the example statement is treated like input data by the translating program, which produces a machine-level sequence as output. In general, the input data is called the source program, and the resulting translated output is called the object program. There are two ways to obtain an object program from a source program: compiling and interpreting. In most cases, FORTRAN is translated by a compiler program. The idea behind a compiler is that the translator converts the source program in its entirety before any part of the resulting object program is actually run on the computer. That is, compiling is a two-step process.

In some HLLs, however, it is impossible to entirely convert a source program into an object program until the program executes. Suppose the storage for A, B, and C in the previous example is not known at the time the program is compiled. We might want to allocate storage on-the-fly while the program is running, because we do not know in advance that the storage is needed. This is an example of delayed binding of a variable to its storage location in memory. Powerful languages such as Pascal and C permit a limited amount of delayed binding, as illustrated in the following example written in Pascal. This example also illustrates a limited amount of abstraction introduced by the HLL.

type
  rnumber = real;                 {template}
  rptr    = ^rnumber;             {pointer}
var
  Aptr, Bptr, Cptr : rptr;        {instance}
...
{later in the program...}
new(Aptr); read( Aptr^ );         {binding}
new(Bptr); read( Bptr^ );
new(Cptr); read( Cptr^ );
X := (Bptr^) * (Bptr^) – 4 * (Aptr^) * (Cptr^);

The type statement is an abstraction that defines a template and access mechanism for the variables A, B, and C that are to be created on-the-fly. The var statement is similar to the DIMENSION statement in that it tells the translator to allocate space for three pointers: Aptr, Bptr, and Cptr. Each of these allocations will point to the actual values of A, B, and C according to the previous type statement. The actual allocation of storage is not known until the program executes the sequence of new() functions in the body of the program. Each new() function allocates space according to the type statement and returns a pointer to that space. To access the actual values in these storage spaces, the up arrow, ^, is written following the variable name. Thus, the read() function gets a number from the keyboard and puts it in the space pointed to by the pointer variable. Similarly, the value of X is computed by indirect reference to each value stored at the newly allocated memory locations. The purpose of this example is to illustrate the use of delayed binding in a HLL.
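Since the text notes that C permits the same limited form of delayed binding, the corresponding C sketch may make the parallel clearer. The names mirror the Pascal example, and the storage is bound only when malloc runs, not when the program is compiled; this fragment is an illustration, not part of the article's sample program.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double *Aptr, *Bptr, *Cptr;          /* pointers only: no storage yet  */
    double X;

    Aptr = malloc(sizeof *Aptr);         /* storage bound at run time      */
    Bptr = malloc(sizeof *Bptr);
    Cptr = malloc(sizeof *Cptr);
    if (!Aptr || !Bptr || !Cptr) return 1;

    if (scanf("%lf %lf %lf", Aptr, Bptr, Cptr) != 3) return 1;

    X = (*Bptr) * (*Bptr) - 4 * (*Aptr) * (*Cptr);   /* indirect references */
    printf("%f\n", X);

    free(Aptr); free(Bptr); free(Cptr);
    return 0;
}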
Languages such as LISP and C++ require even greater degrees of delayed binding because of the abstractions they support. When the amount of delayed binding becomes so great that very little of the program can be compiled, we say that the HLL is an interpreted language, and the translator becomes an interpreter rather than a compiler. This crossover is often obscure, so some HLLs are translated by both a compiler and an interpreter. BASIC is a classic example of a HLL that is both interpreted and compiled.

The purpose of delayed binding is to increase the level of a HLL by introducing abstraction. Abstraction is the major differentiating feature between HLLs and other computer languages. Without abstraction and delayed binding, most HLLs would be no more powerful than a macro assembler language. However, with abstraction, HLLs permit a programmer to express ideas that transcend the boundaries of the physical machine.

We can now define a HLL based on the concept of abstraction. A HLL is a set of symbols which obey unambiguous syntactic and semantic rules: the syntactic rules specify legal combinations of symbols, and the semantic rules specify legal meanings of syntactically correct statements relative to a collection of abstractions.

The notion of abstraction is very important to understanding what a HLL is. The example above illustrates a simple abstraction, e.g., that of data structure abstraction, but other HLLs employ much more powerful abstraction mechanisms. In fact, the level of abstraction of a HLL defines how high a HLL is. But how do we measure the level of a HLL? What constitutes a HLL's height?

How High Is a HLL?

There have been many attempts to quantify the level of a programming language. The major obstacle has been to find a suitable measure of level. This is further complicated by the fact that nearly all computer languages contain some use of abstraction, and therefore nearly all languages have a "level." Perhaps the most interesting approach comes from information theory. Suppose a certain HLL program uses P operators and Q operands to express a solution to some problem. For example, a four-function pocket calculator uses P = 4 operators for addition, subtraction, multiplication, and division. The same calculator might permit Q = 2 operands by saving one number in a temporary memory and the other in the display register. In a HLL the number of operators and operands might number in the hundreds or thousands.

We can think of the set of P operators as a grab bag of symbols that a working programmer selects one at a time and places in a program. Suppose each symbol is selected with probability 1/P, so the information content of the entire set is

    –Σ (1/P) log(1/P) = log(P)      (the sum running over all P symbols).

Assuming the set is not depleted, the programmer repeats this process P times, until all of the operators have been selected and placed in the program. The information content contributed by the operators is P log(P), and if we repeat the process for selecting and placing all Q operands, we get Q log(Q) steps again. The sum of these two processes yields P log(P) + Q log(Q) symbols. This is known as Halstead's metric for program length [Halstead, 1977]. Similar arguments can be made to derive the volume of a program, V, the level of program abstraction, L, and the level of the HLL, λ, as follows:

P = number of distinct operators appearing in the program
p = total number of operators appearing in the program
Q = number of distinct operands appearing in the program
q = total number of operands appearing in the program
N = total number of operators and operands appearing in the program (N = p + q)
V = volume = N log2(P + Q)
L = level of abstraction used to write the program ≈ (2/P)(Q/q)
λ = level of the HLL used to write the program = L²V
E = mental effort to create the program = V/L
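The short C program below simply evaluates these formulas for one set of made-up counts (the counts are invented for illustration and are not taken from Halstead's data); it is a convenient check on one's reading of the definitions.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Invented counts for a small hypothetical program.                  */
    double P = 10.0, p = 40.0;    /* distinct / total operators           */
    double Q = 15.0, q = 60.0;    /* distinct / total operands            */

    double N      = p + q;                 /* program length              */
    double V      = N * log2(P + Q);       /* volume                      */
    double L      = (2.0 / P) * (Q / q);   /* level of abstraction        */
    double lambda = L * L * V;             /* level of the HLL            */
    double E      = V / L;                 /* mental effort               */

    printf("N = %.0f, V = %.1f, L = %.3f, lambda = %.2f, E = %.0f\n",
           N, V, L, lambda, E);
    return 0;
}

With these invented counts, λ comes out near 1.2, in the same range that Table 87.4 below reports for FORTRAN and Algol-58.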
Halstead's theory has been applied to English (Moby Dick) and to a number of programs written in both HLL and machine-level languages. A few results based on the values reported in Halstead [1977] are given in Table 87.4. This theory quantifies the level of a programming language: PL/I is higher level than Algol-58, but lower level than English. In terms of the mental effort required to write the same program in different languages, Table 87.4 suggests that a HLL is about twice as high level as assembler language. That is, the level of abstraction of PL/I is more than double that of assembler. This abstraction is used to reduce mental effort and solve the problem faster.

TABLE 87.4 Comparison of Languages in Terms of Level, λ, and Programming Effort, E

Language     Level, λ     Effort, E
English      2.16         1.00
PL/I         1.53         2.00
Algol-58     1.21         3.19
FORTRAN      1.14         3.59
Assembler    0.88         6.02

HLLs and Paradigms

A programming paradigm is a way of viewing the world, e.g., an idealized model. HLLs depend on paradigms to guide their design and use. In fact, one might call HLL designers paradigm engineers, because a good HLL starts with a strong model. Without such a model, the abstraction of a HLL is meaningless. In this section we examine the variety of paradigms embodied in a number of HLLs.

The procedural paradigm was the earliest programming paradigm. It is the basis of COBOL, FORTRAN, Pascal, C, BASIC, and most early languages. In this paradigm the world is modeled by an algorithm. Thus, an electrical circuit's behavior is modeled as a system of equations. The equations are solved for voltage, current, and so forth by writing an algorithmic procedure to numerically compute these quantities.

In the procedural paradigm a large system is composed of modules which encapsulate procedures which in turn implement algorithms. Hierarchical decomposition of a large problem into a collection of subordinate problems results in a hierarchical program structure. Hence, a large FORTRAN or C program is typically composed of a collection of procedures (subroutines in FORTRAN and functions in C) organized in layers, forming a tree structure, much like the organization chart of a large corporation (see Fig. 87.11). Hierarchy is used in the procedural paradigm to encapsulate low-level algorithms, thus abstracting them away. That is, algorithm abstraction is the major contributor to leveling in a procedural HLL. Figure 87.11 illustrates this layering as a tree where each box is a procedure and subordinate boxes represent procedures called by parent boxes; the top-most box is the most abstract, and the lowest boxes in the tree are the most concrete.

Intellectual leverage is limited to control flow encapsulation in most procedural languages. Only the execution paths through the program are hidden in lower levels. While this is an improvement over machine-level languages, it does not permit much flexibility. For example, algorithmic abstraction is not powerful enough to easily express non-numerical ideas.
Thus, a C program is not able to easily model an electronic circuit as a diagram or object that can be reasoned about symbolically. One of the reasons procedural HLLs fail to fully hide all details of an abstraction is that they typically have weak models of data. Data is allowed to flow across many boundaries, which leads to problems with encapsulation. In FORTRAN, BASIC, Pascal, and C, for example, access to any data is given freely through globals, parameter passing, and files. This is called coupling and can have disastrous implications if not carefully controlled. One way to reduce coupling in a procedural language is to eliminate side effects caused by unruly access to data. Indeed, if procedures were prohibited from directly passing and accessing data altogether, many of the problems of procedural languages would go away.

FIGURE 87.11 Hierarchical decomposition of a procedural program.

An alternative to the procedural paradigm is the functional paradigm. In this model of the world, everything is a function that returns a value. Data is totally abstracted away, so that algorithms are totally encapsulated as a hierarchical collection of functions. LISP is the most popular example of a functional HLL [Winston and Horn, 1989]. A LISP statement, which limits data access in this way, usually consists of a series of function calls; each function returns a single value which is used as an argument by another function, and so on until the calculation is finished. For example, the FORTRAN statement X = (B**2 – 4*A*C) given earlier is written in functional form as follows:

ASSIGN( X, MINUS( SQUARE(B), TIMES( 4, TIMES(A,C) ) ) )

This statement means to multiply A times C, then multiply the result returned by TIMES by 4, then subtract this from the result returned by SQUARE, and so forth. The final result is assigned to X.

One of the most difficult concepts to adjust to when moving from the procedural paradigm is the idea that all things are functions. The most significant implication of this kind of thinking is the replacement of loops with recursion and of branches with guards. Recall that everything is a function that must return a value—even control structures. To illustrate, consider the functional (non-LISP) equivalent of the summation loop in FORTRAN, below.

FORTRAN loop:

      S=0
      DO 20 I=1,10
      S=S+X(I)
20    CONTINUE

Functional form:

SUM( XList, N ):
    N>0 | N is N–1,
          SUM is Head( XList ) + SUM( Tail( XList ), N )

The functional form will seem strange to a procedural programmer because it is higher level, e.g., more abstract. It hides more details and uses the functional operators Head (for returning the first element of XList), Tail (for returning the N–1 tail elements of XList), and is (for binding a value to a name). Also, notice the disappearance of the loop. Recursion on SUM is used to run through the entire list, one element at a time. Finally, the guard N>0 prevents further recursion when N reaches zero. In the functional program, N is decremented each time SUM is recursively called. Suppose N = 10 initially; then SUM is called 10 times. When N > 0 is false, the SUM routine does nothing, thus terminating the recursion. Interestingly, the additions are not performed until the final attempt to recurse fails. That is, when N = 0, the following sums are collected as the nested calls unwind:

SUM : XList(10)
    : SUM + XList(9)
      …
    : SUM + XList(1)

Functional HLLs are higher level than procedural languages because they reduce the number of symbols needed to encode a solution as a program.
The problem with functional programs, however, is their high execution overhead, caused by the delayed binding of their interpreters. This makes LISP and PROLOG, for example, excellent prototyping languages but expensive production languages. LISP has been confined to predominantly research use; few commercial products based on LISP have been successfully delivered without first rewriting them in a lower-level language such as C. Other functional languages such as PROLOG and STRAND88 have had only limited success as commercial languages.

Another alternative is the declarative paradigm. Declarative languages such as PROLOG and STRAND88 are both functional and declarative. In the declarative paradigm, solutions are obtained as a byproduct of meeting limitations imposed by constraints. Think of the solution to a problem as the only (first) solution that satisfies all constraints declared by the program. An example of the declarative paradigm is given by the simplified PROLOG program below for declaring an electrical circuit as a list of constraints. All of the constraints must be true for the circuit() constraint to be true. Thus, this program eliminates all possible R, L, C, V circuits from consideration, except the one displayed in Fig. 87.12. The declarations literally assert that Circuit(R, L, C, V) is a thing with "R connected to L, L connected to C, L connected to R, C connected to V, and V connected to R." This eliminates "V connected to L," for example, and leaves only the solution shown in Fig. 87.12.

Circuit(R, L, C, V) :
    Connected(R, L)
    Connected(L, C)
    Connected(L, R)
    Connected(C, V)
    Connected(V, R)

FIGURE 87.12 Solution to declaration for Circuit(R, L, C, V).

One interesting feature of declarative languages is their ability to represent infinite calculations. A declaration might constrain a solution to be in an infinite series of numbers, but the series may not need to be fully computed to arrive at an answer. Another feature of such languages is their ability to compute an answer when in fact there may be many answers that meet all of the constraints. In many engineering problems, the first answer is as good as any other answer. The declarative paradigm is a very useful abstraction for unbounded problems.

Adding abstraction to the functional paradigm elevates declarative languages even higher. Solutions in these languages are arrived at in the most abstract manner, leading to comparatively short, powerful programs. Perhaps the most common use of declarative languages is for the construction of expert systems [Smith, 1988]. These kinds of applications are typically diagnostic. That is, they derive a conclusion based on assertions of fact. An electrical circuit board might be diagnosed with an expert system that takes symptoms of the ailing board as its input and derives a conclusion based on rules of electronic circuits—human rules of thumb given it by an experienced technician—and declarative reasoning. In this example, the rules are constraints expressed as declarations. The expert system program may derive more than one solution to the problem because many solutions may fit the constraints.

Declarative languages have the same inefficiencies as functional languages. For this reason, expert system applications are usually developed in a specialized declarative system called an expert system shell. A shell extracts the declarative or constraint-based capability from functional languages such as LISP and PROLOG to improve efficiency.
Often it is possible to simplify the shell so that early binding is achieved, thus leading to compiling translators rather than interpreters. Very large and efficient expert systems have been developed for commercial use using this approach.

Yet another paradigm, used as the basis of modern languages such as C++ and Object Pascal, is the object-oriented programming (OOP) paradigm [Budd, 1991]. OOP merges data and procedural abstractions into a single concept. In OOP, an object has both storage capacity and algorithmic functionality. These two abstractions are encapsulated in a construct called a class. One or more objects can be created by cloning the class. Thus, an object is defined as an instance of a class [Lewis, 1991].

OOP actually represents a culmination of ideas of procedural programming that have evolved over the past three to four decades. It is a gross oversimplification to say that OOP is procedural programming, because it is not, but consider the following evolution:

Procedure = Algorithm + Data Structures
Abstract Data Structure = Implementation Part + Interface Part
Class = Abstract Data Structure + Functions
Object = Class + Inheritance

The first "equation" states that a procedure treats algorithms and data separately, but the programmer must understand both the data structure and the algorithms for manipulating the data structures of an application. This separation between algorithms and data is a key feature of the procedural paradigm. During the 1970s structured programming was introduced to control the complexity of the procedural paradigm. While only partially successful, structured programming limited procedures to less powerful control structures by eliminating the GOTO and programs with labels. However, structured programming did not go far enough.

The next improvement in procedural programming came in the form of increased abstraction, called the ADT (abstract data structure). An ADT separates the interface part of a procedure from its implementation part. Modula II and Ada were designed to support ADTs in the form of modules and packages. The interface part is an abstraction that hides the details of the algorithm. Programming in this form of the procedural paradigm reduces complexity by elevating a programmer's thoughts to a higher level of abstraction, but it still does not remove the problem of how procedures are related to one another.

Classes group data together into clusters that contain all of the functions that are allowed to access and manipulate the data. The class concept is a powerful structuring concept because it isolates the data portion of a program, thus reducing coupling and change propagation. The class construct invented by the designers of Simula67 enforced the separation of interface and implementation parts of a module, and in addition introduced a new concept. Inheritance is the ability to do what another module can do. Thus, inheritance relates modules by passing on the algorithmic portion of a module to other modules. Inheritance in a programming language like SmallTalk, Object Pascal, or C++ means even greater abstraction, because code can be reused without being understood. An object is an instance of a class. New objects inherit all of the functions defined on all of the classes used to define the parent of the object. This simple idea, copied from genetics, has a profound impact on both design and programming. It changes the way software is designed and constructed, i.e., it is a new paradigm.
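As an aside, the interface/implementation separation of the ADT step described above can be sketched even in the document's C, which has no modules or packages; the approximation below uses a header and an opaque pointer, and the stack ADT itself is a hypothetical example.

/* stack.h -- the interface part: what a client may do, nothing about how. */
typedef struct Stack Stack;          /* opaque type: layout is hidden      */
Stack *stack_new(void);
void   stack_push(Stack *s, int value);
int    stack_pop(Stack *s);
void   stack_free(Stack *s);

/* stack.c -- the implementation part: clients never see these details.    */
#include <stdlib.h>
struct Stack { int items[100]; int top; };

Stack *stack_new(void)             { Stack *s = calloc(1, sizeof *s); return s; }
void   stack_push(Stack *s, int v) { if (s->top < 100) s->items[s->top++] = v; }
int    stack_pop(Stack *s)         { return s->top > 0 ? s->items[--s->top] : 0; }
void   stack_free(Stack *s)        { free(s); }

A client that includes only the header can use the stack but cannot touch items or top directly, which is the same coupling reduction the text attributes to classes.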
Object-oriented thinking greatly differs from procedural thinking (see Table 87.5). In OOP a problem is decomposed into objects which mimic the real world. For example, the objects in Fig. 87.12 are a resistor, an inductor, a capacitor, and a voltage source. These objects have real-world features (state) such as resistance, inductance, capacitance, and voltage. They also have behaviors defined by sinusoidal curves or phase shifts. In short, the real-world objects have both state and function. The state is represented in a program by storage, and the function is represented by an algorithm. A resistor is a program module containing a variable to store the resistance and a function to model the behavior of the resistor when it is subjected to an input signal.

TABLE 87.5 Procedural versus Object-Oriented Thinking

Procedural: Instructions and data are separated.
Object-oriented: Objects consist of both data and instructions.

Procedural: Software design is linear, e.g., it progresses from design through coding and testing. This means change is difficult to accommodate.
Object-oriented: Software design is interactive with coding and testing. This means change is easier to accommodate.

Procedural: Programs are top-down decompositions of procedures, e.g., trees.
Object-oriented: Programs are networks of objects that send messages to one another without concern for tree structure.

Procedural: Program components bear little correspondence to the real world, thus making programming more of a magic art.
Object-oriented: Program components are abstractions that correspond with the real world, thus making programming more of a discipline.

Procedural: New programs are mostly custom built with little reuse from earlier programs. This leads to high construction costs and errors.
Object-oriented: New programs are mostly specializations of earlier programs through reuse of their components. This leads to low construction costs and higher quality.

The objects in an object-oriented world use inheritance to relate to one another. That is, objects of the same class all inherit the same functions. These functions are called methods in SmallTalk and member functions in C++ [Ellis and Stroustrup, 1990]. However, the state or storage attributes of objects cloned from the same class differ. The storage components of an object are called instance variables in SmallTalk and member fields in C++.

The wholism of combining data with instructions is known as ADTs; the concept of sending messages instead of calling procedures is the message-passing paradigm; the concept of interactive, nonlinear, and iterative development of a program is a consequence of an object's interface specification being separated from its implementation part; the notion of modeling the real world as a network of interacting objects is called OOD (object-oriented design); the concept of specialization and reuse is known as inheritance; and OOP is the act of writing a program in an object-oriented language while adhering to an object-oriented view of the world.

Perhaps an analogy will add a touch of concreteness to these vague concepts. Suppose automobiles were constructed using both technologies. Table 87.6 repeats the comparison of Table 87.5 using an automobile design and manufacturing analogy.

TABLE 87.6 Analogy with an Automobile Manufacturer

Procedural: Vendors and assemblers work from their own plans. There is little coordination between the two.
Object-oriented: Vendors and assemblers follow the same blueprints; thus the resulting parts are guaranteed to fit together.

Procedural: Manufacturing and design are sequential processes. A change in the design causes everyone to wait while the change propagates from designers to workers on the production line.
Object-oriented: Design interacts with production. Prototypes are made and discarded. Production workers are asked to give suggestions for improvement or on how to reduce costs.

Procedural: Changes on the manufacturing floor are not easily reflected as improvements to the design on the drafting board.
Object-oriented: Changes in implementation rarely affect the design, as interfaces are separated from implementation. Thus, the materials may change, but not the need for the parts themselves.

Procedural: New cars are designed and constructed from the ground up, much like the first car ever produced.
Object-oriented: New cars are evolutionary improvements to existing base technology, plus specializations that improve over last year's model.

We illustrate these ideas with a simple C++ example. The following code declares a class and two subclasses which inherit some properties of the class. The code also shows how interface and implementation parts are separated and how to override unwanted methods. Figure 87.13 depicts the inheritance and class hierarchy for this example.

class Node{
public:                                       //The interface part...
    Node() {}                                 //Constructor function
    virtual ~Node() {}                        //Destructor function
    virtual int eval() { error(); return 0;}  //Override this function
};
The Node class consists of public functions that are to be overridden by descendants of the class. We know this because the functions are virtual, which in C++ means we expect to replace them later. Therefore we call this an abstract class. Node() is the constructor member function and ~Node() the destructor. A constructor is executed when a new object is created from the class, and the destructor is executed when the object is destroyed. These two functions take care of the initialization and garbage collection which must be performed before and after dynamic binding of objects to memory space. Figure 87.13 shows this Node as an abstract class from which all other subclasses of this application are derived.

FIGURE 87.13 Partial class hierarchy for a C++ program that simulates a pocket calculator. The shaded Node class is an abstract class that all other classes use as a template.

Now, we create a subclass that inherits the properties (interface) of Node() and adds a new property, e.g., Binop. Binop is an abstraction of the binary operators of a pocket calculator, which is the real-world object being simulated by this example program. The expression to be calculated is stored in a binary tree, and Binop sprouts a new left and right subtree when it is created and deletes this subtree when it is disposed.

class Binop : public Node {                   //Derive Binop from Node
public:
    Node *left, *right;                       //Pointers to left and right subtrees
    ~Binop() { delete left; delete right;}    //Collect garbage
    Binop( Node *lptr, Node *rptr) {left = lptr; right=rptr;}
};

Next, we define further specializations of Binop: one for addition of two numbers, Plus(), and the other for multiplication, Times(). The reader can see how to extend this to other operators.
class Plus: public Binop {
public:                                                     //Add member functions to Binop
    Plus( Node *lptr, Node *rptr) : Binop( lptr, rptr) {}   //Use Binop
    int eval() { return left->eval()+right->eval();}        //Do Addition
};

class Times: public Binop {
public:                                                     //Add member functions to Binop
    Times( Node *lptr, Node *rptr) : Binop( lptr, rptr) {}  //Use Binop
    int eval() { return left->eval()*right->eval();}        //Do Multiply
};

In each case, the special-purpose operations defined in Plus() and Times() reuse Binop's code to perform the pointer operations. Then they add a member function eval() to carry out the operation. This illustrates reuse and the value of inheritance.

At this point, we have a collection of abstractions in the form of C++ classes. An object, however, is an instance of a class. Where are the objects in this example? We must dynamically create the required objects using the new operator of C++.

Node *ptr = new Plus( lptr, rptr );    //Create an object and point to it
int result = ptr->eval();              //Add
delete ptr;

The foregoing code instantiates an object that ptr points to, sends a message to the object telling it to perform the eval() function, and then disposes of the object. This example assumes that lptr and rptr have already been defined elsewhere. Clearly, the level of abstraction is greatly raised by OOP. Once a class hierarchy is established, the actual data processing is hidden or abstracted away. This is the power of the OOP paradigm.

The pure object-oriented languages such as SmallTalk80 and CLOS have achieved only modest success due to their unique syntax and heavy demands on computing power. Hybrid HLLs such as Object Pascal and C++ have become widely accepted because they retain the familiar syntax of procedural languages, and they place fewer demands on hardware. Although OOP is an old technology (circa 1970), it began to gain widespread acceptance in the 1990s because of the growing power of workstations, the increased use of graphical user interfaces, and the invention of hybrid object-oriented languages such as C++. Typically, C++ adds 15–20% overhead to an application program due to delayed binding of objects to their methods. Given that the power of the hardware increases more than 20% per annum, this is an acceptable performance penalty. In addition, OOP is much more suitable for the design of graphical user-interface-intensive applications because the display objects correspond with programming objects, thus simplifying design and coding. Finally, if you know C, it is a small step to learn C++.

Summary and Conclusions

HLLs: What are they? What do we mean by "high level"? What constitutes a paradigm? What are the advantages and disadvantages of HLLs? Who uses HLLs? What problems can be solved with these languages?

HLLs are human inventions that allow humans to control and communicate with machines. They obey rules of syntax and unambiguous semantics which are combined to express abstract ideas. HLLs are called "high level" because they express abstractions. We have chosen to define the level of a HLL in terms of the information content of its syntax and semantics. The Halstead measure of language level essentially says that the higher a HLL is, the fewer symbols are needed to express an idea. Thus, if language A is higher than language B, a certain program can be expressed more succinctly in A than B.
This is clearly the case when comparing HLLs with various machine-level languages, where a single statement in the HLL requires many statements in the machine-level language.

HLLs differ from one another in the abstractions they support. Abstract views of the world are called paradigms, and the guiding principle of any HLL is its programming paradigm. We have compared the following programming paradigms: procedural, functional, declarative, and object-oriented. Procedural programming has the longest history because the first HLLs were based on low-level abstractions that are procedural. FORTRAN, COBOL, C, and Pascal are classical examples of the procedural languages. Functional and declarative languages employ higher levels of abstraction by restricting the world view to simple mechanisms: mathematical functions and constraints. It may seem odd that such restrictions increase the level of abstraction, but languages like LISP and PROLOG hide much of the detail found to be necessary in the procedural paradigm. This increases the measure of level defined in this section. Object-oriented programming embraces a novel abstraction that seems to fit the world of computing: objects. In this paradigm, the world is modeled as a collection of objects that communicate by sending messages to one another. The objects are related to each other through an inheritance mechanism that passes on the algorithmic behavior from one class of objects to another class. Inheritance permits reuse and thus raises the programming abstraction to a level above previous paradigms.

The future of HLLs is uncertain and unpredictable. It is unlikely that anyone in 1970 would have predicted the acceptance of functional, declarative, or object-oriented paradigms in the 1990s. Therefore, it is unlikely that the following predictions bear much relationship to computing in the year 2000. However, it is instructive to project a few scenarios and explain their power.

Functional and declarative programming result in software that can be mathematically analyzed, thus leading to greater assurances that the software actually works. Currently these paradigms consume too much memory and machine cycles. However, in the year 2000, very high-speed machines will be commonplace. What will we use these powerful machines for? One answer is that we will no longer be concerned with the execution efficiency of a HLL. The drawbacks of functional and declarative languages will fade, to be replaced by concern for the correctness and expressiveness of the HLL. If this occurs, functional and declarative languages will be the preferred HLLs because of the elevated abstractions supported by the functional and declarative paradigms. Applications constructed from these HLLs will exhibit more sophisticated logic, communicate in non-numeric languages such as speech and graphics, and solve problems that are beyond the reach of current HLLs.

Object-orientation is an appealing idea whose time has come. OOP will be to the 1990s what structured programming was to the 1970s. Computer hardware is becoming increasingly distributed and remote. Networks of workstations routinely solve problems in concert rather than as stand-alone systems. This places greater demands on flexibility and functionality of applications. Consider the next step beyond objects—servers:

Server = Object + Process

A server is an object that is instantiated as an operating system process or task. The server sends messages to other servers to get work done.
The servers "live" on any processor located anywhere on the network. Software is distributed and so is the work. OOP offers the greatest hope for distributing applications in this fashion without loss of control. Should this scenario come true, the OOP paradigm will not only be appropriate, but contribute to greater HLL leverage through reusable objects, distributed servers, and delayed binding of methods to these servers. Object-oriented languages, databases, and operating systems are on the immediate horizon. Graphical user-interface servers such as X-Windows already exist, lending credibility to this scenario. At least for the near future, HLLs are most likely to become identical with the object-oriented paradigm.

Defining Terms

Abstract class: A class consisting of only an interface specification. The implementation part is unspecified, because the purpose of an abstract class is to establish an interface.

Abstraction: Abstraction in computer languages is a measure of the amount of separation between the hardware and an expression of a programming idea. The level of abstraction of a high-level language defines the level of that language.

ADT: An abstract data type (ADT) is a software module that encapsulates data and functions allowed to be performed on that data. ADTs also separate the interface specification of a module from the implementation part to minimize coupling among modules.

Assembler: A computer program for translating symbolic machine instructions into numerical machine instructions. Assemblers are considered low-level languages for programming a computer.

Class: A specification for one or more objects that defines state (data) and functions (algorithms) that all objects may inherit when created from the class. A class is a template for implementing objects.

Compiler: A computer program that translates the source program statements of a high-level language into lower-level object program statements. Compilers differ from interpreters in that they do not immediately perform the operations specified in the source program. Instead, a compiler produces an object program that in turn performs the intended operations when it is run.

Coupling: A measure of the amount of interaction between modules in a computer program. High coupling means that a change in one module is likely to affect another module. Low coupling means there is little impact on other modules whenever a change is made in one module.

Declarative paradigm: A programming paradigm in which the world is modeled as a collection of rules and constraints.

Delayed binding: The process of postponing the meaning of a programming object until the object is manipulated by the computer. Delayed binding is used by interpreters and compilers, but more often it is employed by interpreters.

Functional paradigm: A programming paradigm in which the world is modeled as a collection of mathematical functions.

HLL (also HOL): A HLL is a set of symbols which obey unambiguous syntactic and semantic rules: the syntactic rules specify legal combinations of symbols, and the semantic rules specify legal meanings of syntactically correct statements relative to a collection of abstractions.

Implementation part: The definition or algorithm for a programming module which gives the details of how the module works.

Instance variables: Data encapsulated by an object.

Interface specification: The definition of a programming module without any indication of how the module works.
Interpreter: A computer program that translates and performs the intended operations of the source statements of a high-level language program. Interpreters differ from compilers in that they immediately perform the intended operations specified in the source program, and they do not produce an object program.

Member fields: Instance variables of a C++ object.

Member functions: Methods defined on a C++ object.

Metalanguage: A formal language for defining other languages. A metalanguage is typically used to define the syntax of a high-level language.

Methods: Functions allowed to be performed on the data of an object.

Metric: A measure of a computer program's complexity, clarity, length, difficulty, etc.

Object: An instance of a class. Objects have state (data) and function (algorithms) that are allowed to manipulate the data.

Object-oriented paradigm: A programming paradigm in which the world is modeled as a collection of self-contained objects that interact by sending messages. Objects are modules that contain data and all functions that are allowed to be performed on the encapsulated data. In addition, objects are related to one another through an inheritance hierarchy.

Object program: Machine form of a computer program, which is the output from a translator.

Paradigm: An idealized model, typically used as a conceptual basis for software design. Programming paradigms dictate the approach taken by a programmer to organize, and then write, a computer program.

Procedural paradigm: A programming paradigm in which the world is modeled as a collection of procedures which in turn encapsulate algorithms.

Prototyping: A simplified version of a software system is a prototype. Prototyping is the process of designing a computer program through a series of versions; each version becomes a closer approximation to the final one.

Pushdown stack: A data structure containing a list of elements which are restricted to insertions and deletions at one end of the list, only. Insertion is called a push operation and deletion is called a pop operation.

Recursion: A procedure is called recursive if it calls itself.

Reuse: Programming modules are reused when they are copied from one application program and used in another. Reusability is a property of module design that permits reuse.

Semantics: The part of a formal definition of a language that specifies the meanings attached to a syntactically correct statement in the language.

Source program: Symbolic form of a computer program, which is the input to a translator.

Syntax: The part of a formal definition of a language that specifies legal combinations of symbols that make up statements in the language.

Related Topic

87.1 Assembly Language

References

T. Budd, Object-Oriented Programming, Reading, Mass.: Addison-Wesley, 1991.
M.A. Ellis and B. Stroustrup, The Annotated C++ Reference Manual, Reading, Mass.: Addison-Wesley, 1990.
M.H. Halstead, Elements of Software Science, New York: Elsevier North-Holland, 1977.
T.G. Lewis, CASE: Computer-Aided Software Engineering, New York: Van Nostrand Reinhold, 1991.
P. Smith, Expert System Development in Prolog and Turbo-Prolog, New York: Halsted Press, 1988.
P.H. Winston and B.K.P. Horn, LISP, Reading, Mass.: Addison-Wesley, 1989.

87.3 Data Types and Data Structures

Johannes J. Martin

The study of data types and data structures is a part of the discipline of computer programming. The terms refer to the two aspects of data objects: their usage and their implementation, respectively.
The study of data types deals with the identification of (abstract) data objects in the context of a programming project and with methods of their more or less formal specification; the study of data structures, on the other hand, is concerned with the implementation of such objects using already existing data objects as raw material. Concretely, the area addresses a basic problem of programming: the reduction of complex objects, such as vectors, tensors, text, graphic images, sound, functions, directories, maps, corporate organizations, models of ecosystems or machinery, or anything else that a program may have to deal with, to the only native objects of digital computers: arrays of binary digits (bits). The fundamental problem of this reduction is managing program complexity. Two organizational tools are essential to its solution: abstraction and hierarchical structuring. Abstraction refers to the separation of what computational objects are used for from how they are reduced to (i.e., implemented by means of) simpler ones. Hierarchical structuring refers to breaking this reduction up into small manageable steps. Through several steps of abstraction more and more complex objects are constructed, each one reduced to the previous, simpler generation of objects. This process ends when the desired objects have been composed.

Abstract Data Types

An abstract data type is one or more sets of computational objects together with some basic operations on those objects. One of these sets is defined by the type either by enumeration or by generating operations and is called the carrier set of the type. Customarily it is given the same name as the type. All other sets are called auxiliary sets of the type. In exceptional cases a type may have more than one carrier set.

The heart of the specification of an abstract data type is the definition of its functions, their syntax and semantics. Their syntax is specified by their functionalities and their semantics by algebraic axioms. [For more details see, e.g., Martin, 1986.] With sets A and B, the expression A → B denotes the set of all functions that have the domain A and the codomain B. Functions f ∈ A → B (traditionally denoted by f: A → B) are said to have the functionality A → B. The collection of basic operations does not need to be minimal. It should, however, be rich enough so that all other operations that one might wish to perform on the objects of the carrier set can be expressed exclusively by these basic operations.

The type Boolean, for example, consists of the set of Boolean values, Boolean = {true, false}, with, e.g., the operations not, and, and or. In general, things are not quite this simple. To be useful for programming purposes, even the type Boolean requires at least one additional function. This function, called a conditional expression, provides a choice of one of two given values depending on a given Boolean value. It has the form:

f: Boolean × SomeType × SomeType → SomeType

and, with a, b ∈ SomeType, is defined by:

f (true, a, b) = a   and   f (false, a, b) = b

The syntactical form of conditional expressions varies for different programming languages that provide this construct. For example, in the language C it has the form:

Boolean ? SomeType : SomeType   /* with the result type of SomeType */

The set SomeType is an auxiliary set of the type Boolean.
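To make this concrete, the short C++ sketch below renders the conditional expression as an ordinary function. It is only an illustration: the name cond and the use of a template parameter to stand for SomeType are assumptions made here, not part of the handbook's specification or of any standard library. (Unlike C's built-in ?: operator, an ordinary function evaluates both alternatives before choosing one; the axioms above constrain only the resulting value.)

// A sketch of f: Boolean x SomeType x SomeType -> SomeType.
// The name `cond` and the template parameter are illustrative assumptions.
#include <iostream>

template <typename SomeType>
SomeType cond(bool condition, SomeType a, SomeType b) {
    return condition ? a : b;     // f(true, a, b) = a; f(false, a, b) = b
}

int main() {
    std::cout << cond(2 < 3, 10, 20) << std::endl;   // prints 10
    return 0;
}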
Fundamental Data Types

The fundamental types listed next are supported by almost all modern high-level programming languages (reference books on Pascal, Modula II, C, and Ada are listed among the references at the end of this section):

Integer, Real (sometimes called Float), Character, and Boolean

Since their carrier sets are ordered (one of the operations of these types is ≤), these types are also called scalar types. All provide operations for comparing values; in addition, Integer and Real come equipped with the usual arithmetic operations (+, –, *, /) and Boolean with the basic logical operations (not, and, or). Most computers support these operations by hardware instructions. Thus, while bit arrays are the original native objects of a digital computer, the fundamental scalar types may be viewed as the given elementary building blocks for the construction of all other types.

Type Constructors

Enumerated Types

Beginning with Pascal, modern languages provide a rather useful constructor for scalar types, called enumerated types. Enumerated types have finite (small) carrier sets that the programmer defines by enumerating the constants of the type (specified as identifiers). For example, if the type Boolean were not available in Pascal, its carrier set could simply be defined by:

type Boolean = (false, true)

In Pascal, enumerated types are automatically equipped with operations for comparison as well as with the functions succ and pred, i.e., successor and predecessor. In the above example, succ(false) = true, pred(true) = false; succ(true) and pred(false) are not allowed.

Records

Values of scalar types can be arranged into tuples by a construct called a record (also called a structure in some languages). Records introduce into programming languages the equivalent of the Cartesian product. Tuples can be viewed as abstract data types. Consider pairs as an example:

The type Pairs:
Carrier set: Pairs; Auxiliary sets: A, B;
Operations:
pair ∈ A × B → Pairs;
first ∈ Pairs → A;
scnd ∈ Pairs → B;
where ∀ a ∈ A, b ∈ B
first (pair(a, b)) = a;
scnd (pair(a, b)) = b;

Using Pascal notation this type is defined by:

type Pairs = record first: A; scnd: B end

By providing the so-called field names, first and scnd, the programmer implicitly chooses the names for the selector functions. Pascal does not provide the function "pair." Instead, it permits the declaration of variables of type Pairs whose component values may be set (by assignment) and retrieved:

p: Pairs;                      {declaration of p as a variable of type Pairs}
p.first := a; p.scnd := b;     {p now has the value of "pair(a,b)" above}
if p.first = x then . . . else . . .   {p.first is Pascal's notation for "first(p)"}

The sets A and B can be of any type including records. Furthermore, since there is no restriction on the number of fields records may have, they can represent arbitrary tuples.

Arrays

Arrays permit the construction of simple function spaces, I → A. The domain I, a scalar type—in some languages restricted to a subset {0, 1, ..., n} of the integers—is called the index set of the array; A is an arbitrary type. In Pascal, the mathematical notation f ∈ I → A assumes the form:

f: array[I] of A

I has to be a finite scalar type (e.g., 0 .. 40). The function f can now be defined by associating values of type A with the values of the domain I using the assignment operation:

f[i] := a;   where i ∈ I and a ∈ A

Application of f to a value j ∈ I is expressed by f[j]. This expression has a value of type A and can be used as such.
As with records, Pascal allows the naming of the function space I → A by the definition:

type myFunctions = array[I] of A

and the subsequent declaration of a specific function (array) by:

f : myFunctions

Functions of several arguments are represented by so-called multidimensional arrays: f ∈ I1 × . . . × In → A is defined by

f : array[I1, . . ., In] of A

Variant Records

Variant records model disjoint (also called tagged) unions. In contrast to an ordinary union C = A ∪ B, a disjoint union D = A + B is formed by tagging the elements of A and B before forming D such that elements of D can be recognized as elements of A or of B. In programming, this amounts to creating variables that can be used to house values of both type A and type B. A tag field, (usually) part of the variable, is used to keep track of the type of the value currently stored in the variable. In Pascal, D = A + B is expressed by:

type tagType = (inA, inB);   {an enumerated type}
     D = record
           case kind: tagType of
             inA: (aValue: A);
             inB: (bValue: B);
         end;

Variables of type D are now used as follows:

mix: D;   {mix is declared to be of type D}
mix.kind := inA;
mix.aValue := a;
. . .
if mix.kind = inA
   then {do something with mix.aValue, which is of type A}
   else {do something with mix.bValue, which is of type B}

Conceptually, only one of the two fields, mix.aValue or mix.bValue, exists at any one time. The proper use of the tag is policed in some languages (e.g., Ada) and left to the programmer in others (e.g., Pascal).

An Example of a User-Defined Abstract Data Type

Most carrier sets are assumed to contain a distinguished value: error. Error is not a proper computational object: a function is considered to compute error if it does not return to the point of its invocation due to some error condition. Functions are called strict if they compute the value error whenever one or more of their arguments have the value error. The following example models a cafeteria tray stack with the following operations:

1. create a new stack with n trays;
2. remove a tray from a stack;
3. add a tray to the stack;
4. check if the stack is empty.

Specification: Cts (cafeteria tray stacks) is the carrier set of the type; Boolean and Integer are auxiliary sets; newStack, remove, add, isEmpty are the operations of the type where

newStack ∈ Integer → Cts;   {create a stack of n trays}
remove, add ∈ Cts → Cts;
isEmpty ∈ Cts → Boolean;

Axioms (logical expressions that describe the semantics of the operations): All functions are strict and, for all non-negative values of n,

1. remove (newStack(n)) = if n = 0 then error else newStack(n – 1);
2. add (newStack(n)) = newStack(n + 1);
3. isEmpty (newStack(n)) = (n = 0).

These axioms suffice to describe the desired behavior of Cts exactly, i.e., using these axioms, arbitrary expressions built with the above operations can be reduced to error or newStack(m) for some m. For example, isEmpty(add(remove(newStack(3)))) reduces first to isEmpty(add(newStack(2))), then to isEmpty(newStack(3)), and finally to false.

Implementation: For the representation of Cts (i.e., its data structure) we will choose the type Integer.

type Cts = integer;
function newStack(n: Integer): Cts;
begin if n < 0 then error ('n must be >= 0') else newStack := n end;
function remove (s: Cts): Cts;
begin if s = 0 then error ('stack empty') else remove := s - 1 end;
function add (s: Cts): Cts;
begin add := s + 1 end;
function isEmpty (s: Cts): Boolean;
begin isEmpty := (s = 0) end;

Above, "error" is a function that prints an error message and stops the program.
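For readers who followed the C++ examples of the previous section, the sketch below gives a rough C++ counterpart of this Pascal implementation. It is only an illustration under stated assumptions: the helper fail(), the renaming of remove to removeTray (to avoid the C library function of the same name), and the use of exit() in place of the Pascal error routine are choices made here, not part of the handbook's code.

// A rough C++ counterpart of the Pascal implementation of Cts above.
// Assumptions: fail() stands in for the Pascal "error" routine, and the
// operation remove is renamed removeTray to avoid the C library remove().
#include <cstdlib>
#include <iostream>

typedef int Cts;   // a stack of n trays is represented by the integer n

[[noreturn]] void fail(const char *msg) {
    std::cerr << msg << std::endl;   // print an error message ...
    std::exit(1);                    // ... and stop the program
}

Cts newStack(int n) {
    if (n < 0) fail("n must be >= 0");
    return n;
}
Cts removeTray(Cts s) {
    if (s == 0) fail("stack empty");
    return s - 1;
}
Cts add(Cts s)      { return s + 1; }
bool isEmpty(Cts s) { return s == 0; }

int main() {
    Cts s = add(removeTray(newStack(3)));                     // reduces to newStack(3)
    std::cout << std::boolalpha << isEmpty(s) << std::endl;   // prints false
    return 0;
}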
Dynamic Types

The carrier sets of dynamic types contain objects of vastly different size. For these types, variables (memory space) must be allocated dynamically, i.e., at run time, when the actual sizes of objects are known. Examples for dynamic types are character strings, lists, tree structures, sets, and graphs. A classical example of a dynamic type is a special type of a list: a queue. As the name suggests, a queue object is a sequence of other objects with the particular restrictive property that objects are inspected at and removed from its front and added to its rear.

Specification of Queues

Carrier set: Queues
Auxiliary sets: Boolean, A (A contains the items to be queued)
Operations: newQueue, isEmpty, queue, pop, front
1. newQueue ∈ Queues; {a new, empty queue}
2. isEmpty ∈ Queues → Boolean; {check if a queue is empty}
3. queue ∈ A × Queues → Queues; {add an object to the rear of a queue}
4. front ∈ Queues → A; {return front element for inspection}
5. pop ∈ Queues → Queues; {remove front element from a queue}
Axioms: All functions are strict and, for a ∈ A and s ∈ Queues,
isEmpty (newQueue); (i.e., isEmpty (newQueue) is true)
not isEmpty (queue(a, s)); (i.e., isEmpty (queue(a, s)) is false)
pop (newQueue) = error;
pop (queue(a, s)) = if s = newQueue then newQueue else queue(a, pop(s));
front (newQueue) = error;
front (queue(a, s)) = if s = newQueue then a else front(s).

Implementation of Queues

The following implementation represents queues of the form queue(a, s) by the ordered pair (a, s) and a new, empty queue by the null pair nil. For the moment we assume that a data type, Pairs, which provides pairs on demand at run time, already exists and is defined as follows:

Specification of Pairs

Carrier set: Pairs
Auxiliary sets: Boolean, A
Operations: nil, isNil, pair, first, scnd
1. nil ∈ Pairs, {a distinguished pair, the null pair}
2. isNil ∈ Pairs → Boolean, {test for the null pair}
3. pair ∈ A × Pairs → Pairs, {combine an item and a pair to a new pair}
4. first ∈ Pairs → A, {the first component, i.e., the item}
5. scnd ∈ Pairs → Pairs, {the second component, i.e., the pair}
Axioms: All functions are strict and, for a ∈ A and p ∈ Pairs,
isNil(nil); (i.e., isNil(nil) is true)
not isNil(pair(a,p)); (i.e., isNil(pair(a,p)) is false)
first (nil) = error; first (pair(a,p)) = a;
scnd (nil) = error; scnd (pair(a,p)) = p;

With pairs, queues may now be implemented as follows:

type Queues = Pairs;
function newQueue : Queues;
begin newQueue := nil end;
function isEmpty (s : Queues) : Boolean;
begin isEmpty := (s = nil) end;
function queue (x : A; s : Queues) : Queues;
begin queue := pair(x, s) end;
function pop (s : Queues) : Queues;
begin
  if isNil(s) then error ('cannot pop empty queue')
  else if scnd(s) = nil then pop := nil
  else pop := pair (first(s), pop (scnd(s)))
end;
function front (s : Queues) : A;
begin
  if isNil(s) then error ('an empty queue does not have a front')
  else if scnd(s) = nil then front := first(s)
  else front := front(scnd(s))
end;

The logic of these programs echoes the axioms. Such implementations are sometimes not very efficient but useful for prototype programs, since the probability of their correctness is very high. The queues behave as values, i.e., the functions queue(a,s) and pop(s) do not modify the queue s but compute new queues; after the execution of, e.g., s1 := pop(s) there are two independent queue values, s and s1. This is exactly the behavior postulated by the axioms. However, practical applications frequently deal with mutable objects, objects that can be modified.
With mutable objects memory may often be used more efficiently, since it is easier to decide when a memory cell, used, e.g., for storing an ordered pair, is no longer needed and thus may be recycled. If queues are viewed as mutable objects, the operations queue and pop are implemented as procedures that modify a queue. In order to apply the style of axioms introduced above for the description of mutable objects, these objects are best viewed as containers of values. The procedure queue(a, qobj), for example, takes the queue value, e.g., s, out of the container qobj, applies the function queue(a, s) (described by the axioms), and puts the result back into qobj. If more than one place in a program needs to maintain a reference to such an object, the object must be implemented using a head cell: a storage location that represents the object and that is not released. The following implementation uses a head cell with two fields, one pointing to the front and one to the rear of the queue. We assume that the type Pairs has two additional functions:

6. pairH ∈ Pairs × Pairs → Pairs; {create a pair with 2 pair fields}
7. firstH ∈ Pairs → Pairs; {retrieve the first field of such a 2-pair cell}

and three procedures:

8. setfirstH (p: Pairs; q: Pairs); {change the firstH field of q to p}
9. setscnd (p: Pairs; q: Pairs); {change the scnd field of q to p}
10. delete (s: Pairs); {free the storage space occupied by s}

type Queues = Pairs;
procedure newQueue (var q : Queues);
begin q := pairH(nil, nil) end;
function isEmpty (s : Queues) : Boolean;
begin isEmpty := (firstH(s) = nil) end;
procedure queue (x : A; s : Queues);
var temp: Pairs;
begin
  temp := pair(x, nil);
  if isNil(firstH(s)) then setfirstH(temp, s) else setscnd(temp, scnd(s));
  setscnd (temp, s)
end;
procedure pop (s : Queues);
var temp : Pairs;
begin
  if isNil(firstH(s)) then error ('cannot pop empty queue')
  else begin temp := firstH(s); setfirstH(scnd(temp), s); delete(temp) end
end;
function front (s : Queues) : A;
begin
  if isNil(firstH(s)) then error ('an empty queue does not have a front')
  else front := first(firstH(s))
end;

Compared to the value implementation given earlier, this implementation improves the performance of front and pop from O(n) to O(1). An algorithm has O(f(n)) (pronounced: order f(n) or proportional to f(n)) time performance if there exists a constant c, such that, for arbitrary values of the input size n, the time that the algorithm needs for its computation is t ≤ c · f(n).

Most modern programming languages support the implementation of the type pairs (n-tuples) whose instances can be created dynamically. It requires two operations, called new and dispose in Pascal, that dynamically allocate and deallocate variables, respectively. These operations depend on the concept of a reference (or pointer), which serves as a name for a variable. References always occupy the same storage space independently of the type of variable they refer to. The following implementation of Pairs explains the concept.

type CellKind = (headCell, bodyCell);
     Pairs = ^PairCell;   {Pairs are references to PairCells}
     PairCell = record
                  tail: Pairs;
                  case kind: CellKind of
                    headCell: (frnt: Pairs);
                    bodyCell: (val: A)
                end;

function pair(item: A; next: Pairs): Pairs;
var p: Pairs;
begin
  new(p, bodyCell);   {a new "bodyCell" is created and accessible through p}
  p^.kind := bodyCell;
  p^.val := item;
  p^.tail := next;
  pair := p
end;
function first(p: Pairs): A;
begin
  if p = nil then error('...')
  else if p^.kind = bodyCell then first := p^.val
  else error('...')
end;
procedure setfirstH(p, q: Pairs);
begin
  if q = nil then error('...')
  else if q^.kind = headCell then q^.frnt := p
  else error('...')
end;

(Note: The Pascal constant nil denotes the null pointer, a reference to nothing.)

function isNil(p: Pairs): Boolean;
begin isNil := (p = nil) end;

The reader should have no difficulty filling in the rest of the implementation of Pairs. Most of the algorithms on dynamic data structures were developed in the 1960s; still, an excellent reference is Knuth [1973].

More Dynamic Data Types

Stacks and Lists with a Point of Interest

A queue is the type of (linear) list used to realize first-come-first-served behavior. In contrast, another linear type, the stack, realizes last-come-first-served behavior. Sometimes it is necessary to scan a list object without dismantling it. This is accomplished by giving the list a point of interest (see Fig. 87.14). This requires four additional operations:

restart(l: List); {moves point of interest to beginning of list l}
current(l: List): A; {returns object at point of interest}
advance(l: List); {advances point of interest by one toward end of list l}
endOfList(l: List): Boolean; {true, if end of list has been reached}

The type can be extended further by allowing insertions and deletions at the point of interest.

N-ary, Binary, and General Trees

An n-ary tree is the smallest set containing the empty tree and all ordered n+1-tuples t = (a, t1, ..., tn) where a is a member of some auxiliary set and the ti are n-ary trees. The element a is called the root element or simply the root of t and the ti are called subtrees of t. Note that in this sense, a list is a unary tree. Binary trees used as searchtrees access finite ordered sets. A binary searchtree is a tree that accesses a set. A tree t accesses a set s if the root of t is some element a of s, and s1, called the left subtree of t, accesses the subset {x | x ∈ s and x < a} and s2, called the right subtree of t, accesses the subset {x | x ∈ s and x > a}.

FIGURE 87.14  A list implementation with a point of interest and access to front and rear.

If the left and right subtrees of the above definition are of similar size, then the time for finding an element in the set is proportional to log(n) where n is the cardinality of the set. Quaternary trees (usually called quad trees) and octonary trees (called oct trees) are used to access two-dimensionally and three-dimensionally organized data, respectively. As with lists, the implementation of n-ary trees is based on n+1-tuples. A minimal set of operations for binary trees includes:

nilTree ∈ Trees; {the empty tree, represented by nil}
isNil ∈ Trees → Boolean; {test if tree is empty}
tree ∈ A × Trees × Trees → Trees; {build tree from an item and subtrees}
root ∈ Trees → A; {retrieve root item of tree}
left ∈ Trees → Trees; {retrieve left subtree}
right ∈ Trees → Trees; {retrieve right subtree}

A general tree is the smallest set containing all ordered pairs t = (a, s) where a is a member of some auxiliary set and s is a possibly empty list of general trees. The element a is called the root element or simply the root of t and the trees in s are called subtrees of t.
Note that there is no empty general tree; the simplest tree has a root and an empty list of subtrees. General trees are useful for the representation of hierarchical organizations such as the table of contents of a book or the organization of a corporation.

Functions, Sets, Relations, Graphs

Functions with reasonably small domains can be represented by arrays, as described earlier. Similarly, sets formed from a reasonably small universal set, relations on small domains, and graphs with not too many vertices can be represented by their characteristic functions implemented as bit arrays. In fact, Pascal provides a type constructor for sets that are derived from small universal sets. Frequently domains are far too large for this approach. For example, the symbol table that a compiler of a programming language maintains is a function from the set of valid identifiers to some set of attributes. The set of valid identifiers is infinite, or, if some length limitation is imposed, finite but exceedingly large (there are nearly 300 billion identifiers of eight characters or less). For most of its domain a symbol table returns the default value new or not found. It is therefore economical to store the mapping only for those domain values that map to a value different from the default value. A function of this sort is usually specified as follows:

Specification of Functions:
Carrier set: Functions
Auxiliary sets: Dom, Cod (domain and codomain)
Operations: newFun, apply, update
1. newFun ∈ Functions, (returns default everywhere)
2. apply ∈ Functions × Dom → Cod,
3. update ∈ Functions × Dom × Cod → Functions;
Axioms: All functions are strict and, for x, z ∈ Dom, y ∈ Cod and f ∈ Functions,
apply (newFun, x) = default;
apply (update(f,x,y), z) = if x = z then y else apply (f, z);

An implementation based on these axioms amounts to representing a function as a list of those of its individual mappings that differ from default and leads to an O(1) performance for update and an O(n) performance of apply. Better is an implementation by binary searchtrees with a performance of O(log(n)) for both apply and update.

Hashing

The fastest method for the implementation of functions is hash coding or hashing. By means of a hash function, h ∈ Dom → 0 .. k–1, the domain of the function is partitioned into k sections and each section is associated with an index. For each partition a simple list implementation is used and the lists are stored in an array A: array[0 .. k–1]. In order to compute apply(f,x) or update(f,x,y), the list at A[hash(x)] is searched or updated. If the hash function has been properly chosen and if k and the number of function values different from default are of similar size, then the individual lists can be expected to be very short and independent of the number of nondefault entries of the function; thus performance for apply and update is O(1). The above discussion applies also to sets, relations, and graphs, since these objects can be represented by their characteristic functions.

Object-Oriented Programming

In languages that support object-oriented programming, classes (i.e., types) of objects are defined by specifying (1) the variables that each object will own as instance variables and (2) operations, called methods, applicable to the objects of the class. As a difference in style, these methods are not invoked like functions or procedures, but are sent to an object as a message.
The expression

[window moveTo : x : y]

is an example of a message in the programming language Objective C, a dialect of C. Here the object window, which may represent a window on the screen, is instructed to apply to itself the method moveTo using the parameters x and y. New objects of a class are created—usually dynamically—by factory methods addressed to the class itself. These methods allocate the equivalent of a record whose fields are the instance variables of the object and return a reference to this record, which represents the new object. After its creation an object can receive messages from other objects.

To data abstraction, object-oriented programming adds the concept of inheritance: from an existing class new (sub)classes can be derived by adding additional instance variables and/or methods. Each subclass inherits the instance variables and methods of its superclass. This encourages the use of existing code for new purposes. As an example, consider a class of list objects. Each object has two instance variables pointing to the front and the rear of the list. In Objective C, the specification of the interface for this list class, i.e., the declaration of the instance variables and headers (functionalities) of the methods, has the following form:

@interface MyList : Object   /* Object is the universal (system) class from which all classes are derived directly or indirectly */
{
    /* declaration of the instance variables; listRef is the type of a
       pointer to a list, assumed to be defined elsewhere */
    listRef front;
    listRef rear;
}
- initList;                /* initializes instance variables with null pointers */
- (BOOL) isEmpty;          /* test for empty list; note: parameter list is implied */
- add : (item) theThing;   /* item is the type of things on the list */
- pop;
- (item) front;
@end

As a companion of the interface file there is also an implementation file that contains the executable code for the methods of the class. A list with a point of interest can be defined as a subclass of MyList as follows:

@interface ScanList : MyList   /* ScanList is made a subclass of MyList */
{
    listRef pointOfInterest;
}
- restart;
- (BOOL) endOfList;
- advance;
- (item) current;
@end

If we also need a list that can add and delete at the point of interest, we define:

@interface InsertionList : ScanList
{ }   /* there are no new instance variables */
- insert : (item) theThing;
- shrink;   /* removes item at the point of interest */
@end

If a subclass defines a method already defined in the superclass, the new definition overrides the old one. Suppose we need a list where items are kept in ascending order:

@interface SortedList : MyList
{ }
- add : (item) theThing;   /* this version of add inserts theThing at the proper place to keep the list sorted */
@end

Defining Terms

Abstract data type: One or more sets of computational objects together with some basic operations on those objects. One of these sets is defined by the type either by enumeration or by generating operations and is called the carrier set of the type. Customarily it is given the same name as the type. All other sets are called auxiliary sets of the type. In exceptional cases a type may have more than one carrier set.

Binary searchtree: A tree that accesses a set. A tree t accesses a set s if the root of t is some element a of s, and s1, called the left subtree of t, accesses the subset {x | x ∈ s and x < a}, and s2, called the right subtree of t, accesses the subset {x | x ∈ s and x > a}.
Functionality: With sets A and B, the expression A → B denotes the set of all functions that have the domain A and the codomain B. Functions f ∈ A → B (traditionally denoted by f: A → B) are said to have the functionality A → B.

General tree: The smallest set containing all ordered pairs t = (a, s) where a is a member of some auxiliary set and s is a possibly empty list of general trees. The element a is called the root element or simply the root of t and the trees in s are called subtrees of t.

n-ary tree: The smallest set containing the empty tree and all ordered n+1-tuples t = (a, t1, . . ., tn) where a is a member of some auxiliary set and the ti are n-ary trees. The element a is called the root element or simply the root of t and the ti are called subtrees of t.

O(f(n)) performance: An algorithm has O(f(n)) (pronounced: order f(n) or proportional to f(n)) time performance if there exists a constant c, such that, for arbitrary values of the input size n, the time that the algorithm needs for its computation is t ≤ c · f(n).

Strictness: Most carrier sets are assumed to contain a distinguished value: error. Error is not a proper computational object: a function is considered to compute error if it does not return to the point of its invocation due to some error condition. Functions are called strict if they compute the value error whenever one or more of their arguments have the value error.

Related Topic

90.3 Programming Methodology

References

A.V. Aho, Data Structures and Algorithms, Reading, Mass.: Addison-Wesley, 1988.
T.H. Cormen, Introduction to Algorithms, New York: McGraw-Hill, 1995.
K. Jensen and N. Wirth, Pascal: User Manual and Report, Berlin: Springer-Verlag, 1974.
B.W. Kernighan and D.M. Ritchie, The C Programming Language, Englewood Cliffs, N.J.: Prentice-Hall, 1988.
D.E. Knuth, The Art of Computer Programming, vol. 1, Reading, Mass.: Addison-Wesley, 1973, chap. 2.
J.J. Martin, Data Types and Data Structures, C.A.R. Hoare, Series Ed., Englewood Cliffs, N.J.: Prentice-Hall International, 1986.
NeXT Step Concepts, Chapter 3, Objective C, NeXT Developers' Library, NeXT Inc.
S. Sahni, Fundamentals of Data Structures in C++, Computer Science Press, 1995.
R. Sedgewick, Algorithms in C++, Reading, Mass.: Addison-Wesley, 1993.
B. Stroustrup, The C++ Programming Language, Reading, Mass.: Addison-Wesley, 1991.
United States Department of Defense, Reference Manual for the Ada Programming Language, Washington, D.C.: U.S. Government Printing Office, 1983.
N. Wirth, Programming in Modula-2, Berlin: Springer-Verlag, 1983.

Further Information

There is a wealth of textbooks on data structures. Papers on special aspects of data types and their relation to programming languages are found regularly in periodicals such as ACM Transactions on Programming Languages and Systems, the Journal of the Association for Computing Machinery, IEEE Transactions on Computers, IEEE Transactions on Software Engineering, or Acta Informatica.