COMPILER CONSTRUCTION
Principles and Practice
Kenneth C,Louden
8,Code Generation
8.1 Intermediate Code and Data
Structures for Code Generation
8.1.1 Three-Address Code
8.1.2 Data Structures for the
Implementation of Three-Address Code
8.1.3 P-Code
8.2 Basic Code Generation Techniques
8.2.1 Intermediate Code or Target
Code as a Synthesized Attribute
8.2.2 Practical Code Generation
8.2.3 Generation of Target Code from
Intermediate Code
Code generation from intermediate code involves
either or both of two standard techniques,
– Macro expansion and Static simulation
Macro expansion involves replacing each kind of
intermediate code instruction with an equivalent
sequence of target code instructions
Static simulation involves a straight-line
simulation of the effects of the intermediate code
and generating target code to match these effects
Consider the expression (x=x+3) +4,translate the P-code
into three-address code:
Lad x
Lod x
Ldc 3
Adi t1=x+3
Stn x=t1
Ldc 4
Adi t2=t1+4
We perform a static simulation of the P-machine stack to
find three-address equivalence for the given code
< -- top of sta c k
3
X
Addr e ss of x
T1= x + 3
< -- top of sta c k
T1
Addr ss of x
X= t1
< -- top of sta c k
T1
< -- top of sta c k
4
T1
T2= t1+4
< -- top of sta c k
T2
Now consider the case of translating from three-
address code to P-code,by simple macro
expansion,
A three-address instruction:
a = b + c
Can always be translated into the P-code sequence
lda a
lod b
lod c
adi
sto
Then,the three-address code for the expression (x=x+3)+4:
T1 = x + 3
X = t1
T2 = t1 + 4
Can be translated into the following P-code:
Lda t1
Lod x
Ldc 3
Adi
Sto
Lad x
Lod t1
Sto
Lda t2
Lod t1
Ldc 4
Adi
Sto
I f w e wa nt to e li mi na te the e x tra tempor a rie s,t he n a
more sophi sti c a ted sc he me than pure mac ro e x pa nsion
must b e used,
T 2 +
X,t 1 + 4
X 3
Contents
Part One
8.1 Intermediate Code and Data Structure for code Generation
8.2 Basic Code Generation Techniques
Part Two
8.3 Code Generation of Data Structure Reference
8.4 Code Generation of Control Statements and Logical Expression
8.5 Code Generation of Procedure and Function calls
Other Parts
8.6 Code Generation on Commercial Compilers,Two Case Studies
8.7 TM,A Simple Target Machine
8.8 A Code Generator for the TINY Language
8.9 A Survey of Code Optimization Techniques
8.10 Simple Optimizations for TINY Code Generator
8.3 Code Generation of Data Structure
References
8.3.1 Address Calculations
(1) Three-Address Code for Address Calculations
The usual arithmetic operations can be used to
compute addresses
Suppose wished to store the constant value 2 at
the address of the variable x plus 10 bytes
t1 = &x +10
*t1 = 2
The implementation of these new addressing
modes requires that the data structure for three-
address code contain a new field or fields
– For example,the quadruple data structure of Figure
8.4 (page 403) can be augmented by an
enumerated address-mode field with possible
values none,address,and indirect
( 2 ) P - C ode f or Add re s s C a lcula ti ons
I ntrodu c e n e w inst ruc ti o ns t o e x pr e ss new a ddre s sing modes,
1,ind (,indi re c t l oa d,)
stac k be for e stac k a ft e r
2,ix a (,index e d a ddre ss,)
stac k be for e stac k a fte r
lda x
ldc 1 0
ix a 1
ldc 2
s t o
a
I nd i
* ( a+ i )
8.3.2 Array References
The offset is computed from the subscript value as
follows:
– First,an adjustment must be made to the subscript
value if the subscript range does not begin at 0
– Second,the adjusted subscript value must be
multiplied by a scale factor that is equal to the size of
each array element in memory
– Finally,the resulting scaled subscript is added to the
base address to get the final address of the array
element.
The address of an array element a[t],
– b a s e _ a d d ress (a) + (t - lower_bound (a)) * element_size (a)
(1) Three-Address Code for Array References
Introduce two new operations:
One that fetches the value of an array element
t2= a[t1]
And one that assigns to the address of an array element
a[t2]= t1
For an example:
a[i+1] = a [j*2]+3
Translate into the three-address instructions
( with the symbols,=[],[]=)
t1 = j * 2
t2 = a [t1]
t3 = t2 + 3
t4 = i + 1
a [t4] = t3
Writing out the addresses computations of an
array element directly in the code,
The above example can be finally translated into:
t1 = j * 2
t2 = t1 * elem_size(a)
t3 = &a + t2
t4 = *t3
t5 = t4 + 3
t6 = i + 1
t7 = t6 * elem_size (a)
t8 = &a + t7
*t8 = t5
(2) P-Code for Array References
Use the new address instructions ind and ixa,The above example
a[i+1] = a [j*2]+3
Will finally become:
lda a
lod i
ldc 1
a d i
ixa elem_size(a)
lda a
lod j
ldc 2
m p i
ixa elem_size(a)
ind 0
ldc 3
a d I
sto
(3) A Code G e n e r at ion Pr o c e d u r e w ith Ar r ay Re f e r e n c e s
S how he re how a rr a y r e f e re nc e s c a n be g e n e ra t e d b y a c ode ge ne r a ti on pr o c e dure,
( a [ i + 1 ] = 2 ) + a [ j ]
T h e sy n ta x t ree o f th e a b o v e e x p ression,
Array reference generated by a code generation procedure.
( a [ i + 1 ] = 2 ) + a [ j ]
lda a
lod i
ldc 1
a d i
ixa elem_size(a)
ldc 2
s t n
lda a
lod j
ixa elem_size(a)
ind 0
adi
The code generation procedure for p-code:
Void gencode( syntaxtree t,int isaddr)
{char codestr[CODESIZE];
/*CODESIZE = max length of 1 line of p-code */
if (t != NULL)
{ switch(t->kind)
{ case OpKind:
switch (t->op)
{ case Plus:
if (is Addr) emitcode(“Error”);
else { genCode(t->lchild,FALSE);
genCode(t->rchild,FALSE);
emitcode(“adi”);}
break;
case Assign:
genCode(t->lchild,TRUE);
genCode(t->rchild,FALSE);
emitcode(“stn”);}
break;
case Subs:
sprintf(codestr,”%s %s”,”lda”,t->strval);
emitcode(codestr);
gencode(t->lchild,FALSE);
sprintf(codestr,”%s%s%s”,
“ixa elem_size(“,t->strval,”)”);
emitcode(codestr);
if (!isAddr) emitcode (“ind 0”);
break;
default:
emitcode(“Error”);
break;
}
break;
case ConstKind:
if (isAddr) emitcode(“Error”);
else
{ sprintf(codestr,”%s %s”,
”ldc”,t->strval);
emitCode(codestr);
}
break;
case IdKind:
if (isAddr)
sprintf(codestr,”%s %s”,”lda”,t->strval);
else
sprintf(codestr,”%s %s”,”lod”,t->strval);
emitcode(codestr);
break;
default:
emitCode(“Error”);
break;
}
}
}
(4) Multidimensional Arrays
For an example,in C an array of two dimensions
can be declared as:
Int a[15][10]
Partially subscripted,yielding an array of fewer
dimensions:
a[i]
Fully subscripted,yielding a value of the element
type of the array:
a[i][j]
The address computation can be implemented by
recursively applying the above techniques
8.3.3 Record Structure and Pointer
References
Computing the address of a record or structure
field presents a similar problem to that of
computing a subscripted array address
– First,the base address of the structure variable is
computed;
– Then,the (usually fixed) offset of the named field is
found,
– and the two are added to get the resulting address
For example,the C declarations:
Typedef struct rec
{ int i;
char c;
int j;
} Rec;

Rec x;
Memory
allocated
to x
Base address of x
Offset of x.c
Offset of x.j
(Other memory)
x.j
x.c
x.i
(Other memory)
1) Three-Address Code for Structure and Pointer References
Use the three-address instruction
t1 = &x + field_offset (x,j)
x.j = x.i;
be translated into
t1 = &x + field_offset (x,j)
t2 = &x + field_offset (x,i)
*t1 = *t2
Consider the following example of a tree data structure
and variable declaration in C:
typedef struct treeNode
{ int val;
struct treeNode * lchild,* rchild;
} TreeNode;
typedef struct treeNode
{ int val;
struct treeNode * lchild,* rchild;
} TreeNode;
.,,
TreeNode *p;
p -> lchild = p;
p = p -> rchild;
translate into the three-address code
t1 = p + field_offset ( *p,lchild )
*t1 = p
t2 = p + field_offset ( *p,rchild )
p = *t2
2) P-Code for Structure and Pointer References
x.j = x.i
translated into the P-code
lda x
lod field_offset (x,j)
ixa 1
lda x
ind field_offset (x,i)
sto
The assignments:
p->lchild = p;
p = p->rchild
Can be translated into the following P-code.
Lod p
Lod field-offset(*p,lchild)
Ixa 1
Lod p
Sto
Lda p
Lod p
Ind field_offset(*p,rchild)
sto
8.4 Code Generation of Control Statements
and Logical Expressions
The section will describe code generation for
various forms of control statements,
– Chief among these are the structured if-statement and
while-statement
Intermediate code generation for control
statements involves the generation of labels in
manner,
– Which stand for addresses in the target code to which
jumps are made
If labels are to be eliminated in the generation of
target code,
– The a problem arises in that jumps to code locations
that are not yet known must be back-patched,or
retroactively rewritten.
8.4.1 Code Generation for If – and While –
Statements
Two forms of the if- and while-statements:
– if-stmt → i f ( e x p ) stmt | i f ( exp ) stmt e l s e stmt
– while-stmt → w h i l e ( e x p ) s t m t
The chief problem is to translate the structured
control features into an,unstructured”
equivalent involving jumps
– Which can be directly implemented.
Compilers arrange to generate code for such
statements in a standard order that allows the
efficient use of a subset of the possible jumps that
target architecture might permit.
The typical code arrangement for an if-statement is shown as
follows,
While the typical code arrangement for a while-statement
Three-Address Code for Control
Statement
For the statement:
if ( E ) S1 e l s e S2
The following code pattern is generated:
<code to evaluate E to t1>
if_false t1 goto L1
<code for S1>
goto L2
label L1
<code for S 2>
label L2
Three-Address Code for Control
Statement
Similarly,a while-statement of the form
while ( E ) S
Would cause the following three-address code
pattern to be generated:
label L1
<code to evaluate E to t1>
if_false t1 goto L2
<code for S>
goto L1
label L2
P-Code for Control Statement
For the statement
if ( E ) S1 else S 2
The following P-code pattern is generated:
<code to evaluate E>
fjp L1
<code for S 1>
ujp L2
lab L1
<code for S 2>
lab L2
P-Code for Control Statement
And for the statement
while ( E ) S
The following P-code pattern is generated:
lab L1
<code to evaluate E>
fjp L2
<code for S>
ujp L1
lab L2
8.4.2 Generation of Labels and Back-
patching
One feature of code generation for control
statements that can cause problems during target
code generation is the fact that,in some cases,
jumps to a label must be generated prior to the
definition of the label itself
A standard method for generating such
forward jumps is either to leave a gap in the
code where the jump is to occur or to generate a
dummy jump instruction to a fake location
Then,when the actual jump location becomes
known,this location is used to fix up,or back-
patch,the missing code
During the back-patching process a further
problem may arise in that many
architectures have two varieties of jumps,
a short jump or branch ( within 128 bytes if
code) and a long jump that requires more
code space
In that case,a code generator may need to
insert nop instructions when shortening
jumps,or make several passes to condense
the code
8.4.3 Code Generation of Logical
Expressions
The standard way to do this is to represent the Boolean
value false as 0 and true as 1,
– Then standard bitwise and and or operators can be used to
compute the value of a Boolean expression on most architectures
A further use of jumps is necessary if the logical operations
are short circuit,For instance,it is common to write in C:
– if ((p!=NULL) && ( p->val==0) ),..
– Where evaluation of p->val when p is null could cause a memory
fault
Short-circuit Boolean operators are similar to if-statements,
except that they return values,and often they are defined
using if-expressions as
– a and b,,if a then b else false
– and
– a or b,,if a then true else b
To generate code that ensures that the second sub-
expression will be evaluated only when necessary
– Use jumps in exactly the same way as in the code for if-statements
For instance,short-circuit P-code for the C expression ( x !
= 0 ) & & ( y = = x ) is:
lod x
ldc 0
n e q
fjp L1
lod y
lod x
e q u
ujp L2
lab L1
lod FALSE
lab L2
8.4.4 A Sample code Generation Procedure
for If- and While- Statements
Exhibiting a code generation procedure for control
statements using the following simplified
grammar:
stmt → if-stmt | while-stmt | b r e a k | o t h e r
if-stmt → i f ( exp ) stmt | i f ( e x p ) stmt e l s e s t m t
while-stmt → w h i l e ( e x p ) s t m t
exp → t r u e | f a l s e
The following C declaration can be used to
implement an abstract syntax tree for this grammar:
typedef enum { ExpKind,IfKind,
WhileKind,BreakKind,OtherKind } NodeKind;
typedef struct streenode
{ NodeKind kind;
struct streenode * child[3] ;
int val; /* used with ExpKind */
} STreeNode;
typedef STreeNode * SyntaxTree;
I n thi s s y ntax tre e struc t ur e,a node c a n h a ve a s man y a s thr e e c hil dr e n,
a nd e x pr e ssi on node s ar e c onst a nts wi th value true or f a lse,
F or e x a mpl e,the state me nt
if ( tr u e ) w h il e ( tr u e ) if ( f alse) b r e ak e lse ot h e r
ha s the s y nt a x tre e
Using the given typedef’s and the corresponding
syntax tree structure,a code generation procedure that
generates P-code is given as follows:
Void genCode(SyntaxTree t,char* lable)
{ char codestr[CODESIZES];
char *lab1,*lab2;
if (t!=NULL) switch (t->kind)
{case ExpKind:
if (t->val==0) emitCode(“ldc false”);
else emitcode(“ldc true”);
break;
case IfKind:
genCode(t->child[0],label);
lab1 = genLable();
sprintf(codestr,”%s %s”,“fjp”,lab1);
emitcode(codestr);
gencode(t->child[1],label);
if (t->child[2]!=NULL)
{ lab2=genlable();
sprintf(codestr,”%s %s”,”ujp”,lab2);
emitcode(codestr);}
sprintf(codestr,”%s %s”,”lab”,lab1);
emitcode(codestr);
if (t->child[2]!=NULL)
{ gencode(t->child[2],lable);
sprintf(codestr,”%s %s”,”lab”,lab2);
emitcode(codestr);}
break;
case WhileKind;
lab1=genlab();
sprintf(codestr,”%s %s”,“lab”,lab1);
emitcode(codestr);
gencode(t->child[0],label);
lab2=genlabel();
sprintf(codestr,”%s %s”,“fjp”,lab2);
emitcode(codestr);
gencode(t->child[1],lab2);
sprintf(codestr,”%s %s”,“ujp”,lab1);
emitcode(codestr);
sprintf(codestr,”%s %s”,“lab”,lab2);
emitcode(codestr);
break;
case BreakKind:
sprintf(codestr,”%s %s”,“ujp”,label);
emitcode(codestr);
break;
case OtherKind:
emitcode(“other”);
break;
Default:
emitcode(“other”);
break;
}
}
For the statement,
if (true) while (true) if (false) break else other
The above procedure generates the code sequence
ldc true
fjp L1
lab L2
ldc true
fjp L3
ldc false
fjp L4
ujp L3
ujp L5
lab L4
Other
lab L5
ujp L2
lab L3
Lab L1
8.5 Code Generation of Procedure and
Function Calls
8.5.1 Intermediate Code for Procedures
and Functions
The requirements for intermediate code
representations of function calls may be described
in general terms as follows
First,there are actually two mechanisms that need
descriptions,
– function/procedure definition
– and function/procedure call
A definition creates a function name,
parameters,and code,but the function does not
execute at that point
A call creates values for the parameters and
performs a jump to the code of the function,
which then executes and returns
Intermediate code for a definition must include
– An instruction marking the beginning,or entry point,
of the code for the function,
– And an instruction marking the ending,or return point,
of the function
Entry instruction
<Code for the function body>
Return instruction
Similarly,a function call must have an instruction
– indicating the beginning of the computation of the
arguments and an actual call instruction that indicates
the point where the arguments have been constructed
– and the actual jump to the code of the function can take
place
Begin-argument-computation instruction
<Code to compute the arguments >
Call instruction
Three-Address Code for Procedures and
Functions
In three-address code,the entry instruction needs to give a
name to the procedure entry point,similar to the label
instruction; thus,it is a one-address instruction,which we
will call simply entry,Similarly,we will call the return
instruction return
For example,consider the C function definition.
int f ( int x,int y )
{ return x + y + 1; }
This will translate into the following three-address code:
entry f
t1 = x + y
t2 = t1 + 1
return t2
Three-Address Code for Procedures and
Functions
For example,suppose the function f has been
defined in C as in the previous example,
Then,the call
f ( 2+3,4)
Translates to the three-address code
begin_args
t1 = 2 + 3
arg t1
arg 4
call f
P-code for Procedures and functions
The entry instruction in P-code is ent,and the return
instruction is ret
int f ( int x,int y )
{ return x + y + 1; }
Thus the definition of the C function f translates into the P-
code
ent f
lod x
lod y
a d i
ldc 1
a d i
r e t
P-code for Procedures and functions
Our example of a call in C (the call f (2+3,4) to
the function f described previously) now translates
into the following P-code:
m s t
ldc 2
ldc 3
a d i
ldc 4
cup f
8.5.2 A Code Generation Procedure for
Function Definition and Call
The grammar we will use is the following:
program → decl-list exp
decl-list → decl-list decl | ε
decl → f n id ( param-list ) = e x p
param-list → p a ram - list,id | id
exp → exp + exp | call | num | id
call → id ( arg-list )
arg-list → a rg-list,exp | exp
An example of a program as defined by this
grammar is
fn f(x)=2+x
fn g(x,y)=f(x)+y
g ( 3,4 )
We do so using the following C declarations:
typedef enum
{PrgK,FnK,ParamK,PlusK,CallK,ConstK,IdK}
NodeKind ;
typedef struct streenode
{ NodeKind kind;
struct streenode *lchild,*rchild,* s i b l i n g ;
char * name; /* used with FnK,ParamK,Callk,IdK */
int val; /* used with ConstK */
} StreeNode;
typedef StreeNode * SyntaxTree;
Abstract syntax tree for the sample program,
fn f(x)=2+x
fn g(x,y)=f(x)+y
g ( 3,4 )
Given this syntax tree structure,a code generation
procedure that produces P-code is given in the following:
Void genCode( syntaxtree t)
{ char codestr[CODESIZE];
SyntaxTree p;
If (t!=NULL)
Switch (t->kind)
{ case PrgK:
p = t->lchild;
while (p!=NULL)
{ gencode(p);
p = p->slibing;}
gencode(t->rchild);
break;
case FnK:
sprintf(codestr,”%s %s”,”ent”,t->name);
emitcode(codestr);
gencode(t->rchild);
emitcode(“ret”);
break;
case ConstK:
sprintf(codestr,”%s %d”,”ldc”,t->val);
emitcode(codestr);
break;
case PlusK:
gencode(t->lchild);
gencode(t->rchild);
emitcode(“adi”);
break;
case IdK:
sprintf(codestr,”%s %s”,”lod”,t->name);
emitcode(codestr);
break;
case CallK:
emitCode(“mst”);
p = t->rchild;
while (p!=NULL)
{genCode(p);
p = p->sibling;}
sprintf(codestr,”%s %s”,”cup”,t->name);
emitcode(codestr);
break;
default:
emitcode(“Error”);
break;
}
}
Given the syntax tree in Figure 8.13,the generated the
code sequences:
Ent f
Ldc 2
Lod x
Adi
Ret
Ent g
Mst
Lod x
Cup f
Lod y
Adi
Ret
Mst
Ldc 3
Ldc 4
Cup g
End of Part Two
THANKS