COMPILER CONSTRUCTION
Principles and Practice
Kenneth C,Louden
8,Code Generation
PART ONE
Generate executable code for a target machine that
is a faithful representation of the semantics of the
source code
Depends not only on the characteristics of the
source language but also on detailed information
about the target architecture,the structure of the
runtime environment,and the operating system
running on the target machine
Contents
Part One
8.1 Intermediate Code and Data Structure for code Generation
8.2 Basic Code Generation Techniques
Other Parts
8.3 Code Generation of Data Structure Reference
8.4 Code Generation of Control Statements and Logical Expression
8.5 Code Generation of Procedure and Function calls
8.6 Code Generation on Commercial Compilers,Two Case Studies
8.7 TM,A Simple Target Machine
8.8 A Code Generator for the TINY Language
8.9 A Survey of Code Optimization Techniques
8.10 Simple Optimizations for TINY Code Generator
8.1 Intermediate Code and Data
Structures for Code Generation
8.1.1 Three-Address Code
A data structure that represents the source program during
translation is called an intermediate representation,or
IR,for short
Such an intermediate representation that resembles target
code is called intermediate code
– Intermediate code is particularly useful when the goal of the
compiler is to produce extremely efficient code;
– Intermediate code can also be useful in making a compiler more
easily retarget-able.
Study two popular forms of intermediate code,Three -
Address code and P-code
The most basic instruction of three-address code is
designed to represent the evaluation of arithmetic
expressions and has the following general form:
X=y op z
2*a + (b - 3 ) w it h s y nt a x tre e
+
* -
2 a b 3
The c or r e spondi n g thre e - a ddre ss cod e is
T1 = 2 * a
T2 = b – 3
T3 = t1 + t2
Figure 8.1 Sample TINY program:
{ sample program
in TINY language -- computes factorial
}
read x ; { input an integer }
if 0 > x then { don?t compute if x <= 0 }
fact:=1;
repeat
fact:=fact*x;
x:=x-1
until x=0;
write fact { output factorial of x }
ends
The Three-address codes for above TINY program
read x
t1=x>0
if_false t1 goto L1
fact=1
label L2
t2=fact*x
fact=t2
t3=x-1
x=t3
t4= x= =0
if_false t4 goto L2
write fact
label L1
halt
8.1.2 Data Structures for the
Implementation of Three-Address Code
The most common implementation is to
implement three-address code as quadruple,which
means that four fields are necessary,
– One for the operation and three for the addresses
A different implementation of three-address code
is called a triple:
– Use the instructions themselves to represent the
temporaries,
It requires that each three-address instruction be
reference-able,either as an index in an array or as
a pointer in a linked list.
Quadruple implementation for the three-
address code of the previous example
(rd,x,_,_ )
(gt,x,0,t1 )
(if_f,t1,L1,_ )
(asn,1,fact,_ )
(lab,L2,_,_ )
(mul,fact,x,t2 )
(asn,t2,fact,_ )
(sub,x,1,t3 )
(asn,t3,x,_ )
(eq,x,0,t4 )
(if_f,t4,L2,_)
(wri,fact,_,_ )
(lab,L1,_,_ )
(halt,_,_,_ )
C code defining data structures for the quadruples
Typedef enum { rd,gt,if_f,asn,lab,mul,sub,eq,wri,halt,… }
OpKind;
Typedef enum { Empty,IntConst,String } AddrKind;
Typedef struct
{ AddrKind kind;
Union
{ int val;
char * name;
} contents;
} Address
Typedef struct
{ OpKind op;
Address addr1,addr2,addr3;
} Quad
A representation of the three-address code of
the previous example as triples
(0) (rd,x,_ )
(1) (gt,x,0)
(2) (if_f,(1),(11) )
(3) (asn,1,fact )
(4) (mul,fact,x)
(5) (asn,(4),fact )
(6) (sub,x,1 )
(7) (asn,(6),x)
(8) (eq,x,0 )
(9) (if_f,(8),(4))
(10) (wri,fact,_ )
(11) (halt,_,_)
8.1.3 P-Code
It was designed to be the actual code for a
hypothetical stack machine,called the P-
machine,for which an interpreter was
written on various actual machines
Since P-code was designed to be directly
executable,it contains an implicit
description of a particular runtime
environment,including data sizes,as well
as a great deal of information specific to the
P-machines.
The P-machine consists of a code memory,an
unspecified data memory for named variables,
and a stack for temporary data,together with
whatever registers are needed to maintain the
stack and support execution
2*a+(b-3)
ldc 2 ; load constant 2
lod a ; load value of variable a
mpi ; integer multiplication
lod b ; load value of variable b
ldc 3 ; load constant 3
sbi ; integer subtraction
adi ; integer addition
1) Comparison of P-Code to Three-Address Code
P-code is in many respects closer to actual
machine code than three-address code.
P-code instructions also require fewer addresses:
One the other hand,P-code is less compact than
three-address code in terms of numbers of
instructions,and P-code is not,self-contained” in
that the instructions operate implicitly on a stack.
2) Implementation of P-Code Historically,P-code
has largely been generated as a text file,but the
previous descriptions of internal data structure
implementations for three-address code will also
work with appropriate modification for P-code.
8.2 Basic Code Generation
Techniques
8.2.1 Intermediate Code or Target
Code as a Synthesized Attribute
Intermediate code generation can be viewed as an
attribute computation.
This code becomes a synthesized attribute that can
be defined using an attribute grammar and
generated either directly during parsing or by a
post-order traversal of the syntax tree,
For an example,a small subset of C expressions:
Exp? id = exp | aexp
Aexp? aexp + factor | factor
Factor?(exp) | num | id
Grammar Rule Semantic Rules
Exp1?id = exp2 exp1.pcode=”lda”|| id.strval++exp2.pcode++”stn”
Exp?aexp exp.pcode=aexp.pcode
Aexp1?aexp2 + factor aexp1.pcode=aexp2.pcode++factor.pcode++”adi”
Aexp?factor aexp.pcode=factor.pcode
Factor?(exp) factor.pcode=exp.pcode
Factor?num factor.pcode=”ldc”||num.strval
Factor? id factor.pcode=”lod”||id.strval
1) P-Code
The expression (x=x+3)+4 has the following P-Code attribute:
Lda x
Lod x
Ldc 3
Adi
Stn
Ldc 4
Adi
Grammar Rule Semantic Rules
Exp1?id = exp2 exp1.name=exp2.name
Exp1.tacode=exp2.tacode++ id.strval||”=”||exp2.name
Exp?aexp exp.name=aexp.name; exp.tacode=aexp.tacode
Aexp1?aexp2 + factor aexp1.name=newtemp( )
aexp1.tacode=aexp2.tacode++factor.tacode++
aexp1.name||”=”||aexp2.name||”+”||factor.name
Aexp?factor aexp.name=factor.name; aexp.tacode=factor.tacode
Factor?(exp) factor.name=exp.name; factor.tacode=exp.tacode
Factor?num factor.namenum.strval; factor.tacode=”,
Factor? id factor.namenum.strval; factor.tacode=”,
T1=x+3 x=t1 t2=t1+4
2) Three-Address Code
8.2.2 Practical Code
Generation
The basic algorithm can be described as the
following recursive procedure
Procedure gencode (T,treenode);
Begin
If T is not nil then
Generate code to prepare for code of left child of T;
Gencode(left child of T);
Generate code to prepare for code of right child of T;
Gencode(right child of T);
Generate code to implement the action of T;
End;
Typedef enum {plus,assign} optype;
Typedef enum {OpKind,ConstKind,IdKind} NodeKind;
Typedef struct streenode
{ NodeKind kind;
Optype op; /* used with OpKind */
Struct streenod *lchild,*rchild;
Int val; /* used with ConstKind */
Char * strval;
/* used for identifiers and numbers */
} streenode;
typedef streenode *syntaxtree;
E x pr e ssi on (x = x + 3) + 4
+
x= 4
+
x 3
A genCode procedure to generate P-code.
Void genCode (SyntaxTree t)
{ char codestr[CODESIZE];
/* CODESIZE = max length of 1 line of P-code */
if (t !=NULL)
{ switch (t->kind)
{ case opKind:
switch(t->op)
{ case Plus:
genCode(t->lchild);
genCode(t->rchild);
emitCode(“adi”);
break;
case Assign:
sprintf(codestr,“%s %s”,“lda”,t->strval);
emitcode(codestr);
genCode(t->lchild);
emitCode(“stn”);
break;
}
break;
case ConstKind:
sprintf(codestr,”%s %s”,”ldc”,t->strval);
emitCode(codestr);
break;
case IdKind:
sprintf(codestr,”%s %s”,”lod”,t->strval);
emitCode(codestr);
break;
default:
emitCode(“error”);
break;
}
}
}
Yacc specification for the generation P-code according to the
attribute grammar of Table 8.1
%{
#define YYSTYPE char *
/* make Yacc use strings as values */
/* other inclusion code … */
%}
%token NUM ID
%%
exp,ID
{ sprintf (codestr,“%s %s”,“lda”,$1);
emitCode ( codestr); }
=? exp
{ emitCode (“stn”); }
| aexp;
aexp,aexp?+? factor {emitCode(“adi”);}
| factor;
factor,?(? exp?)?
| NUM {sprintf(codestr,“%s %s”,”ldc”,$1);
emitCode(codestr);}
| ID {sprintf(codestr,”%s %s”,“lod”,$1);
emitCode(codestr);};
%%
/*utility functions… */
8.2.3 Generation of Target
Code from Intermediate Code
Code generation from intermediate code involves
either or both of two standard techniques:
– Macro expansion and Static simulation
Macro expansion involves replacing each kind of
intermediate code instruction with an equivalent
sequence of target code instructions,
Static simulation involves a straight-line
simulation of the effects of the intermediate code
and generating target code to match these effects.
Consider the expression (x=x+3) +4,translate the P-code
into three-address code:
Lad x
Lod x
Ldc 3
Adi t1=x+3
Stn x=t1
Ldc 4
Adi t2=t1+4
We perform a static simulation of the P-machine stack to
find three-address equivalence for the given code
< -- top of sta c k
3
X
Addr e ss of x
T1= x + 3
< -- top of sta c k
T1
Addr ss of x
X= t1
< -- top of sta c k
T1
< -- top of sta c k
4
T1
T2= t1+4
< -- top of sta c k
T2
We now consider the case of translating from
three-address code to P-code,by simple macro
expansion,A three-address instruction:
a = b + c
Can always be translated into the P-code sequence
lda a
lod b
lod c
adi
sto
Then,the three-address code for the expression (x=x+3)+4:
T1 = x + 3
X = t1
T2 = t1 + 4
Can be translated into the following P-code:
Lda t1
Lod x
Ldc 3
Adi
Sto
Lad x
Lod t1
Sto
Lda t2
Lod t1
Ldc 4
Adi
Sto
I f we w a nt to e li mi na te the e x tra tempor a rie s,t he n a
more sophi sti c a ted sc he me than pure mac ro e x pa nsion
must b e used,
T 2 +
X,t 1 + 4
X 3
End of Part One
THANKS
Principles and Practice
Kenneth C,Louden
8,Code Generation
PART ONE
Generate executable code for a target machine that
is a faithful representation of the semantics of the
source code
Depends not only on the characteristics of the
source language but also on detailed information
about the target architecture,the structure of the
runtime environment,and the operating system
running on the target machine
Contents
Part One
8.1 Intermediate Code and Data Structure for code Generation
8.2 Basic Code Generation Techniques
Other Parts
8.3 Code Generation of Data Structure Reference
8.4 Code Generation of Control Statements and Logical Expression
8.5 Code Generation of Procedure and Function calls
8.6 Code Generation on Commercial Compilers,Two Case Studies
8.7 TM,A Simple Target Machine
8.8 A Code Generator for the TINY Language
8.9 A Survey of Code Optimization Techniques
8.10 Simple Optimizations for TINY Code Generator
8.1 Intermediate Code and Data
Structures for Code Generation
8.1.1 Three-Address Code
A data structure that represents the source program during
translation is called an intermediate representation,or
IR,for short
Such an intermediate representation that resembles target
code is called intermediate code
– Intermediate code is particularly useful when the goal of the
compiler is to produce extremely efficient code;
– Intermediate code can also be useful in making a compiler more
easily retarget-able.
Study two popular forms of intermediate code,Three -
Address code and P-code
The most basic instruction of three-address code is
designed to represent the evaluation of arithmetic
expressions and has the following general form:
X=y op z
2*a + (b - 3 ) w it h s y nt a x tre e
+
* -
2 a b 3
The c or r e spondi n g thre e - a ddre ss cod e is
T1 = 2 * a
T2 = b – 3
T3 = t1 + t2
Figure 8.1 Sample TINY program:
{ sample program
in TINY language -- computes factorial
}
read x ; { input an integer }
if 0 > x then { don?t compute if x <= 0 }
fact:=1;
repeat
fact:=fact*x;
x:=x-1
until x=0;
write fact { output factorial of x }
ends
The Three-address codes for above TINY program
read x
t1=x>0
if_false t1 goto L1
fact=1
label L2
t2=fact*x
fact=t2
t3=x-1
x=t3
t4= x= =0
if_false t4 goto L2
write fact
label L1
halt
8.1.2 Data Structures for the
Implementation of Three-Address Code
The most common implementation is to
implement three-address code as quadruple,which
means that four fields are necessary,
– One for the operation and three for the addresses
A different implementation of three-address code
is called a triple:
– Use the instructions themselves to represent the
temporaries,
It requires that each three-address instruction be
reference-able,either as an index in an array or as
a pointer in a linked list.
Quadruple implementation for the three-
address code of the previous example
(rd,x,_,_ )
(gt,x,0,t1 )
(if_f,t1,L1,_ )
(asn,1,fact,_ )
(lab,L2,_,_ )
(mul,fact,x,t2 )
(asn,t2,fact,_ )
(sub,x,1,t3 )
(asn,t3,x,_ )
(eq,x,0,t4 )
(if_f,t4,L2,_)
(wri,fact,_,_ )
(lab,L1,_,_ )
(halt,_,_,_ )
C code defining data structures for the quadruples
Typedef enum { rd,gt,if_f,asn,lab,mul,sub,eq,wri,halt,… }
OpKind;
Typedef enum { Empty,IntConst,String } AddrKind;
Typedef struct
{ AddrKind kind;
Union
{ int val;
char * name;
} contents;
} Address
Typedef struct
{ OpKind op;
Address addr1,addr2,addr3;
} Quad
A representation of the three-address code of
the previous example as triples
(0) (rd,x,_ )
(1) (gt,x,0)
(2) (if_f,(1),(11) )
(3) (asn,1,fact )
(4) (mul,fact,x)
(5) (asn,(4),fact )
(6) (sub,x,1 )
(7) (asn,(6),x)
(8) (eq,x,0 )
(9) (if_f,(8),(4))
(10) (wri,fact,_ )
(11) (halt,_,_)
8.1.3 P-Code
It was designed to be the actual code for a
hypothetical stack machine,called the P-
machine,for which an interpreter was
written on various actual machines
Since P-code was designed to be directly
executable,it contains an implicit
description of a particular runtime
environment,including data sizes,as well
as a great deal of information specific to the
P-machines.
The P-machine consists of a code memory,an
unspecified data memory for named variables,
and a stack for temporary data,together with
whatever registers are needed to maintain the
stack and support execution
2*a+(b-3)
ldc 2 ; load constant 2
lod a ; load value of variable a
mpi ; integer multiplication
lod b ; load value of variable b
ldc 3 ; load constant 3
sbi ; integer subtraction
adi ; integer addition
1) Comparison of P-Code to Three-Address Code
P-code is in many respects closer to actual
machine code than three-address code.
P-code instructions also require fewer addresses:
One the other hand,P-code is less compact than
three-address code in terms of numbers of
instructions,and P-code is not,self-contained” in
that the instructions operate implicitly on a stack.
2) Implementation of P-Code Historically,P-code
has largely been generated as a text file,but the
previous descriptions of internal data structure
implementations for three-address code will also
work with appropriate modification for P-code.
8.2 Basic Code Generation
Techniques
8.2.1 Intermediate Code or Target
Code as a Synthesized Attribute
Intermediate code generation can be viewed as an
attribute computation.
This code becomes a synthesized attribute that can
be defined using an attribute grammar and
generated either directly during parsing or by a
post-order traversal of the syntax tree,
For an example,a small subset of C expressions:
Exp? id = exp | aexp
Aexp? aexp + factor | factor
Factor?(exp) | num | id
Grammar Rule Semantic Rules
Exp1?id = exp2 exp1.pcode=”lda”|| id.strval++exp2.pcode++”stn”
Exp?aexp exp.pcode=aexp.pcode
Aexp1?aexp2 + factor aexp1.pcode=aexp2.pcode++factor.pcode++”adi”
Aexp?factor aexp.pcode=factor.pcode
Factor?(exp) factor.pcode=exp.pcode
Factor?num factor.pcode=”ldc”||num.strval
Factor? id factor.pcode=”lod”||id.strval
1) P-Code
The expression (x=x+3)+4 has the following P-Code attribute:
Lda x
Lod x
Ldc 3
Adi
Stn
Ldc 4
Adi
Grammar Rule Semantic Rules
Exp1?id = exp2 exp1.name=exp2.name
Exp1.tacode=exp2.tacode++ id.strval||”=”||exp2.name
Exp?aexp exp.name=aexp.name; exp.tacode=aexp.tacode
Aexp1?aexp2 + factor aexp1.name=newtemp( )
aexp1.tacode=aexp2.tacode++factor.tacode++
aexp1.name||”=”||aexp2.name||”+”||factor.name
Aexp?factor aexp.name=factor.name; aexp.tacode=factor.tacode
Factor?(exp) factor.name=exp.name; factor.tacode=exp.tacode
Factor?num factor.namenum.strval; factor.tacode=”,
Factor? id factor.namenum.strval; factor.tacode=”,
T1=x+3 x=t1 t2=t1+4
2) Three-Address Code
8.2.2 Practical Code
Generation
The basic algorithm can be described as the
following recursive procedure
Procedure gencode (T,treenode);
Begin
If T is not nil then
Generate code to prepare for code of left child of T;
Gencode(left child of T);
Generate code to prepare for code of right child of T;
Gencode(right child of T);
Generate code to implement the action of T;
End;
Typedef enum {plus,assign} optype;
Typedef enum {OpKind,ConstKind,IdKind} NodeKind;
Typedef struct streenode
{ NodeKind kind;
Optype op; /* used with OpKind */
Struct streenod *lchild,*rchild;
Int val; /* used with ConstKind */
Char * strval;
/* used for identifiers and numbers */
} streenode;
typedef streenode *syntaxtree;
E x pr e ssi on (x = x + 3) + 4
+
x= 4
+
x 3
A genCode procedure to generate P-code.
Void genCode (SyntaxTree t)
{ char codestr[CODESIZE];
/* CODESIZE = max length of 1 line of P-code */
if (t !=NULL)
{ switch (t->kind)
{ case opKind:
switch(t->op)
{ case Plus:
genCode(t->lchild);
genCode(t->rchild);
emitCode(“adi”);
break;
case Assign:
sprintf(codestr,“%s %s”,“lda”,t->strval);
emitcode(codestr);
genCode(t->lchild);
emitCode(“stn”);
break;
}
break;
case ConstKind:
sprintf(codestr,”%s %s”,”ldc”,t->strval);
emitCode(codestr);
break;
case IdKind:
sprintf(codestr,”%s %s”,”lod”,t->strval);
emitCode(codestr);
break;
default:
emitCode(“error”);
break;
}
}
}
Yacc specification for the generation P-code according to the
attribute grammar of Table 8.1
%{
#define YYSTYPE char *
/* make Yacc use strings as values */
/* other inclusion code … */
%}
%token NUM ID
%%
exp,ID
{ sprintf (codestr,“%s %s”,“lda”,$1);
emitCode ( codestr); }
=? exp
{ emitCode (“stn”); }
| aexp;
aexp,aexp?+? factor {emitCode(“adi”);}
| factor;
factor,?(? exp?)?
| NUM {sprintf(codestr,“%s %s”,”ldc”,$1);
emitCode(codestr);}
| ID {sprintf(codestr,”%s %s”,“lod”,$1);
emitCode(codestr);};
%%
/*utility functions… */
8.2.3 Generation of Target
Code from Intermediate Code
Code generation from intermediate code involves
either or both of two standard techniques:
– Macro expansion and Static simulation
Macro expansion involves replacing each kind of
intermediate code instruction with an equivalent
sequence of target code instructions,
Static simulation involves a straight-line
simulation of the effects of the intermediate code
and generating target code to match these effects.
Consider the expression (x=x+3) +4,translate the P-code
into three-address code:
Lad x
Lod x
Ldc 3
Adi t1=x+3
Stn x=t1
Ldc 4
Adi t2=t1+4
We perform a static simulation of the P-machine stack to
find three-address equivalence for the given code
< -- top of sta c k
3
X
Addr e ss of x
T1= x + 3
< -- top of sta c k
T1
Addr ss of x
X= t1
< -- top of sta c k
T1
< -- top of sta c k
4
T1
T2= t1+4
< -- top of sta c k
T2
We now consider the case of translating from
three-address code to P-code,by simple macro
expansion,A three-address instruction:
a = b + c
Can always be translated into the P-code sequence
lda a
lod b
lod c
adi
sto
Then,the three-address code for the expression (x=x+3)+4:
T1 = x + 3
X = t1
T2 = t1 + 4
Can be translated into the following P-code:
Lda t1
Lod x
Ldc 3
Adi
Sto
Lad x
Lod t1
Sto
Lda t2
Lod t1
Ldc 4
Adi
Sto
I f we w a nt to e li mi na te the e x tra tempor a rie s,t he n a
more sophi sti c a ted sc he me than pure mac ro e x pa nsion
must b e used,
T 2 +
X,t 1 + 4
X 3
End of Part One
THANKS