Semantic analysis
Parsing only verifies that the program consists of tokens
arranged in a syntactically-valid combination,we now
move on to semantic analysis,where we delve deeper to
check whether they form a sensible set of instructions in
the programming language.
For a program to be semantically correct,all variables,
functions,classes,etc,are properly defined,expressions
and variables are used in ways that respect the type
system,access control isn’t violated,and so on.
Semantic analysis is the next-to-last phase of the front end
and the compiler’s last chance to weed out incorrect
programs,We need to ensure the program is well-formed
enough to continue on to the next phase where we
generate code.
semantic analysis and symbol table
A large part of semantic analysis consists of tracking
variable/function/type declarations,As we enter
each new identifier in our symbol table,we need
to record the type information of the declaration,
Then,as we continue parsing the rest of the
program,we make sure that the type of each
identifier and expression is respected in terms of
the operations being performed
Examples of the things we check in the
semantic analysis phase.
,
The type of the right-side expression of an assignment
statement should match the type of the left-side,and the
left-side needs to be a properly declared and assignable
identifier (i.e,not some sort of constant),
The parameters of a function should match the arguments
of a function call in both number and type,
The language may require that identifiers are unique,
disallowing a global variable and function of the same
name.
The operands to multiplication operation will need to be
of numeric type,perhaps even the exact same type
depending on the strictness of the language,
As we encounter identifiers in a program,we need to
determine if the identifier is accessible at that
point in the program,This is called scope checking
One additional issue in semantic analysis is dealing
with scopes.
,A scope is a section of program text enclosed by
basic program delimiters,e.g.,{} in C,or begin-
end in Pascal,Many languages allow nested
scopes that are scopes defined within other scopes,
The scope defined by the innermost such unit is
called the current scope,The scope defined by the
current scope and by any enclosing program units
are known as open scopes,Any other scope is a
closed scope.
Syntax-directed translation
refers to a method of compiler implementation where
the source language translation is completely
driven by the parser,In other words,the parsing
process and parse trees are used to direct semantic
analysis and the translation of the source program,
This can be a separate phase of a compiler or we
can augment our conventional grammar with
information to control the semantic analysis and
translation,Such grammars are called attribute
grammars.
attribute grammars.
We augment a grammar by associating attributes with
each grammar symbol that describes its properties,
An attribute has a name and an associated value— a
string,a number,a type,a memory location,an
assigned register,whatever information we need.
With each production in a grammar,we give semantic
rules or actions,which describe how to
compute the attribute values associated with each
grammar symbol in a production,The attribute
value for a parse node may depend on information
from its children nodes below or its siblings and
parent node above.
There are two types of attributes we might
encounter,synthesized or inherited
,Synthesized attributes are those attributes that are passed
up a parse tree,i.e.,the left-side attribute is computed
from the right-side attributes,Values for the attributes
of terminals are usually supplied by the lexical analyzer
and the synthesized ones are passed up from there
,Inherited attributes are those that are passed down a
parse tree,i.e.,the right-side attributes are derived from
the left-side attributes (or other right-side attributes),
These attributes are used for passing information about
the context to nodes further down the tree.
S- attribute grammars,Only have
synthesized attributes
L- attribute grammars,for every production
A→X 1X2… Xn,each attribute is a synthesized
attribute or is a Inherited attribute of Xj
( 1≤j≤n),and the Inherited attribute of Xj
( 1≤j≤n) depends on the Inherited attribute of
A or depends on the attributes of X1,X2,…,
Xj-1 in every semantics rule
We can implement syntax-directed translation in
either a top-down or a bottom-up parser.
Access to attributes in yacc
,The syntax $1,$2 is used to access the attribute of
the nth token on the right side of the production,
The global variable yylval is set by the scanner
and that value is saved with the token when placed
on the parse stack,When a rule is reduced,a new
state is placed on the stack,the default behavior is
to just copy the attribute of $1 for that new state,
this can be controlled by assigning to $$ in the
action for the rule.
intermediate representation
Most compilers translate the source first to some form of
intermediate representation and convert from there into
machine code,The intermediate representation is a
machine- and languageindependent version of the
original source code,Although converting the code
twice introduces another step and thus incurs loss in
compiler efficiency,use of an intermediate
representation provides advantages in increased
abstraction,cleaner separation between the front and
back ends,adds possibilities for re-targeting/cross-
compilation,and works well with many advanced
optimization techniques.
Intermediate representations are usually categorized
according to where they fall between a high-level
language like C,and machine code,IRs that are
close to a high-level language are called high-level
IRs,and IRs that are close to assembly are called
low-level IRs,For example,a high-level IR might
preserve things like array subscripts or field
accesses whereas a low-level IR converts those
into explicit addresses and offsets,
abstract syntax tree
You can think of a parse tree as an example of a high-level
intermediate representation,In fact,it is often possible to
reconstruct the actual source code from a parse tree and
the corresponding symbol table.
If we were to build a tree during the parsing phase,it could
form the basis of a syntax tree representation of the input
program,Typically,this is not quite the literal parse tree
recognized by the parser (intermediate nodes may be
collapsed,groupings units can be dispensed with,etc.),but
it is winnowed down to the sufficient structure to drive
the semantic processing and code generation,Such a tree
is usually referred to as an abstract syntax tree.
In the abstract syntax tree there are not leaf-nodes for
operators and keywords,only inter-nodes associated with
them.