1
第五章 LL(1)文法及其分析程序
5.1 预测分析 程序
5.2 LL(1)文法
FIRST和 FOLLOW集定义和计?

LL(1) 文法定义
LL(1)分析程序的生成
5.3 非 LL(1)文法的改造
2
自上而下分析算法要点:
.由根向下构造语法树
.构造最左推导
.推导出的终结符是否与当前输入符匹配
S aaab
A B
a A
S –> AB
A –> aA |?
B –> b | bB
aaab.
S
AB S –> AB
aAB A –> aA
aaAB A –> aA
aaaAB A –> aA
aaa? B A –>?
aaab B –> b
3
带回溯的 自上而下分析
S –> AB
A –> aA |?
B –> b | bB
a a a b b.
S
(1)?A..,S –> AB
(2)?aA..,A –> aA
(3)?aaA..,A –> aA
(4)?aaaA..,A –> aA
(5)?aaa? B..,A –>?
(6)?aaab B –> b
aaabb.
S
(1)?A..,S –> AB
(2)?aA..,A –> aA
(3)?aaA..,A –> aA
(4)?aaaA..,A –> aA
(5)?aaa? B A –>?
(6’)?aaa b B B –> bB
(7)?aaabb B –> b
4
预测分析程序 Predictive parser
无 回溯 的自顶向下分析程序特征 ——根据下一个输入符号为当前要处理的非终结符选择产生式要求 ——文法是 LL(1)的第一个 L 从左到右扫描输入串第二个 L 生成的是最左推导
1 向前看一个输入符号( lookahead)
预测分析程序的实现技术
1 递归下降子程序
2 表驱动分析程序
5
PL/0语言的 EBNF
〈 程序 〉 ∷= 〈 分程序 〉,
〈 分程序 〉 ∷=[ 〈 常量说明部分 〉 ][〈 变量说明部分 〉 ][〈 过程说明部分 〉 ]〈 语句 〉
〈 常量说明部分 〉 ∷=CONST 〈 常量定义部分 〉 {,〈 常量 定义 〉 };
〈 变量说明部分 〉 ∷=VAR 〈 标识符 〉 {,〈 标识符 〉 };
〈 过程说明部分 〉 ∷= PROCEDURE 〈 标识符 〉〈 分程序 〉
{; 〈 过程说明部分 〉 };
〈 语句 〉 ∷= 〈 标识符 〉,=〈 表达式 〉 |IF 〈 条件 〉
then〈 语句 〉 |CALL… |READ… |BEGIN 〈 语句 〉 {; 〈 语句 〉 } END|WHILE… |…
6
begin(*statement*)
if sym=ident
then (*parsing ass.st.*)
begin
getsym;
if sym=becomes
then getsym
else error(13);
expression(fsys);
end
else
if sym=readsym
then
(* parsing read st.*)
begin
getsym;
if sym<>lparen
then error(34)
else
repeat
getsym;
if sym <> ident
then error(35)
else getsym
until sym<>comma;
if sym<>rparen
then error(33);
end
7
递归下降子程序
program –> function_list
function_list –> function function_list |?
function –> FUNC identifier ( parameter_list )
statement
void ParseFunction()
{
MatchToken(T_FUNC);
ParseIdentifier();
MatchToken(T_LPAREN);
ParseParameterList();
MatchToken(T_RPAREN);
ParseStatement();
}
8
void MatchToken(int expected)
{
if (lookahead != expected) {
printf("syntax error \n");
exit(0);
} else // if match,consume token and move
on
lookahead = yylex();
}
9
例:递归子程序实现表达式的语法分析表达式的 EBNF
〈 表达式 〉 ∷=[+| -]〈 项 〉 {( +|-) 〈 项 〉 }
〈 项 〉 ∷= 〈 因子 〉 {( *|/) 〈 因子 〉 }
〈 因子 〉 ∷= ident|number|‘( ’ 〈 表达式 〉
‘) ’
10
procedure expr;
begin
if sym in [ plus,minus ] then
begin
getsym; term;
end
else term;
while sym in [plus,minus] do
begin
getsym; term;
end
end;
11
Procedure term;
begin factor;
while sym in [times,slash] do
begin getsym;factor end
end;
12
Procedure factor;begin if sym=ident
then getsymelse
if sym=numberthen getsym
elseif sym=‘(‘
then begin getsym;
expr;if sym=‘)’
then getsymelse error
endelse error
end;

13
表驱动 予测分析程序模型
Input
#
总控程序预测分析表
stack
14
带 a0 a1 a2 a3 a4 a5 a6 a7 a8 … an-1 an
有限控制器磁头识别程序的数学模型 下推自动机
15
上下文无关语言句型分析(识别)程序的数学模型下推自动机 Pda=(K,Σ,f,H,h0,S,Z)
H:下推栈符号的有穷字母表
h0,H中的初始符号
f,K?( Σ?{?})? H –> K? H*
Pda的一个组态是 K?Σ *? H 中的一个 ( k,w,?) k:当前状态,w:余留输入串,?:栈中符号,最左边的符号在栈顶。
Pda的一次移动用组态表示终止和接受的条件,
1.到达输入串结尾时,处在 Z中的一个状态或 2.某个动作序列导致栈空时
16
例,Pda P=({A,B,C),{a,b,c),f,{h,i},i
A,{ })
f(A,a,i) = (B,h) f(B,a,h) = (B,hh)
f(C,b,h) = (C,?) f(A,c,i) = (A,?)
f(B,c,h) = (C,h)
接受输入串 aacbb的过程
(A,aacbb,i) 读 a,pop i,push h,goto B
(B,acbb,h) 读 a,pop h,push hh,goto B
(B,cbb,hh) 读 c,pop h,push h,goto C
(C,bb,hh) 读 b,pop h,push?,goto C
(C,b,h) 读 b,pop h,push?,goto C
(C,?,? )
17
G[ E],(1) E –> TE’
(2) E’ –> +TE’
(3) E’ –>?
(4) T –> FT’
(5) T’ –> *FT’
(6) T’ –>?
(7) F –> (E)
(8) F –> a
18
a + * ( ) #
E (1) (1)
E’ (2) (3) (3)
T (4) (4)
T’ (6) (5) (6) (6)
F (8) (7)
G[ E],(1) E –> TE’ (2) E’ –> +TE’ (3) E’ –>?
(4) T –> FT’ (5) T’ –> *FT’ (6) T’ –>?
(7) F –> (E) (8) F –> a
19
分析算法
BEGIN
首先把 ’ #‘然后把文法开始符号推入栈;把第一个输入符号读进 b; FLAG,=TRUE;
WHILE FLAG DOBEGIN
把栈顶符号上托出去并放在X 中;
IF X? Vt THEN IF X=b THEN
把下一 个输入符号读进 a
ELSE ERRORELSE IF X=‘#’ THEN
IF X=b THEN FLAG:=FALSE
ELSE ERROR
ELSE IF?[X,b]={X –> X1X2..XK}
THEN 把 XK,X K-1,..,X1一一推进栈
ELSE ERROR
END OF WHILE;
STOP/*分析成功,过程完毕 */
END
20
分析输入串 #a+a#
栈内容 栈顶符号 当前输入 余留串 M[X,b ]
1 #E E a +a# E –> TE’
2 #E’T T a +a# T –> FT’
3 #E’T’F F a +a# F –> a
4 #E’T’a a a +a#
5 # E’T’ T’ + a# T’ –>?
6 #E’ E’ + a# E’ –> +TE’
7 #E’T+ + + a#
8 # E’T T a # T –> FT’
9 #E’T’F F a # F –> a
10 #E’T’a a a #
11 #E’T’ T’ # T’ –>?
12 #E’ E’ # E ’ –>?
13 # # #
21
FIRST集和 FOLLOW集的定义设 G=(VT,VN,P,S)是上下文无关文法
FIRST(?) ={a|? =>* a?,a∈ VT,?,?∈V *}
若? =>* ε 则规定 ε ∈FRIST (?)
FOLLOW( A) ={a?S =>*? A?且 a ∈ FRIST
(?),? ∈ V*,?∈ V+ }
若 S =>* u A?,且? =>* ε,则
#∈F OLLOW(A)
LL (1) 文法
22
计算 FIRST集
1.若 X?V?,则 FIRST(X)={X}
2.若 X?VN,且有产生式 X?a…,则把 a加入到
FIRST(X)中 ;若 X也是一条产生式,则把?也加到 FIRST(X)中,
3.若 X?Y…是一个产生式且 Y?VN,则把 FIRST(Y)
中的所有非?元 素都加到 FIRST(X)中 ;若 X?
Y1Y2…YK 是一个产生式,Y1,Y2,…,Y(i-1)都是非终结符,而且,对于任何 j,1≤ j ≤i -1,
FIRST(Yj)都含有? (即 Y1..Y(i-1) =>*? ),则把 FIRST(Yj)中的所有非?元素都加到 FIRST(X)
中 ;特别是,若所有的 FIRST(Yj,j=1,2,…,K)
均含有?,则把?加到 FRIST(X)中,
23
计算 FOLLOW集
1.对于文法的开始符号 S,置 #于 FOLLOW(S) 中 ;
2.若A? α B β 是一个产生式,则把
FIRST(β )\{?}加至 FOLLOW(B)中 ;
3.若A? α B是一个产生式,或A? α Bβ 是
一个产生式而 β =>*?(即FIRST(β )),
则把 FOLLOW( A)加至 FOLLOW( B)中.
24
一个文法 G是 LL( 1)的,当且仅当对于 G的每一个非终结符A的任何两个不同产生式A? α?β,
下面的条件成立:
1,FIRST( α ) ∩ FIRST(β )=?,也就是 α?和 β 推导不出以同一个终结符 a为首的符号串;它们不应该都能推出空字?.
2,假若 β =>*?,那么,
FIRST( α )∩FOLLOW ( A)=?,也就是,
若 β =>*?.则 α 所能推出的串的首符号不应在
FOLLOW(A)中.

25
G[ E],(1) E –> TE’ (2) E’ –> +TE’ (3) E’ –>?
(4) T –> FT’ (5) T’ –> *FT’ (6) T’ –>?
(7) F –> (E) (8) F –> a
·各非终结符的 FIRST集合如下:
FIRST(E)={ (,i}
FIRST(E′)= { +,ε }
FIRST(T)={ (,i}
FIRST(T′)= { *,ε }
FIRST(F)={ (,i}
·各非终结符的 FOLLOW集合为:
FOLLOW(E)={ ),#}
FOLLOW(E′)= { ),#}
FOLLOW(T)={+,),#}
FOLLOW(T′)= {+,),#}
FOLLOW(F)={ *,+,),#}
26
G[ E],(1) E –> TE’ (2) E’ –> +TE’ (3) E’ –>?
(4) T –> FT’ (5) T’ –> *FT’ (6) T’ –>?
(7) F –> (E) (8) F –> a
E’ –> +TE’ |? FIRST(+TE’)={ +}
FOLLOW(E′)= { ),#}
T’ –> *FT’ |? FIRST(*FT’)={ *}
FOLLOW(T′)= { +,),#}
F –> (E) | a FIRST( (E))={ (}
FIRST( a)={ a}
所以 G[E]是 LL( 1)的
27
予测分析表构造算法
1.对文法 G的每个产生式A执行第二步
和第三步;
2.对每个终结符 a?FIRST(?),把A加
至?[A,a]中,
3.若 FIRST(?),则对任何 b?FOLLOW(A)
把A加至?[A,b]中,
4.把所有无定义的?[A,a]标上,出错标志,。
可以证明,一个文法 G的予测分析表不含多重入口,当且仅当该文法是 LL(1)的
28
LL(1)文法的性质,
LL(1)文法是无二义的
LL(1)文法不含左递归非 LL(1)文法的改造消除左递归提左公因子将产生式Aβ | 变换为:
AB
B? β |?
29
E→E+T | T
T→T*F | F
F→i | (E)
FIRST(E)={ (,i}
FIRST(T)={ (,i}
FIRST(F)={ (,i}
消左递归
E –> TE’
E’ –> +TE’
E’ –>?
S → if C t S |
if C t S e S
C →b
提左因子
S → if C t S A
A→ e S |?
First集 Follow集
S if #,e
A e,? #,e
C b t
M[A,e]={A→ e S
A→? }
30
LL(1)分析中的一种错误处理办法发现错误
1栈顶的终结符与当前输入符不匹配
2非终结符 A于栈顶,面临的输入符为 a,但分析表 M的 M[A,a]为空
“应急”恢复策略跳过输入串中的一些符号直至遇到“同步符号”为止。
同步符号的选择
1把 FOLLOW(A)中的所有符号作为 A的同步符号。跳过输入串中的一些符号直至遇到这些“同步符号”,把 A从栈中弹出,可使分析继续
2把 FIRST(A)中的符号加到 A的同步符号集,当 FIRST(A)中的符号在输入中出现时,可根据 A恢复分析
31
review---parsing
The syntax analysis phase of a compiler verifies that the
sequence of tokens returned from the scanner
represent valid sentences in the grammar of the
programming language,
There are two major parsing approaches,top-down and
bottom-up,
In top-down parsing,you start with the start symbol and
apply the productions until you arrive at the desired
string,
In bottom-up parsing,you start with the string and
reduce it to the start symbol by applying the
productions backwards,
32
In the top-down parsing,we
begin with the start symbol
and at each step,expand one
of the remaining nonterminals
by replacing it with the right
side of one its productions.
We repeat until only terminals
remain,The top-down parse
prints a leftmost derivation of
the sentence.
A bottom-up parse works in
reverse,We begin with the
sentence of terminals and each
step applies a production in
reverse,replacing a substring
that matches the right side with
the nonterminal on the left,
We continue until we have
substituted our way back to the
start symbol,If you read from
the bottom to top,the bottom-up
parse prints out a rightmost
derivation of the sentence.
33
lookahead symbol The lookahead symbol is the next
symbol coming up in the input,
backtracking,Based on the information the parser
currently has about the input,a decision is made to
go with one particular production,If this choice leads
to a dead end,the parser would have to backtrack to
that decision point,moving backwards through the
input,and start again making a different choice and
so on until it either found the production that was the
appropriate one or ran out of choices.
34
predictive parser and LL(1)grammar
Predictive parser is a non-backtracking top-down
parser,A predictive parser is characterized by its
ability to choose the production to apply solely on the
basis of the next input symbol and the current
nonterminal being processed,
To enable this,the grammar must take a particular form,
We call such a grammar LL(1),The first,L” means
we scan the input from left to right; the second,L”
means we create a leftmost derivation; and the 1
means one input symbol of lookahead.
35
recursive-descent
The first technique for implementing a predictive parser
is called recursive-descent,
A recursive descent parser consists of several small
functions(procedures),one for each nonterminal in
the grammar,As we parse a sentence,we call the
functions (procedures) that correspond to the left side
nonterminal of the productions we are applying,If
these productions are recursive,we end up calling the
functions recursively.
36
Table-driven LL(1) parsing
In a recursive-descent parser,the production
information is embedded in the individual parse
functions for each nonterminal and the run-time
execution stack is keeping track of our progress
through the parse,There is another method for
implementing a predictive parser that uses a table
to store that production along with an explicit stack
to keep track of where we are in the parse.
37
How a table-driven predictive parser works
We push the start symbol on the stack and read the first
input token,As the parser works through the input,there
are the following possibilities for the top stack symbol X
and the input token nonterminal a:
1,If X = a and a = end of input (#),parser halts and parse
completed successfully
2,If X = a and a != #,successful match,pop X and advance
to next input token,This is called a match action.
3,If X != a and X is a nonterminal,pop X and consult table
at [X,a] to see which production applies,push right side of
production on stack,This is called a predict action.
4,If none of the preceding cases applies or the table entry
from step 3 is blank,there has been a parse error
38
The first set of a sequence of symbols u,written
as First(u ) is the set of terminals which
start all the sequences of symbols derivable from
u,A bit more formally,consider all
strings derivable from u by a leftmost derivation,
If u =>* v,where v begins with some
terminal,that terminal is in First(u),If u =>*?,
then? is in First(u ).
39
The follow set of a nonterminal A is the set of
terminal symbols that can appear immediately
to the right of A in a valid sentence,A bit more
formally,for every valid sentence S =>*uAv,
where v begins with some terminal,that
terminal is in Follow(A).
40
Computing first
To calculate First(u) where u has the form
X1X2...Xn,do the following:
1,If X1 is a terminal,then add X1 to First(u),
otherwise add First(X1) -? to First(u ),
2,If X1 is a nullable nonterminal,i.e.,X1 =>*?,
add First(X2) -? to First(u),Furthermore,
if X2 can also go to?,then add First(X3) -?
and so on,through all Xn until the first
nonnullable one.
3,If X1X2...Xn =>*?,add? to the first set.
41
Calculating follow sets,For each nonterminal in the
grammar,do the following:
1,Place# in Follow(S) where S is the start symbol and # is
the input's right endmarker.The endmarker might be end of
file,it might be newline,it might be a special symbol,
whatever is the expected end of input indication for this
grammar,We will typically use # as the endmarker.
2,For every production A –> uBv where u and v are any
string of grammar symbols and B is a nonterminal,
everything in First(v) except? is placed in Follow(B).
3,For every production A –> uB,or a production A –> u Bv
where First(v ) contains? (i.e,v is nullable),then
everything in Follow(A) is added to Follow(B).
42
Constructing the parse table
1,For each production A –> u of the grammar,do steps 2
and 3
2,For each terminal a in First(u),add A –> u to M[A,a]
3,If? in First(u),(i.e,A is nullable) add A –> u to
M[A,b] for each terminal b in Follow(A),
If? in First(u),and # is in Follow(A),add A –> u to
M[A,#]
4,All undefined entries are errors
43
LL(1) grammar
A grammar G is LL(1) iff whenever A –> u | v are two
distinct productions of G,the following conditions
hold:
- for no terminal a do both u and v derive strings
beginning with a (i.e,first sets are disjoint)
- at most one of u and v can derive the empty string
- if v =>*? then v does not derive any string beginning
with a terminal in Follow(A)