1
清华大学 宋斌恒 1
Lecture 3. Dynamic
Programming
清华大学 宋斌恒
清华大学 宋斌恒 2
综合练习 : 如何查资料 、 整理资料
n 资料来源
n 资料整理
n 资料使用
清华大学 宋斌恒 3
Dynamic programming
n Longest common subsequence
n Printing neatly
n Edit distance
n Optimal binary search tree
n 技巧 : Memoization[备忘录法 ]
清华大学 宋斌恒 4
动态规划方法回顾
n 与分治方法有没有共同点 ? 【 不同点 】
n 子问题空间 ? 决定复杂性因子之一
n 子问题的治与合 ? 复杂性因子之二
n 最优值
n 最优解
n [讨论 ]
清华大学 宋斌恒 5
最长公共子序列
清华大学 宋斌恒 6
问题定义
n Given :
n A finite alphabet A, finite set R of strings from A* , and
apositiveinteger K
n Is There :
n A string w element of A* with | w | >= K such that w
is a subsequence of each x element of R ?
n The problem remains NP-complete for all A ( R ) >= 2
n Analogous Longest Common Substring problem is
trivially solvable in polynomial time
2
清华大学 宋斌恒 7
实例
n Given as input, the strings :
n DEABC
n ABCD
n FGHABD
n Common Subsequences Are :
n A, B, D, AB
n The Longest Common Subsequences Are :
n AB
清华大学 宋斌恒 8
一些应用
n ?String - to - String Correction (spell checking
& correction)
n ?Distance Between Strings (database keys)
n ?Minimum Cost Edit Operation Between
Strings
n ?Amino Acid Sequences in Molecular Biology
n ?Data Compression Techniques
清华大学 宋斌恒 9
Longest Common subsequence
n Background:
n In bioinformatics, it needs often to compare the DNA
of two or more different organisms.
n A strand of DNA consists of a string of molecules
called base, where the possible bases are:
n Adenine:腺嘌呤
n Guanine:鸟嘌呤
n Cytosine:胞核嘧啶 , 氧氨嘧啶
n Thymine:胸腺嘧啶
n A DNA can be represented by a string over the finite
set [A,C,G,T]
清华大学 宋斌恒 10
Longest Common subsequence
n Two possible DNA string;
n S1 = ACCGGTCGAGTGCGCGGAAGCCGGC
n S2 = GTCGTTCGGAATCGGCTTGCCGTTG
n Our goal: Compare the two string to determine
how “similar”the two strands are.
清华大学 宋斌恒 11
What is the similarity of two strings?
n Issue: The similarity can be defined in many
ways, here we use the “longest common
subsequence”to describe the similarity
n Show your opinions![Many]
清华大学 宋斌恒 12
Concepts
n Subsequence and substring
n A substring of a string is a continuous part of
the string
n A subsequence of a string x is a string y with
its elements in y in the same order.
3
清华大学 宋斌恒 13
Longest Common Subsequence Problem
n Given two sequences
n X=<x1, x 2, x 3,… ,xm>
n Y=<y 1, y 2, y 3,… ,yn>
n Find the longest common subsequence, where
a common subsequence of two strings is
another string y such that it is a subsequence
of both of the two given strings.
清华大学 宋斌恒 14
Step1: Analyzing the problem
n A brute-force method: [Na?ve]
n How many subsequences for string x?
n Check each one is also a subsequence of
string y.
n Complexity?
清华大学 宋斌恒 15
Theorem 15.1 Optimal substructure of an
LCS
n Let X=<x1,x2,… ,xm> ,Y=<y1,y2,… ,yn> be two
sequences and let Z=<z1,z2,… ,zk> be a LCS
of X and Y.
n If xm=yn, then zk=xm=yn and Zk-1 is an LCS of
Xm-1 and Y n-1
n If xm !=yn, then zk!= xm implies that Zk is an
LCS of Xm-1 and Yn
n If xm != yn, then zk!= yn and Zk is an LCS of Xm
and Y n-1
清华大学 宋斌恒 16
Proof of Theorem 15.1
n Proof of “If xm=yn, then zk=xm=yn and Zk-1 is an
LCS of Xm-1 and Yn-1”
n If zk!=xmàwe could append xm to Z to obtain a
longer common subsequence of X and Y,
which is a contradiction, hence we have
zk=xm=yn and
n Zk-1 is a common subsequence of Xm-1 and Y n-1,
which we will show it is an LCS of Xm-1 and Y n-1.
If not, there are another common subsequence
which is longer than Zk-1 and then we append yn
to it, which is a contradiction.
清华大学 宋斌恒 17
Continuing proof of Theorem 15.1
n Proof of “If xm!=yn, then (zk != xm implies that
Zk is an LCS of Xm-1 and Yn)”
n If Zk is not an LCS of Xm-1 and Yn , then it will
conclude a contradiction that Z is an LCS of X
and Y.
清华大学 宋斌恒 18
Step 2: A recursive solution
n Let C[i,j] be the length of the LCS of Xi and Yj, then
??
???
≠??
=+??
=
=
ji
ji
yxjicjic
yxjic
ji
jic
]},1[],1,[max{
1]1,1[
0,0
],[
4
清华大学 宋斌恒 19
Step 3: Computing the length of an LCS
LCS-Length(X,Y)
① m=length[X]
② n=length[Y]
③ For i ?1 to m {c[i,0]=0} //初始化
④ For j ?0 to n {c[0,j]=0} //初始化
⑤ For i ?1 to m, j ? 1 to n
⑥ {
① if xi=yj then {c[i,j]=c[i-1,j-1]+1; continue}
② if c[i,j-1]>c[i-1,j] then {c[i,j]=c[i.j-1]; continue}
③ c[i,j]=c[i-1,j]
⑦ }
清华大学 宋斌恒 20
Step 4: Constructing the LCS
Modification for the last algorithm: store the information
of choices. Here after the b[i,j] stands for the choice
of computing c[i,j].
⑤ For i ?1 to m, j ?1 to n
⑥ {
① if xi=yjthen {c[i,j]=c[i-1,j-1]+1; b[i,j]=upperleft; continue}
② if c[i,j-1]>c[i-1,j] then {c[i,j]=c[i.j-1]; b[i,j]=upper;
continue}
③ c[i,j]=c[i-1,j]; b[i,j]=left
⑦ }
清华大学 宋斌恒 21
Print-LCS(b,X,i,j)
① If i=0 or j=0 then return
② If b[i,j]=upperleft then{
① Print-LCS(b,X,i-1,j-1)
② Print Xi;
③ Return;
③ }
④ If b[i,j]=upper Print-LCS (b,X,i -1,j); return;
⑤ Print-LCS(b,X,i,j);
⑥ return;
清华大学 宋斌恒 22
How to improve your code
n Save the space used in the last algorithm
n How to write the code neatly
n Complexity analysis
n Complete the LCS
n 备忘录法
清华大学 宋斌恒 23
如何分析问题
n 优美打印为例 :
n什么是优美打印问题 :
n英文打印每行空格的差别要小 : 一般是空格个
数的三次方的和达到极小 。 【 举例说明 】
清华大学 宋斌恒 24
子问题空间
n 原问题 ? 不优美度 ( 一段文字 )
n 考虑不考虑首行缩进问题 ?
n 不优美度 ( 空字符数 , 一段文字 )
n 一般子问题 : 从某位置开始的子文字段
n ? 分析子问题维数 ?
n ? 如何由子问题构造原问题 ?
n 特殊子问题 : 所有子文字段都从头排版
n ? 子问题问题维数 ?
n 是否可以由这类子问题构造原问题的解 ?
n 有没有更小的子问题空间 ?
5
清华大学 宋斌恒 25
Analyzing Printing neatly again
n 断句点
n 断句点确定后留下两类问题
n 一类问题的不优美度要考虑最后一行数值 ; 另
一类问题的不优美度不考虑最后一行的数值 ;
n 说明上述选择方法 、 证明上述选择方法的正确
性 。
n 递归公式准备工作如何做 ? 【 见下页 】
清华大学 宋斌恒 26
Recursive Formula
n 请问 : 如果 C(i,j)表示从第 i个词开始到第 j个词结束段落
包括 最后一行的不优美度 , 如果 D(i,j)表示从第 i个词开
始到第 j个词结束段落 不包括 最后一行的不优美度 , 假
设有 n个词 , 原问题的不优美度为
n D(1,n)! 递归公式 ?
n C(i,j )=mini<k<j{C(i,k)+ C(k+1,j)}
n D(i,j )=mini<k<j{C(i,k)+ D(k+1,j)}
n 初始值 ?
清华大学 宋斌恒 27
初始值 ( Initial values)
n 当 k(i,j)=j- i+ 所有词的字符数 ≤ 行字符数时 【 词与词
之间空格数为 1】
n C(i,j )=[行字符数 - k(i,j)]3
n D(i,j )=0
n 如果编程 : 请问最严重的异常情况是什么 ?
n 思考 : 如果字符宽度不同 , 上述方法有效否 ?
n 进一步 : 如果字符宽度不同 , 其宽度可以无级缩放 ,
每行两头完全对齐 , 如何定义优美度 ?
清华大学 宋斌恒 28
最优子结构
n 原问题的解是最优的
n 如何说明原问题的解是由子问题的最优解构成
的 , 这是分析问题的关键
清华大学 宋斌恒 29
优美打印问题变种
n 字符宽度不同
n 可以调整某些字符的宽度 , 如空格 , 但调整要
计算在优美度内 。
n 可以调整所有字符的宽度 。
n 其它语言如汉语的优美度问题
n 图文混排问题等等
清华大学 宋斌恒 30
编辑距离
n 从一个字符串到另一个字符串可以由若干变换构成 :
n Copy
n Replace
n Delete
n Insert
n Twiddle
n Kill
n x=σ(y), 其中 σ是一连串的变换 σ= t1t2… tn
6
清华大学 宋斌恒 31
变换成本与距离
n Cost(σ)=ΣCost(ti)
n 我们称由 x到 y的费用最小的 σ的费用就是 x和 y之间的距离 。
n 子问题空间如何确定 :
n C[xn,ym]=C[xn-2,ym-2]+twiddle 可以 twiddle,
n C[xn,ym]=C[xn-1,ym-1]+copy 可以 copy,
n C[xn,ym]=C[xn-1,ym-1]+replace 可以 replace,
n C[xn,ym]=C[xn-1,ym]+delete 可以 delete,
n C[xn,ym]=C[xn,ym-1]+insert 可以 insert,
n C[xn,ym]=C[xn-s,ym]+kill 可以 kill,
n 最优子结构 : 最优解可以分解为子问题的最优解的综合 。
n 问题 :
清华大学 宋斌恒 32
Optimal binary search tree
n Binary Search Tree?
n Binary tree
n Search tree
n Examples?
n Optimal: under some condition:
n The elements appear with different
frequencies
n How to formally define it?
清华大学 宋斌恒 33
A search tree
k1
k3
k4
k2
d0 d1
d2 d3
d4
清华大学 宋斌恒 34
Definition for the optimal binary
search tree
n Given a sequence K=<k1,k2,… kn> of n
distinct keys stored orderly from smaller value
to larger value.
n We need n+1 dummy keys for values not in K,
these dummy keys are d0,d1,… ,dn
n The appearance probabilities of ki and dj are
given
n Find a search tree such that:
清华大学 宋斌恒 35
Cost of the searches are minimum
n The expectation cost of search:[See p358
15.16]
n E[search cost in T]=
Sum[(depth(ki)+1)*pi]+
sum[(depth(di)+1)*qi]+
Where pi are the occurrence possibility of ki, qi
are the occurrence possibility of di
清华大学 宋斌恒 36
The structure of an optimal binary
search tree
n We need a structure that the optimal solution
can be constructed from one or many optimal
solution for subproblems
n Any sub tree of binary search tree contains
the keys continuing from ki to kj and also the
dummy keys from di-1to dj
7
清华大学 宋斌恒 37
费用递归函数
n e[i,j ], the optimal
expectation value for the
optimal tree
n If kr is the root of the
optimal subtree containing
the key ki to kj, we have
n e[i,j ]=pr+(e[i,r-1]+w(i,r-
1))+e[r+1,j]+w(r+1,j))
n e[i,j ]=e[i,r-1]+e[r+1,j]+w(i,j)
k1
k3
k4
k2
d0 d1
d2 d3
d4
清华大学 宋斌恒 38
The optimal structure of the binary
search tree
n If the tree is optimal, then its all subtree is
optimal.
n Prove it!
清华大学 宋斌恒 39
Chapter notes
n R. Bellman’s systematic study on dynamic
programming in 1955 with solid math base
n Matrix chain production has an O(n lg n)
solution by Hu and Shing
n Knuth posed a question is there any solution
whose complexity is less than O(mn) for LCS,
the answer is yes with a solution with O(mn/lg
n) by Masket and Paterson