Chapter 7 SEARCHING
1,Introduction,Notation
2,Sequential Search
3,Binary Search
4,Comparison Trees
5,Lower Bounds
6,Asymptotic
7,Pointers and Pitfalls
7.1 search:Introduction and
notation◆ We are given a list of records,where each
record is associated with one piece of
information,which we shall call a key.
◆ We are given one key,called the target,and
are asked to search the list to nd the record(s)
(if any) whose key is the same as the target.
◆ There may be more than one record with
the same key,or there may be no record with
a given key.
◆ We often ask how times one key is
compared with another during a search,This
gives us a good measure of the total amount of
work that the algorithm will do.
◆ Internal searching means that all the records
are kept in high-speed memory,In external
searching,most of the records are kept in disk
les,We study only internal searching.
◆ For now,we consider only contiguous lists
of records,We shall consider linked structures
in Chapter 10.
Records and Keys in
C++
The records (from a class Record) that are
stored in the list being searched (generally
called the list) must conform to the following
minimal standards:
◆ Every Record is associated to a key (of a
type or class called Key).
◆ Key objects can be compared with the
standard operators ==,!=,<,>,<=,>=,
◆ There is a conversion operation to turn any
Record into its associated Key.
The records (from a class Record) that are
stored in the list being searched (generally
called the list) must conform to the following
minimal standards:
◆ Every Record is associated to a key (of a
type or class called Key).
◆ Key objects can be compared with the
standard operators ==,!=,<,>,<=,>=,
◆ There is a conversion operation to turn any
Record into its associated Key.
◆ Record objects can be compared to each other
or to a Key by first converting the Record objects
associated keys.
◆ Examples of the conversion operation:
◆ A method of the class Record,with the
declaration
operator Key( ) const;
◆ A constructor for the class Key,with
declaration
Key(const Record &);
◆ If the classes Key and Record are identical,no
conversion needs to be defined,since any
Record is
automatically a Key.
◆ We do not assume that a Record has a Key
object as a data member,although often it does,
Parameters for Search Functions
Each searching function has two input
parameters:
◆ First is the list to be searched;
◆ Second is the target key for which we are
searching.
Each searching function will also have an
output parameter and a returned value:
◆ The returned value has type Error_code and
indicates whether or not the search is
successful in finding an entry with the target
key.
◆ If the search is successful,then the returned
value is success,and the output parameter
called position will locate the target within the
◆ If the search is unsuccessful,then the value
not present is returned,and the output
parameter may have an undefined value or a
value that will differ from one method to another.
Key Definition in C++
To select existing types as records and keys,a
client could use type definition statements such
as:
typedef int Key;
typedef int Record;
A client can design new classes that display
appropriate behavior based on the following
skeletons:
class Key
{ public:
static int comparisons;
Key(int x=0) { key = x; }
int the_key( ) const { return key; }
Key& operator=(const int x)
{ key = x; return *this; }
private:
int key;
char *other;
};
在本章我们使用的类声明如下,Definition of a Key class
typedef Key Record;
由于只关心查找 key,
不关心 Record中的
其它内容,因此将
Record定义为 key
static data number
7.2 Sequential Search
1,Algorithm and procedure
Begin at one end of the list and scan down it
until the desired key is found or the other end
is reached:
我们在第 6章中实现的 find方法:
int Find(const List_entry &x);
// 查找元素 x,找不到返回 -1,否则为 x在 表中的位置
就是顺序查找方法,与本节书上的方法区别仅仅是:
返回了类型是 Error_code,位置由引用参数带回。
Error_code sequential_search( List<Record> & the_list,
const int &target,int &position)
/* Post,If an entry in the list has key equal to target,then
return
success and the output parameter position locates
such an
entry within the list.
Otherwise return not_present and position
becomes
invalid,*/
{ for( position=0; position<the_list.size( ); position++)
{ Record x;
the_list.retrieve(position,x);
if(x==target) return success;
}
return not_present; // 脱离 for,则检索不成功
To analyze the behavior of an algorithm that
makes comparisons of keys,we shall use
the count of these key comparisons as our
measure of the work done.
2,Analysis
The number of comparisons of keys done in
sequential search of a list of length n is
◆ Unsuccessful search,n comparisons.
◆ Successful search,best case,1 comparison.
◆ Successful search,worst case,n
comparisons.
◆ Successful search,average case,
comparisons.
2
1n?
3,Testing Program
◆ For test purposes,use integer keys,and do
not store any data other than a key in a record,
Accordingly,we define typedef Key Record;
◆ Keep a count of all key comparison
operations,by modifying the overloaded key
comparison operations to increment a counter
whenever they are called.
◆ This counter must be available to all Key
objects,Thus,it should be declared as a static
class member,In C++,static class members
provide data objects that are shared by every
instance of the class.
◆ The method the key inspects a copy of a
key's value.
◆ The static counter comparisons is
incremented by any call to a Key comparison
operator,For example:
bool operator== (co st Key &x,const Key &y)
{ Key::comparisons++;
return x.the_key( ) == y.the_key( );
}
◆ Static data members must be defined and
initialized outside of a class definition,
Accordingly,the following statement is
included in the Key implementation file key.c:
int Key,,comparisons = 0;
◆ Use the class Timer from Appendix C to
provide CPU timing information.
Choice of test data
◆ Most later searching methods require the data
to be ordered,so use a list with integer keys in
increasing order.
◆ To test both successful and unsuccessful
searches,insert only keys containing odd integers
into the list.
◆ If n denotes the number of entries in the list,
then the targets for successful searches will be
1,3,5,…,2n – 1,
◆ For unsuccessful searches,the targets will be
0,2,4,6,…,2n.
◆ In this way we test all possible failures,
including targets less than the smallest key in the
list,between each pair,and greater than the
7.3 Binary Search
1,Ordered Lists
DEFINITION An ordered list is a list in which
each entry contains a key,such that the
keys are in order,That is,if entry i comes
before entry j in the list,then the key of
entry i is less than or equal to the key of
entry j,
◆ All List operations except insert and replace
apply without modification to an ordered list.
◆ Methods insert and replace must fail when
they would otherwise disturb the order of a list.
二分查找 /搜索
(折半查找 /搜索 )
◆ We implement an ordered list as a class
derived from a contiguous List,In this derived
class,we shall override the methods insert
and replace with new implementations.重载的插入方法
typedef Key Record;
class Ordered_list,public List<Record>
{ public:
Ordered_list( );
Error_code insert(const Record &data);
Error_code insert(int position,const Record
&data);
Error_code replace(int position,const Record
&data);
};
该查找方法的
应用条件
覆盖父类的插入方法
Error_code Ordered_list::insert(const Record &data)
{ // 按要插入的 data.key值插入到合适的位置,以保持有序
for(int i=0; i<size() && data >= entry[i]; i++);
return List<Record>::insert(i,data);
}
带两个参数的重载父类的插入方法,请看 Pg.280
Note,The overridden method replaces a
method of the base class by one with a
matching name and parameter list,The
overloaded method matches in name but has
a different parameter list.
2,Algorithm Development
Idea,In searching an ordered list,first compare
the target to the key in the center of the list,If it is
smaller,restrict the search to the left half;
otherwise restrict the search to the right half,and
repeat,In this way,at each step we reduce the
length of the list to be searched by half.
Keep,two indices,top and bottom,that will
bracket the part of the list still to be searched.
The target key,provided it is present,will be
found between the indices bottom and top,
inclusive.
Initialization,Set bottom = 0; top = the list.size( ) -
1;
Compare target with the Record at the midpoint,
2
top )(b ottommid ??
Change the appropriate index top or bottom to
restrict the search to the appropriate half of the
list.
Loop terminates when top ≤ bottom,if it has not
terminated earlier by finding the target.
Make progress toward termination by ensuring
that the number of items remaining to be
searched,
top - bottom + 1,strictly decreases at each
Please see:
1,recursive Version Pg.282
2,Non-recursive Version Pg.283 -
Pg.285
下面我们先给出有序表类的两个方法的实现算
法,然后给出顺序查找与非递归二分查找算法
(Pg.285 binary_search_2)及测试上述算法的
主程序:
Error_code Ordered_list::insert(const Record &data)
{ // 按要插入的 data.key值插入到合适的位置,以保持有序
for(int i=0; i<size( ) && data >= entry[i]; i++);
return List<Record>::insert(i,data);
}
Error_code Ordered_list::replace(int position,const
Record &data)
// overridden replace
{ // 为了保持有序,先删除序号 position的记录,然后插入
Record list_data;
if((remove(position,list_data))!=success) return fail;
return insert(data);
}
Error_code sequential_search(Ordered_list& the_list,
const int &target,int &position) // 顺序查找算法
{ for( position=0; position<the_list.size( ); position++)
{ Record x;
the_list.retrieve(position,x);
if(x==target) return success;
}
return not_present; // 脱离 for,则 检索不成功
}
Error_code binary_search(Ordered_list& the_list,const
int &target,int &position) // 二分查找算法
{ int bottom=0,top=the_list.size()-1; // 置区间初值
while(bottom<=top)
{ int mid=(bottom+top)/2;
Record x; the_list.retrieve(mid,x);
if ( target= =x ) // 检索成功
{ position = mid; return success; }
if (target>x) bottom=mid+1; // 修改下界 bottom

else top=mid-1; // 修改上界 top值
}
return not_present; // 脱离 while,则 bottom>top 检索不成

}
See Pg.285
binary_search_2
#include <stdlib.h>
#include <time.h>
#include "Search.h"
void main()
{ Ordered_list L; int n,i,j,pos; Record x; char answer;
time_t t; srand((unsigned)time(&t)); //初始化随机数种子
cout<<" Please enter List Size=?(<=30) "; cin>>n;
cout<<" Generate Random sequence is,\n{ ";
for(i=0; i<n; ) // 产生 n个互不相等的整数,插入表,使有序
{ j = rand() % 1000;
if (binary_search(L,j,pos)==not_present)
// 产生的整数不在表中,插入
{ x=j; L.insert(x); i++; cout<<j<<' '; }
}
cout<<"}\n Ordered_list L is:\n { "; L.traverse(visit);
cout<<"}\n";
do { cout<<"\n Please enter search key, "; cin>>i;
Key::comparisons = 0;
if(binary_search(L,i,pos)==success) // 用二分查找
cout<<"\n Recorg Position ="<<pos<<"\t
binary_search
comparisons="<<Key::comparisons;
else cout<<"\n Recorg not present"<<"\t
binary_search
comparisons="<<Key::comparisons;
if(sequential_search(L,i,pos)==success) // 用顺序查找
cout<<"\n Recorg Position ="<<pos<<"\t
sequential_search comparisons="<<pos+1;
else cout<<"\n Recorg not present"<<"\t
sequential_search
comparisons="<<pos;
do{ cout<<"\n Please enter replaced position key, ";
cin>>i>>j; x=j;
if(L.replace(i,x)==success)
{ cout<<" Ordered_list L is:\n { ";
L.traverse(visit); cout<<"}\n"; }
else cout<<" fail,position error!\n";
cout<<"\n\n Continue replace (y/n)? "; cin>>answer;
}while(answer=='y' || answer=='Y');
cout<<endl;
}
上面有两个 do循环,第 1个:每次输入一个查找关键码,
分别用两种方法查找,输出查找结果及比较次数。
第 2个每次输入一对值,分别是欲修改记录的位置及要
重新设置的记录值。
Algorithm Verification
◆ The division of the list into sublists is
described in the following diagram:
◆ Only entries strictly less than target appear in
the first part of the list,but the last part contains
entries greater than or equal to target.
◆ When the middle part of the list is reduced to
size 1,it will be guaranteed to be the first
occurrence of the target if it appears more than
once in the list.
◆ We must prove carefully that the search
makes progress towards termination,This
requires checking the calculations with indices
to make sure that the size of the remaining
sublist strictly decreases at each iteration,It is
also necessary to check that the comparison of
keys corresponds to the division
into sublists in the above diagram.Recognizing Equality in Binary Search
◆ Method,Check at each stage to see if the
target has been found.
Example,Search process of 15 and
60
Search 15 && 60 step 1
Search 15 step 2 success
Search 60 step
2
Search 60 step
3
Search 60 step 4 not_present
7.4 Comparison Trees
The comparison tree (also called decision
tree or search tree) of an algorithm is
obtained by tracing the action of the
algorithm,representing each comparison of
keys by a vertex of the tree (which we draw
as a circle),Inside the circle we put the
index of the key against which we are
comparing the target key.由于本书这一部分内容比较晦涩,我们先用
中文教材中的方式讲解,之后再解释本书内容。
n=10的判定树
◆ The comparison ( decision/search ) tree of
Example
中文书通常称为:折半查找判定树 查找成功,关键字x == List [7].key查找失败
查找关键字 x 介于
List [1].key和
List [2].key之间
在判定树中所有叶结点均为方形结点, 称为判
定树的 外部结点 (称圆形结点为 内部结点 ),那么
查找不成功的过程就是走了一条从根结点到外部
结点的路径, 和关键字进行比较的次数等于该路
径的内部结点个数, 例如查找 60的过程是走了
从根到结点 8-9的路径 。
n个 内部 结点构成的 判定树, 其高度不会大于
?log2n?, 因此, 二分查找在 查找不成功 时和
关键字比较的次数最多不超过 ?log2n? +1( 外
部结点 ) 。
external vertices internal vertices
平均查找长度,为了确定数据元素在查找表中
的位置,需要和 给定值 进行比较的关键字个数
的数学期望值,称为查找算法在 查找成功 时的
平均查找长度 。对于长度为 n的查找表,查找成
功的平均查找长度为:
?
?
????? n
1i iinn2211
cpcpcpcpA S L ?
其中 pi为查找第 i个数据元素的概率,ci为
找到查找表中第 i个元素时,进行比较的次数。
Average Search Length
那么, 二分查找的平均查找长度是多少呢?
假定表的长度为 n=2h-1,则描述二分查找
的判定树是深度 (高度 )为 h的满二叉树 。 树
中层次为 1的结点有 1个, 层次为 2的结点有
2个, …, 层次为 h的结点有 2h-1个 。 又假设
表中每个元素的查找概率相等 ( Pi=1/n),
则二分查找的平均查找长度为,height
level
◆ Branches (lines) drawn down from the circle
represent the possible outcomes of the
comparison,When the algorithm terminates,we
put either F (for failure) or the location where the
target is found at the end of the appropriate
branch,which we call a leaf,and draw as a square.
◆ The remaining vertices are called the internal
vertices of the tree,The number of comparisons
done by an algorithm in a particular search is the
number of internal vertices traversed in going
from the top of the tree,called its root,down the
appropriate path to a leaf.
◆ The number of branches traversed to reach a
vertex from the root is called the level of the vertex,
root itself has level 0,the vertices immediately
below it have level 1,and so on,The largest level
that occurs is called the height of the tree.
◆ We call the vertices immediately below a vertex
v the children of v and the vertex immediately
above v the parent of v.
◆ The external path length of a tree is the sum of
the number of branches traversed in going from
the root once to every leaf in the tree.
◆ The internal path length is defined to be the sum,
over all vertices that are not leaves,of the number
of branches from the root to the vertex.
◆ As 2-tree is a tree in which every vertex
except the leaves has exactly two children.
严格二叉树
Lemma 7.1 The number of vertices on each level of
a 2-
tree is at most twice the number on the level
immediately above,Hence,in a 2-tree,the
number
of vertices on level t is at most 2t for t ≥ 0.Lemma 7.2 If a 2-tr e has k vertices on level t,
then
t ≥ lg k,where lg denotes a logarithm with
base 2.
Conventions
Unless stated otherwise,all logarithms will
be taken with base 2.
The symbol lg denotes a logarithm with
base 2,and the symbol ln denotes a natural
logarithm.
When the base for logarithms is not
specified (or is not important),then the
symbol log will be used.
约定
The floor of a real number x is the largest
integer less than or equal to x,and the ceiling of
x is the smallest integer greater than or equal to
x,We denote the floor of x by and the ceiling
of x by,
??x ??x
Binary Search Analysis
The number of comparisons done in an
unsuccessful search by binary_search_2 is
approximately 2 lg(n+ 1).
In a successful search of a list of n entries,
binary search does approximately
comparisons of keys.注意, 本书与中文教材在这儿相差一个常数因子 2。
本书是精确的用和“关键字比较 次数,来计算的,

中文教材用,进行了比较的关键字的 个数,来计算
的。
(每进行一个关键字比较,若查找成功则比了一次,否则还要
比一次,以确定待查字到底大于当前关键字否,并依此选择
继续在前半段查找还是后半段查找。 )
7.5 Lower Bounds
See Pg,297-301
详细分析了二分查找的算法复杂度,结论:从理论上
说 binary_search_ 1(Pg.281 )的上界最小 。
以 10个结点的二分查找树 ( Pg.287-288 Fig.7.2-
Fig.7.4 )为例,介绍分析算法复杂度 的方法 。
◆ 算法的复杂性是算法运行所需要的计算机资源的量,
需要的时间资源的量称作 时间复杂性, 需要的空间 ( 即
存储器 ) 资源的量称作 空间复杂性 。 这个量应该集中反
映算法中所采用的方法的效率, 而从运行该算法的实际
计算机中抽象出来 。 换句话说, 这个量应该是只依赖于
算法要解的问题的规模, 算法的输入和算法本身的函数 。
如果分别用 N,I和 A来表示算法要解问题的规模, 算法
的输入和算法本身, 用 C表示算法的复杂性, 那么应该
有,C =F(N,I,A) 其中 F(N,I,A)是 N,I,A的一个确定
的三元函数 。
7.6 Asymptotics,
The Big-O and Other Notations
◆ 如果把时间复杂性和空间复杂性分开,并分别用
T 和 S来表示,那么应该有:
T = T( N,I,A ) 和 S = S( N,I,A )
◆ 通常,我们让 A隐含在复杂性函数名当中,因而将
T 和 S 分别简写为:
T = T( N,I ) 和 S = S( N,I )
◆ 由于时间复杂性和空间复杂性概念类同, 计算方
法相似, 且空间复杂性分析相对地简单些, 所以我
们将主要地讨论时间复杂性 。
◆ 根据 T(N,I )的概念,它应该是算法在一台抽象的计
算机上运行所需的时间。设此抽象的计算机所提供的
元运算有 k 种,他们分别记为 O1,O2,…,Ok;再设这
些元运算每执行一次所需要的时间分别为 t1,t2,…,tk 。
对于给定的算法 A,设经过统计,用到元运算 Oi的次
数为 ei,i=1,2,..,k,很明显,对于每一个 i,1≦ i≦ k,
ei是 N 和 I 的函数,即 ei=ei(N,I)。则:
其中 ti,i=1,2,..,k,是与 N,I无关的常数。 I)( N,etI)T ( N,i
k
1i i???
显然,我们不可能对规模 N的每一种合法的输入 I都去
统计 ei(N,I),i=1,2,…,k。因此 T(N,I)的表达式还得
进一步简化,或者说,我们只能在规模为 N的某些或某
类有代表性的合法输入中统计相应的 ei,i=1,2,…,k,
评价时间复杂性。
下面我们只考虑三种情况的复杂性, 即最坏情况, 最
好情况和平均情况下的时间复杂性, 并分别记为
Tmax(N),Tmin(N)和 Tavg(N )。 在数学上有:
I * )T ( N,
k
1i
I * )( N,ieit
k
1i
I)( N,ieit
NDI
m a xI)T ( N,
NDI
m a x( N )m a xT ??
?
??
??
?
?
?
)IT ( N,
k
1i
)I( N,ieit
k
1i
I)( N,ieit
NDI
m i nI)T ( N,
NDI
m i n( N )m i nT ~~ ??
?
??
??
?
?
?
?
?
?
?
??
?
?
NDI
k
1i
I)( N,ieitI)T ( N,P ( I )
NDI
m a x( N )a vgT
其中, DN是规模为 N的合法输入的集合; I*是 DN中一个使 T(N,I *)
达到 Tmax(N)的合法输入, 是 DN中一个使 T(N,) 达到 Tmin(N)的合
法输入;而 P(I)是在算法的应用中出现输入 I的概率 。
I~
复杂性的渐近性态及其阶
设 T(N)是在前面所定义的关于算法 A的复杂性函数 。
一般说来, 当 N单调增加且趋于 ∞时,T(N)也将单调增并
趋于 ∞。 对于 T(N),如果存在,使得当 N→∞ 时有:(N)T
0(N ))/ T (N )T-(T (N ) ?那么, 我们就说 (N)T 是 T(N) 当
N→∞ 时的 渐近性态, 或叫 为算法 A当 N→∞ 时的
渐近复杂性 而与 T(N)相区别, 因为在数学上, 是 T(N)
当 N→≦ 时的渐近 (asymptotic)表达式 。
(N)T
(N)T
直观上, 是 T(N)中略去低阶项所留下的主项 。 所以
它无疑比 T(N)来得简单 。
(N)T
当 T(N)=3N2 + 4Nlog2N + 7 时, = 3N2,因为这时
有:
例:
(N)T
????? ?? n 当073 72 n
2
n
2
4 N l o g
4 N l o g( N ) ) / T ( N )T-( T ( N )
N
显然, 3N2 比 3N2 + 4Nlog2N + 7 简单得多 。
渐近意义下的记号 ( asymptotic notation),
The Big-O and Related Notations
◆ 大写 Ο符号
定义 设 f(N)和 g(N)是定义在正数集上的正函数。如果存在正的
常数 C和自然数 N0,使得当 N ≥N0时有 f(N)≤Cg(N)。则称函数
f(N)当 N充分大时上有界,且 g(N)是它的一个 上界,记为
f(N)=Ο(g(N))。这时我们还说 f(N)的阶不高于 g(N)的阶 。

◆ 小写 o 符号
定义 设 f(N)和 g(N)是定义在正数集上的正函数。如果对于任意
给定的 ε> 0,都存在正的常数 C和自然数 N0,使得当 N ≥N0时有
f(N)/g(N)<ε。则称函数 f(N)当 N充分大时 的阶比 g(N) 低,
记为 f(N) = o((g(N))。
>
◆ ? 符号 (与大 Ο符号类似,它用来估算函数 f的下限值。 )
定义 设 f(N)和 g(N)是定义在正数集上的正函数。如果存在正的
常数 C和自然数 N0,使得当 N ≥N0时有 f(N) ≥ Cg(N)。则称函数
f(N)当 N充分大时下有界,且 g(N)是它的一个下 界,记为 f(N)=
?(g(N))。这时我们还说 f(N)的阶不低于 g(N)的阶 。

◆ θ 符号 (同一函数 g既是 f 的上限也是 f的下限 。 )
定义 f(N)= θ(g(N))当且仅当 f(N)=Ο(g(N))且 f(N) = ?(g(N))。

一般,在算法分析中常常只用 Ο符号,即只关心算法 A
的 时间复杂性 f(N)的上界 (最坏情况 ) 。下面我们就举例
说明 Ο符号 的使用。
举例,(略去低阶项和常数因子,留下主项, 记为 Ο)
⑴ 因为对所有 N≥ 1有 3N ≤ 4N,我们说 3N=Ο (N);
⑵ 因为当 N ≥ 1时有 N +1024≤ 1025N,我们说 N +1024=Ο (N);
⑶ 因为当 N≥ 10时有 2N2+11N-10≤ 3N2,我们说
2N2+11N-10 =Ο (N 2);
⑷ 反例 N 3≠Ο(N 2)。
因为若不然, 则存在正的常数 C和自然数 N0,使得当 N≥N0时
有 N3≤CN 2,即 N≤C。 显然当取 N=max(N0,[C]+l)时这个不等式
不成立, 所以 N3≠Ο(N 2)。
See Pg,310
Pointers and Pitfalls
◆ In designing algorithms be very careful of the
extreme cases,such as empty lists,lists with only
one item,or full lists (in the contiguous case).
◆ Be sure that all your variables are properly
initialized.
◆ Double check the termination conditions for your
loops,and make sure that progress toward
termination always occurs.
仔细检查
◆ In case of difficulty,formulate statements that
will be correct both before and after each iteration
of a loop,and verify that they hold.
◆ Avoid sophistication for sophistication's sake,
Whenever a simple method is adequate for your
application,use it.
◆ Don't reinvent the wheel,If a ready-made
function is adequate for your application,use it.
◆ Sequential search is slow but robust,Use it for
short lists or if there is any doubt that the keys in
the list are properly ordered.
◆ Be extremely careful if you must reprogram
binary search,Verify that your algorithm is
correct and test it on all the extreme cases.
◆ Drawing trees is an excellent way both to trace
the action of an algorithm and to analyze its
behavior.
◆ Rely on the big-O analysis of algorithms for
large applications but not for small applications.