?Definition of search
Static search table
Dynamic search table
Hash table
CHAPTER 6
Search
Search table is collection constituted By the same types
of data elements (or record),because the elements in the
collection has loose relationship to each other,so search
table is a facilitating application data structure.
Basic operation of search table:
Search a specific data element in the search table ;
Search the properties of a specific data element;
Insert a data element into the search table;
Delete a data element in the search table
§ 1 Definition of search
Static search table
only for querying and searching。
Dynamic search table
In the process of searching in the search
table,insert the inexistent data elements or
remove a pre-existing data elements at the same
time,Such table is called dynamic search table.
The classification of search table:
Keyword
it is a value of property of a data element
(or record) to identify the data element( or
record),If keyword can be identified only as a
record,said that the,primary keyword." If
keyword can identify a number of records,
said that the,general keyword."
According to a giving value,search the table to return a
data element whose keyword equal to the value.
If search table has this data element(or record)
then call,search successful”,and return,the
entire record or the position of it in the table;
else call,search failed”,and return,null
record or null pointer.
Searching
How to search?
Searching method depends on the structure of
the search table.
As the data elements in the search table does
not exist obvious relation,it is not easy to find.
In order to improve the efficiency of finding,it
is need to attach artificial relationship among
records,in other words,to use a different
structure to implement search table.
evaluate searching method
Search speed
How much storage space occupied
Complexity of the algorithm
ASL(Average Search Length),To ascertain the
position of record in the table,and the expect number
of keyword comparing with the giving value
1
1
1
n
i
i
n
i
ii
p
cpA S LTo the table with n records
is the probability of the ith record in the table
is the number of keyword comparingiicp
ADT definition of static search table:
ADT StaticSearchTable {
It is a collection composed by the same
characteristics data elements,Each type of data
elements contain the same type keyword,marking
the only data elements,
Data relation R,data elements belongs to a
collection
Data object D:
§ 2 static search table
Create(&ST,n); //构造一个含 n 个数据元素的静态查找表 ST。
Destroy(&ST); //销毁表 ST。
Search(ST,key); //查找 ST 中其关键字等于 kval 的数据元素。
Traverse(ST,Visit()); //按某种次序对
ST的每个元素调用函数
Visit()一次且仅一次,
} ADT StaticSearchTable
Basic operation P:
Sequential lists’ search
typedef struct {
ElemType *elem; // 数据元素存储空间基址,建表时按实际长度分配,0号单元留空
int length; // 表的长度
} SSTable;
Use sequential lists as a static search table,and
the function of Search can be implemented by
sequential search method,
As follows is the sequential storage structure.
i
0 1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
找 64
64
监视哨
i i i i
比较次数 =5比较次数:
查找第 n个元素,1
查找第 n-1个元素,2
……….
查找第 1个元素,n
查找第 i个元素,n+1-i
查找失败,n+1
Process of search,compare the keyword
of record with the giving value one by one
from the beginning of table.
For example:
int Search_Seq(SSTable ST,
KeyType kval) {
// 在顺序表 ST中顺序查找其关键字等于
// key的数据元素。若找到,则函数值为
// 该元素在表中的位置,否则为 0。
ST.elem[0].key = kval; // 设置“哨兵”
for (i=ST.length; ST.elem[i].key!=kval; --i);
// 从后往前找
return i; // 找不到时,i为 0
} // Search_Seq
Description of the algorithm:
Implementation in C
int Search_Seq(SSTable ST,KeyType key)
{ // 在顺序表 ST中顺序查找其关键字等于 key
的数据元素。若找到,则函数值为
// 该元素在表中的位置,否则为 0。算法 9.1
int i;
ST.elem[0].key=key; // 哨兵
for(i=ST.length;!EQ(ST.elem[i].key,key);--i);
// 从后往前找
return i; // 找不到时,i为 0
}
performance analysis of the sequential search
Average Search Length:
To ascertain the position of record in the table,
and the expect number of keyword comparing
with the giving value
n
i
ii CPA S L
1n,table length
Pi is the probability of the ith record in the table
Ci is the comparing number of keyword
1
1

n
i
iP
the ASL of the sequential search is:
In terms of sequential lists,Ci = n-i+1
2
111
1

n)i(n
n
A S L
n
i
ss
ASL = nP1 +(n-1)P2 + +2Pn-1+Pn
n
P i
1
In the case of search at the same probability,
In the case of search at different probability,
ASLss takes minimum value at this case:
Pn≥ Pn-1≥ ···≥ P2≥ P1
According to records’ search probability in the
table,re-arrange them from small to big to
improve efficiency.
If unable to determine the search probability
in advance,then the method of improving
search process is that,after each search,just
to put the records searched to the end of the
table position.
The search algorithm of sequential lists is easy,but
ASL is too large,it isn’t fit for long sequential lists.
If the sequential lists is in order,then the process of
search can be based on,binary”.
Search of ordered sequential lists
Binary search
Search process,reduce the range of pre-search
record to the half.
Application condition,the storage structure is
ordered sequential lists,
1.Assume that table length is n,low,high and
mid denote upper bound,lower bound and middle
point of the pre-search record?s range,and k is the
giving value.
2.at the beginning,set,
low=1,high=n,mid=?(low+high)/2?
compare k with the record pointed by mid
if k==r[mid].key,search success
else if k<r[mid].key,then high=mid-1
else if k>r[mid].key,then low=mid+1
3,Repeat the above-mentioned operation until low>high,
search failure
Implementation of binary search algorithm
low highmid
例 1 2 3 4 5 6 7 8 9 10 115 13 19 21 37 56 64 75 80 88 92
找 21
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low highmid
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low highmid
Such as,key =21 process of search
low Instruct the lower bound of the search range
high Instruct the upper bound of the search range
mid = (low+high)/2。
key =70,Process of search
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low highmid
找 70
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low highmid
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low high
mid
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
low highmid
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
lowhigh
1 2 3 4 5 6 7 8 9 10 11
5 13 19 21 37 56 64 75 80 88 92
When lower bound ‘low’ is bigger than
upper bound ‘high’,it means that any
keyword isn’t equal to the ‘key’,search
unsuccessful。
int Search_Bin ( SSTable ST,KeyType kval ) {
low = 1; high = ST.length; // 置区间初值
while (low <= high) {
mid = (low + high) / 2;
if ( kval == ST.elem[mid].key )
return mid; // 找到待查元素
else if ( kval < ST.elem[mid].key) )
high = mid - 1; // 继续在前半区间进行查找
else low = mid + 1; // 继续在后半区间进行查找
}
return 0; // 顺序表中不存在待查元素
} // Search_Bin
Binary Search
Description with C language
int Search_Bin(SSTable ST,KeyType key)
{ // 在有序表 ST中折半查找其关键字等于 key的数据元素。若找到,则函数值为
// 该元素在表中的位置,否则为 0。算法 9.2
int low,high,mid;
low=1; // 置区间初值
high=ST.length;
while(low<=high)
{
mid=(low+high)/2;
if EQ(key,ST.elem[mid].key) // 找到待查元素
return mid;
else if LT(key,ST.elem[mid].key)
high=mid-1; // 继续在前半区间进行查找
else
low=mid+1; // 继续在后半区间进行查找
}
return 0; // 顺序表中不存在待查元素
}
Example of a table with 11 elements,n=11
6
3 9
1 4
2 5
7
8
10
11
i 1 2 3 4 5 6 7 8 9 10 11
Ci 12 23 3 3 34 4 4 4
Performance analysis of Binary Search
Decision tree,Binary tree describing the search
process
The depth of decision tree with n nodes
is:?log2n?+1
Comparison in binary search is less than?log2n?+1
Supposing that,length of sorted table is n=2h-
1(correspondingly h=log2(n+1) ),than the decision
tree describing binary search is full binary tree
with depth of h,In the tree,there's 1 node with 1
level,2 nodes with 2 level,h nodes with 2 h-1
level.On the assumption that every record share
the equal chance to be visited,Than the average
searching length of binary search is:
1)1(l o g1211 2
1
1
1


n
n
nj
n
C
n
A S L
h
j
j
n
i
ibs
When n>50,the approximately result is:
1)1(l o g 2 nA S L bs
So we know,
Binary search is more efficient than sequence
search.
Binary search is used only for sorted table,at
the same time stored sequent,
Sequence table Sorted table
characters unsorted sorted
Storage
pattern
Sequent OR
chains
sequent
Insertion
and delete
Easy to do Element
moved
ASL bigger smaller
Compression between sequence
table and sorted table
Indexed sequent table
Set an index term when creating sequent table,
which include,keyword and point,Index table is
sorted by keywords,and the table is sorted in each
block.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 ……
17 08 21 19 14 31 33 22 25 40 52 61 78 46 ……
21 0 40 5 78 10 ……
Indexed sequent table=Index
+sequent table
Sequent table
Index table
Search in indexed sequent table
also named blocking search
Process,将 break the table into several blocks,
It?s unsorted within the block,but sorted
between blocks; First certify the block holding
the aim data,than look for it in the block.
Used for,sorted blocked table
Algorithm:
Store the target record in arrays,and make
sure every element has keyword field
Create index table,each node contains the
max keyword field and the point directing to
the first node of the block
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
22 12 13 8 9 20 33 42 44 38 24 48 60 58 74 57 86 53
22 48 86
1 7 13
Index table
Looking for 38
For example:
Analysis of blocking search
2
)1(l o g)2(
1)(
2
1
2
1
2
1
11
)1(
2
11
s
s
n
A S L
s
s
nsb
i
s
j
b
A S L
sbn
L
L
LLA S L
bs
s
i
b
j
bs
w
b
wbbs






:用折半查找确定所在块
:用顺序查找确定所在块率相等,则:表中每个记录的查找概个记录,并设块,每块含的表平均分成若将表长为均查找长度—在块中查找元素的平—
块的平均查找长度—查找索引表确定所在—其中:
Comparison
ASL 最大 最小 两者之间表结构 有序表、无序表 有序表 分块有序表存储结构 顺序存储结构线性链表顺序存储结构 顺序存储结构线性链表顺序查找 折半查找 分块查找
(n)
(1)
(n)
(1)
(nlogn)
Characters of these tables
search insertion delete
Unsorted sequent table
Unsorted linear chains
Sorted sequent table
Sorted linear chains
Static searching tree
(n)
(n)
(logn)
(n)
(logn)
(n)
(1)
(n)
(1)
(nlogn)
Focused on the performance,the best
situation is?(logn),which required
sorted table;
Focused on the insertion/delete
operations,the best situation is?(1),
which required the table is chained.
Conclusions:
ADT DynamicSearchTable {
Definition of dynamic search table with abstract data
type,
Data object D:
Data relation R:
D is the set of data elements with same characters.
Each element contains a keyword to certify a certain data
uniquely.
Characters of dynamic search table,Table's structure
is settled in searching process,If there’s keyword
which equals to the given keyword,that indicates
operation is successful; or insert the given keyword.
§ 3 Dynamic search table
Data elements belong to same set
InitDSTable(&DT)//构造一个空的动态查找表 DT。
DestroyDSTable(&DT)//销毁动态查找表 DT。
SearchDSTable(DT,key);//查找 DT 中与关键字
key等值的元素。
InsertDSTable(&DT,e);//若 DT 中不存在其关键字等于 e.key 的 数据元素,则插入 e 到 DT。
DeleteDSTable(&T,key);//删除 DT中关键字等于
key的数据元素。
TraverseDSTable(DT,Visit());//按某种次序对
DT的每个结点调用函数 Visit() 一次且至多一次。
Basic operation:
next
}ADT DynamicSearchTable
The Search Tree ADT -- Binary Search Trees
【 Definition】 A binary search tree is a binary tree,It may be
empty,If it is not empty,it satisfies the following properties:
(1) Every node has a key which is an integer,and the keys are
distinct.
(2) The keys in a nonempty left subtree must be smaller than the
key in the root of the subtree.
(3) The keys in a nonempty right subtree must be larger than the
key in the root of the subtree.
(4) The left and right subtrees are also binary search trees.
30
5
2
40
20
15
12
25
10 22
60
70
8065
1,Definition
2,ADT
Objects,A finite ordered list with zero or more elements.
Operations:
SearchTree MakeEmpty( SearchTree T );
Position Find( ElementType X,SearchTree T );
Position FindMin( SearchTree T );
Position FindMax( SearchTree T );
SearchTree Insert( ElementType X,SearchTree T );
SearchTree Delete( ElementType X,SearchTree T );
ElementType Retrieve( Position P );
3,Implementations
Find
Position Find( ElementType X,SearchTree T )
{
if ( T == NULL )
return NULL; /* not found in an empty tree */
if ( X < T->Element ) /* if smaller than root */
return Find( X,T->Left ); /* search left subtree */
else
if ( X > T->Element ) /* if larger than root */
return Find( X,T->Right ); /* search right subtree */
else /* if X == root */
return T; /* found */
}
T( N ) = S ( N ) = O( d ) where d is the depth of X
Position Iter_Find( ElementType X,SearchTree T )
{
/* iterative version of Find */
while ( T ) {
if ( X == T->Element )
return T ; /* found */
if ( X < T->Element )
T = T->Left ; /*move down along left path */
else
T = T-> Right ; /* move down along right path */
} /* end while-loop */
return NULL ; /* not found */
}
FindMin
Position FindMin( SearchTree T )
{
if ( T == NULL )
return NULL; /* not found in an empty tree */
else
if ( T->Left == NULL ) return T; /* found left most */
else return FindMin( T->Left ); /* keep moving to left */
}
FindMax
Position FindMax( SearchTree T )
{
if ( T != NULL )
while ( T->Right != NULL )
T = T->Right; /* keep moving to find right most */
return T; /* return NULL or the right most */
}
T( N ) = O ( d )
T( N ) = O ( d )
Insert
30
5
2
40
Sketch of the idea:
Insert 80
check if 80 is already in the tree
80 > 40,so it must be the right child of
4080
Insert 35? check if 35 is already in the tree
35 < 40,so it must be the left child of 40
35
Insert 25? check if 25 is already in the tree
25 > 5,so it must be the right child of 5
25
SearchTree Insert( ElementType X,SearchTree T )
{
if ( T == NULL ) { /* Create and return a one-node tree */
T = malloc( sizeof( struct TreeNode ) );
if ( T == NULL )
FatalError( "Out of space!!!" );
else {
T->Element = X;
T->Left = T->Right = NULL; }
} /* End creating a one-node tree */
else /* If there is a tree */
if ( X < T->Element )
T->Left = Insert( X,T->Left );
else
if ( X > T->Element )
T->Right = Insert( X,T->Right );
/* Else X is in the tree already; we'll do nothing */
return T; /* Do not forget this line!! */
}
T( N ) = O ( d )
Delete
Delete a leaf node,Reset its parent link to NULL.
Delete a degree 1 node,Replace the node by its single child.
Delete a degree 2 node,
Replace the node by the largest one in its left subtree or the
smallest one in its right subtree.
Delete the replacing node from the subtree.
〖 Example〗 Delete 60 40
50
45 55
52
60
70
20
10 30
Solution 1,reset left subtree,55
52
Solution 2,reset right subtree.
SearchTree Delete( ElementType X,SearchTree T )
{ Position TmpCell;
if ( T == NULL ) Error( "Element not found" );
else if ( X < T->Element ) /* Go left */
T->Left = Delete( X,T->Left );
else if ( X > T->Element ) /* Go right */
T->Right = Delete( X,T->Right );
else /* Found element to be deleted */
if ( T->Left && T->Right ) { /* Two children */
/* Replace with smallest in right subtree */
TmpCell = FindMin( T->Right );
T->Element = TmpCell->Element;
T->Right = Delete( T->Element,T->Right ); } /* End if */
else { /* One or zero child */
TmpCell = T;
if ( T->Left == NULL ) /* Also handles 0 child */
T = T->Right;
else if ( T->Right == NULL ) T = T->Left;
free( TmpCell ); } /* End else 1 or 0 child */
return T;
}
T( N ) = O ( h ) where h is the height of the tree
Note:
If there are not many deletions,then lazy deletion may
be employed,add a flag field to each node,to mark if a
node is active or is deleted,Therefore we can delete a node
without actually freeing the space of that node,If a deleted
key is reinserted,we won’t have to call malloc again.
While the number of deleted nodes
is the same as the number of active nodes
in the tree,will it seriously affect
the efficiency of the operations?
4,Average-Case Analysis
Question,Place n elements in a binary search tree,How high
can this tree be?
Answer,The height depends on the order of insertion.
〖 Example〗 Given elements 1,2,3,4,5,6,7,Insert them
into a binary search tree in the orders:
4,2,1,3,6,5,7 and 1,2,3,4,5,6,7
4
5
6
7
2
1 3
h = 2
h = 6
§ 8 AVL Trees
Target,Speed up searching (with insertion and
deletion)
Tool,Binary search trees root
small large
Problem,Although Tp = O( height ),but the height
can be as bad as O( N ).
〖 Example〗 binary search trees obtained for the months of the
year
Nov
Oct
Sept
May
Mar
June
July
Dec
Aug
Apr
Feb
Jan
July
June
Mar
May
Oct
SeptNov
Jan
Feb
Aug
Apr Dec
Entered from Jan to Dec
A balanced tree
Average search time =? 3.5
Average search time =? 3.1
What if the months
are entered in
alphabetical order?
AST = 6.5
Adelson-Velskii-Landis (AVL) Trees (1962)
【 Definition】 An empty binary tree(h=-1) is height balanced,If T is a
nonempty binary tree with TL and TR as its left and right subtrees,
then T is height balanced iff
(1) TL and TR are height balanced,and
(2) | hL? hR |? 1 where hL and hR are the heights of TL and TR,
respectively.
【 Definition】 The balance factor BF( node ) = hL? hR,In an AVL
tree,BF( node ) =?1,0,or 1.
5
82
1 4
3
7
7
82
1 4
3 5
4
5
6
3
2
1 7
〖 Example〗 Input the months Mar
Mar
0?1
May
May
0
Nov
Nov
0
1
2
May
0?1
Nov
0
2?1
Mar
0
0
The trouble maker Nov is in the right subtree’s right
subtree of the trouble finder Mar,Hence it is called an RR
rotation.
In general:
A
1
B
0
BL BR
AL
RR
Insertion RRRotation
A
2
B
1
BL BR
AL
BL
B
0
A
AL
BR
0
Single rotation
Aug
May
0?1
Nov
0
2?1
Mar
01
Aug
0
Apr
Apr
0
1
2
LL
Rotation May
0?1
Nov
0
2?1
Aug
01?
Mar
0
0
Apr
0
In general:
A
1
B
0
BRBL
AR
LL
Insertion
A
2
B
1
BRBL
AR
LL
Rotation
B
0
A
AR
BL
BR
0
May
0?1
Nov
0
2?1
Aug
01?
Mar
0
0
Apr
0
Jan
Jan
0
1
2
LR
Rotation
Mar
0?1
May
0?1
2?1
Aug
01
0
Jan
0
0
Apr
0
Nov
0
In general:
A
1
B
0
BL
ARC
0
CRCL
LR
Insertion A
2
B
1
BL
ARC
1
CRCL
OR
LR
Rotation
BL AR
C
0
A
1 or 0
CR
B
0 or 1
CL
OR
Double Rotation
34/38
Dec July
Mar
0?1
May
0?1
2?1
Aug
01?
Jan
0
Apr
0
Nov
0
July
0
Dec
0
Feb
Feb
0
1
1
2 RL
Rotation
July
0
Dec
0?1
Aug
01
2?1
Jan
01
0
1
Feb
0
0
Apr
0
Mar
0?1
May
0?1
2?1
Nov
0
In general:
A
1
B
0
BR
AL C
0
CL CR
RL
Insertion A
2
B
1
BR
AL C
1
CL CR
OR
RL
Rotation
BRAL
C
0
A
0 or 1
CL
B
1 or 0
CR
OR
July
0
Dec
0?1
Aug
01
2?1
Jan
01
0
1
Feb
0
0
Apr
0
Mar
0?1
May
0?1
2?1
Nov
0
June Oct Sept
June
0
1
Nov
0
Dec
0?1
Aug
1
2?1
Feb
0
1
July
1
Apr
0
Mar
0
May
1
June
0
Jan
0
Oct
0
1
2
1
1
Oct
0
Dec
0?1
Aug
1
2?1
Feb
0
1
July
1
Apr
0
Mar
0
Nov
0
June
0
Jan
0
May
0
Sept
0
1
1
1
1
Note,Several bf’s
might be changed even if
we don’t need to reconstruct
the tree.
Another option is to keep a height field for each node.
AvlTree Insert( ElementType X,AvlTree T )
{ if ( T == NULL ) { /* Create and return a one-node tree */
T = malloc( sizeof( struct AvlNode ) );
if ( T == NULL ) FatalError( "Out of space!!!" );
else { T->Element = X; T->Height = 0; T->Left = T->Right = NULL; }
} /* End creating a one-node tree */
else if ( X < T->Element ) { /* handle left insertion */
T->Left = Insert( X,T->Left );
if ( Height( T->Left ) - Height( T->Right ) == 2 ) /* not balanced */
if ( X < T->Left->Element ) /* LL Rotation */
T = SingleRotateWithLeft( T );
else /* LR Rotation */
T = DoubleRotateWithLeft( T ); } /* End left */
else if( X > T->Element ) { /* handle right insertion */
T->Right = Insert( X,T->Right );
if ( Height( T->Right ) - Height( T->Left ) == 2 ) /* not balanced */
if ( X > T->Right->Element ) /* RR Rotation */
T = SingleRotateWithRight( T );
else /* RL Rotation */
T = DoubleRotateWithRight( T ); } /* End right */
/* Else X is in the tree already; we'll do nothing */
T->Height = Max( Height( T->Left ),Height( T->Right ) ) + 1;
return T;
}
Let nh be the minimum number of nodes in a height balanced tree of height
h,Then the tree must look like
A
h?2 h?1
A
h?2h?1OR? nh = nh?1 + nh?2 + 1
Fibonacci numbers,
F0 = 0,F1 = 1,Fi = Fi?1 + Fi?2 for i > 1
nh = Fh+2? 1,for h? 0
Fibonacci number theory gives that i
iF


2
51
5
1
)( l n12 51
5
1 2 nOhn h
h


Interpolation Search,
Find key from a sorted list f [ l ].key,f [ l+1 ].key,...,f [ u ].key.
f[l].key
f[u].key
l u
key
i?
n elements
li
k eylfk ey
n
k eylfk eyuf
].[].[].[
k e ylfk e yuf
nk e ylfk e yli
].[].[
)].[(

If ( f [ i ].key < key ) l = i ;
Else u = i ;
Search by Formula
§ 4 Hash
General Idea
Symbol Table ( == Dictionary),:= { < name,attribute > }
〖 Example〗 In Oxford English dictionary
name = since
attribute = a list of meanings
M[0] = after a date,event,etc.
M[1] = seeing that (expressing
reason)
…… ……
〖 Example〗 In a symbol table for a compiler
name = identifier (e.g,int)
attribute = a list of lines that use the identifier,and some
other fields
Objects,A set of name-attribute pairs,where the names are
unique
Operations:
SymTab Create(TableSize)
Boolean IsIn(symtab,name)
Attribute Find(symtab,name)
SymTab Insert(symtab,name,attr)
SymTab Delete(symtab,name)
Symbol Table ADT:
Hash Tables
[0] [1] … … [s?1]
… …ht [ 0 ]
… …ht [ 1 ]
… …ht [b?1]
… …… …
b buckets
s slots
For each identifier x,we
define a hash function
f ( x ) = position of x in ht[ ] (i.e,
the index of the bucket
that contains x )
T,:= total number of distinct
possible values for x
n,:= total number of identifiers
in ht[ ]
identifier density,:= n / T
loading density?,:= n / (s b)
A collision occurs when we hash two nonidentical identifiers
into the same bucket,i.e,f ( i1 ) = f ( i2 ) when i1? i2,?
An overflow occurs when we hash a new identifier into a full
bucket.
〖 Example〗 Mapping n = 10 C library functions into a
hash table ht[ ] with b = 26 buckets and s = 2.
Slot 1Slot 0
0
1
2
3
4
5
6
……
25
Loading density? =?10 / 52 = 0.19
To map the letters a ~ z to 0 ~ 25,
we may define f ( x ) =?x [ 0 ]a?
acos
acos
define definefloat
float
exp expchar
char
atan
atan
ceil
ceil
floor floorclock ctime
Without overflow,
Tsearch = Tinsert = Tdelete = O( 1 )
Properties of f,
f ( x ) must be easy to compute and minimizes the number
of collisions.
f ( x ) should be unbiased,That is,for any x and any i,we
have that Probability( f ( x ) = i ) = 1 / b,Such kind of a
hash function is called a uniform hash function.
Hash Function
f ( x ) = x % TableSize ; /* if x is an integer */
What if TableSize = 10 and x?s are all end in zero?
TableSize = prime number ---- good for random
integer keys
f ( x ) = (? x[i]) % TableSize ; /* if x is a string */
〖 Example〗 TableSize = 10,007 and string length of x? 8.
If x[i]? [0,127],then f ( x )? [0,? ]1016
f ( x ) = (x[0]+ x[1]*27 + x[2]*272) % TableSize ;
Total number of combinations = 263 = 17,576
Actual number of combinations < 3000
// f ( x ) = x % TableSize
Index Hash1( const char *key,int TableSize )
{
unsigned int HashVal = 0;
/* 1*/ while( *key != '\0' )
/* 2*/ HashVal +=*key++;
/* 3*/ return HashVal % TableSize;
}
// f ( x ) = (? x[i]) % TableSize
Index Hash2( const char *key,int TableSize )
{
return (key[0]+27*key[1]+729*key[2]) % TableSize;
}
f ( x ) = (? x[N? i? 1] * 32i ) % TableSize ;
………
…x[0] x[1] x[N-2] x[N-1]
Index Hash3( const char *x,int TableSize )
{
unsigned int HashVal = 0;
/* 1*/ while( *x != '\0' )
/* 2*/ HashVal = ( HashVal << 5 ) + *x++;
/* 3*/ return HashVal % TableSize;
}
If x is too long (e.g,street address),the early
characters will be left-shifted out of place.
Carefully
select some
characters
from x.
Construction of Hash function
Directly addressing method
Digital analysis method
Mid-square method
Folding method
Division and residue method
Random number method
Linear functions choosing Hash function as
a key word
H(key) = key or H(key) = a? key + b
Suitable for:
Size of address set = = size of keyword set
where a and b are constants
Directly addressing method
Digital analysis method
E.g,80 records with keyword of 8 bits decimal number,and hash address of 2
bits decimal number
8 1 3 4 6 5 3 2
8 1 3 7 2 2 4 2
8 1 3 8 7 4 2 2
8 1 3 0 1 3 6 7
8 1 3 2 2 8 1 7
8 1 3 3 8 9 6 7
8 1 3 6 8 5 3 7
8 1 4 1 9 3 5 5
…..
…..

Analyze:?8 only
1 only
3,4 only
2,7,5 only
nearly random distribution of
numbers
Then,randomly get two in and hash
them with the other two to form hash
address
Supposing each keyword in the keyword set comprising s digitals
(u1,u2,…,u s),analyze the whole keyword set,and extract several
average-distributed bits of them or their composition as a address.
This is suitable when each number's occurring frequency in the
keyword set can be pre-estimated.
Get several bits in middle of keyword’s square value
as a storage address,The goal of get keyword’s square
value is to expand the difference,meanwhile the bits in
the middle of square value can be affected by every bits
in the whole keyword,
Suitable for,Each bit in the keyword has a high
occurring frequency of certain number,
Mid-square method
After dividing keyword into several parts,get their accumulation
as a hash address,2 methods of accumulation:
Shift accumulation,align lower bits and accumulate;
Boundary accumulation,Folding and forth one end along the
partition line,align and accumulate.
Suitable for keywords with many digital bits.
Folding method
E.g,Keyword:0442205864,hash address of 4 bits
5 8 6 4
4 2 2 0
0 4
1 0 0 8 8
H(key)=0088
Shift
accumulation
5 8 6 4
0 2 2 4
0 4
6 0 9 2
H(key)=6092
Boundary
accumulation
Division and residue method
Supposing hash function:
H(key) = key MOD p ( p≤m )
where m is the table length
p is a prime number no more than m
or
doesn?t contain factor lower than 20
Given a keyword gourp,12,39,18,24,33,21,
if p=9,then values of corresponding hash
function are:3,3,0,6,6,3
So,if p contains a factor 3,then all keywords
with factor 3 can be mapped to address that is
multiple of 3,which increases the possibility of
“attacks”,
E.g.
Why restrict p?
Random number method
Supposing a hash funciton:
H(key) = Random(key)
Where Random is a hypocritical random function.
This method is used for constructing hash functions for
keywords with different lengths.
Factors considered when choosing hash functions:
Time needed to calculate hash functions
Length of keywords
Length of hash table (range of hash address)
Keywords? distribution
Searching frequency of records
Separate Chaining
---- keep a list of all keys that hash to the same value
0
1
2
3
4
5
6
11
19 8268
55
14
36?01
23
Key as follows:{ 19,01,23,14,55,68,11,82,36 }
Hash function is H(key)=key MOD 7
Type declaration for separate chaining hash table
struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;
struct ListNode {
ElementType Element;
Position Next;
};
typedef Position List;
/* List *TheList will be an array of lists,allocated later */
/* The lists use headers (for simplicity),*/
/* though this wastes space */
struct HashTbl {
int TableSize;
List *TheLists;
};
HashTable InitializeTable( int TableSize )
{ HashTable H;
int i;
if ( TableSize < MinTableSize ) {
Error( "Table size too small" ); return NULL;
}
H = malloc( sizeof( struct HashTbl ) ); /* Allocate table */
if ( H == NULL ) FatalError( "Out of space!!!" );
H->TableSize = NextPrime( TableSize ); /* Better be prime */
H->TheLists = malloc( sizeof( List ) * H->TableSize ); /*Array of lists*/
if ( H->TheLists == NULL ) FatalError( "Out of space!!!" );
for( i = 0; i < H->TableSize; i++ ) { /* Allocate list headers */
H->TheLists[ i ] = malloc( sizeof( struct ListNode ) ); /* Slow! */
if ( H->TheLists[ i ] == NULL ) FatalError( "Out of space!!!" );
else H->TheLists[ i ]->Next = NULL;
}
return H;
}
Create an empty table
H TheListsTableSize …………
Find a key from a hash table
Position Find ( ElementType Key,HashTable H )
{
Position P;
List L;
L = H->TheLists[ Hash( Key,H->TableSize ) ];
P = L->Next;
while( P != NULL && P->Element != Key ) /* Probably need strcmp */
P = P->Next;
return P;
}
Your hash
function
Identical to the code to perform a
Find for general lists -- List ADT
Insert a key into a hash table
void Insert ( ElementType Key,HashTable H )
{
Position Pos,NewCell;
List L;
Pos = Find( Key,H );
if ( Pos == NULL ) { /* Key is not found,then insert */
NewCell = malloc( sizeof( struct ListNode ) );
if ( NewCell == NULL ) FatalError( "Out of space!!!" );
else {
L = H->TheLists[ Hash( Key,H->TableSize ) ]; /*Again!*/
NewCell->Next = L->Next;
NewCell->Element = Key; /* Probably need strcpy! */
L->Next = NewCell;
}
}
}
Tip,Make the TableSize about as large as the number of keys
expected (i.e,to make the loading density factor1).
Open Addressing
---- find another empty cell to solve collision (avoiding pointers)
Algorithm,insert key into an array of hash table
{
index = hash(key);
initialize i = 0 ------ the counter of probing;
while ( collision at index ) {
index = ( hash(key) + f(i) ) % TableSize;
if ( table is full ) break;
else i ++;
}
if ( table is full )
ERROR (“No space left”);
else
insert key at index;
}
Collision
resolving
function.
f(0) = 0.
Tip,Generally? < 0.5.
1,Linear Probing f ( i ) = i ; /* a linear function */
〖 Example〗 Mapping n = 11 C library functions into a
hash table ht[ ] with b = 26 buckets and s = 1.
acos atoi char define exp
ceil cos float atol floor ctime
bucket x
0
1
2
3
4
5
6
7
8
9
10
… …
25
search time
acos 1
atoi 2
char 1
define 1
exp 1
ceil 4
cos 5
float 3
atol 9
floor 5
ctime 9
Loading density? = 11 / 26 = 0.42
Average search time = 41 / 11 = 3.73
Although p is small,
the worst case can be
LARGE.
Analysis of the linear probing show
that the expected number of probes


s e a r c h e s s u c c e s s f u lf o r )1(
s e a r c h e s ulu n s u c c e s s f a n d i n s e r t i o n sf o r )1(
1
1
2
1
)1(
1
2
1
2
p = 1.36
2,Quadratic Probing f ( i ) = i 2 ; /* a quadratic function */
【 Theorem】 If quadratic probing is used,and the table size is prime,
then a new element can always be inserted if the table is at least half
empty.
Proof,Just prove that the first?TableSize/2?alternative locations are
all distinct,That is,for any 0 < i? jTableSize/2?,we have
( h(x) + i 2 ) % TableSize? ( h(x) + j 2 ) % TableSize
Suppose,h(x) + i 2 = h(x) + j 2 ( mod TableSize )
then,i 2 = j 2 ( mod TableSize )
( i + j ) ( i? j ) = 0 ( mod TableSize )
TableSize is prime either ( i + j ) or ( i? j ) is divisible by TableSize
Contradiction !
For any x,it has? TableSize/2?distinct locations into which it can go,
If at most?TableSize/2?positions are taken,then an empty spot can
always be found.
Note,If the table size is a prime of the form 4k + 3,then the
quadratic probing f(i) =? i 2 can probe the entire table.
//HashQuad.h
typedef unsigned int index;
typedef index position;
struct HashTbl;
typedef struct HashTbl * HashTable;
HashTable InitializeTable(int TableSize);
void DestroyTable(HashTable h);
position Find(ElementType key,HashTable h);
void Insert(ElementType key,HashTable h);
ElementType Retrieve(position p,HastTable h);
HashTable Rehash(HashTable h);
//place in the implimentation file
enum KindOfEntry{Legitimate,Empty,Deleted};
struct HashEntry
{
ElementType Element;
enum KindOfEntry Info;
};
typedef struct HashEntry Cell;
struct HashTbl
{
int TableSize;
cell *TheCells;
};
// Initialze HashTble
HashTble InitialzeTable(int TableSize)
{
HashTable h;int I;
if(TableSize<MinTableSize)
{Error(“table size is too small”);return NULL;}
h=malloc(sizeof(struct HashTbl));
if(h==NULL)FatalError(“out of space!”);
h->TableSize=NextPrime(TableSize);
h->TheCells=malloc(sizeof(cell)*h->TableSize);
if(h->TheCells==NULL) FatalError(“out of space!”);
for(i=0;i<h->TableSize;i++)
h->TheCell[i].Info=Empty;
return h;
}
Note,If the table size is a prime of the form 4k + 3,then the
quadratic probing f(i) =? i 2 can probe the entire table.
Read Figures 7.15 - 7.16 for detailed
representations and implementations of initialization.
Position Find ( ElementType Key,HashTable H )
{ Position CurrentPos;
int CollisionNum;
CollisionNum = 0;
CurrentPos = Hash( Key,H->TableSize );
while( H->TheCells[ CurrentPos ].Info != Empty &&
H->TheCells[ CurrentPos ].Element != Key ) {
CurrentPos += 2 * ++CollisionNum? 1;
if ( CurrentPos >= H->TableSize ) CurrentPos? = H->TableSize;
}
return CurrentPos;
}
void Insert ( ElementType Key,HashTable H )
{
Position Pos;
Pos = Find( Key,H );
if ( H->TheCells[ Pos ].Info != Legitimate ) { /* OK to insert here */
H->TheCells[ Pos ].Info = Legitimate;
H->TheCells[ Pos ].Element = Key; /* Probably need strcpy */
}
}
Question,How to delete a key?
Note:?Insertion will be seriously slowed down if there are too
many deletions intermixed with insertions.
Although primary clustering is solved,secondary clustering
occurs – that is,keys that hash to the same position will probe
the same alternative cells.
3,Double Hashing
f ( i ) = i * hash2( x ); /* hash2( x ) is the 2nd hash function */
make sure that all cells can be probed.?hash2( x )? 0 ;
Tip,hash2( x ) = R – ( x % R ) with R a prime smaller than
TableSize,will work well.
Note:?If double hashing is correctly implemented,
simulations imply that the expected number of
probes is almost the same as for a random collision
resolution strategy.
Quadratic probing does not require the use of a
second hash function and is thus likely to be
simpler and faster in practice.
Rehashing
Build another table that is about twice as
big;
Scan down the entire original hash table for
non-deleted elements;
Use a new function to hash those elements
into the new table.
If there are N keys in the table,then T (N) = O(N)
Question,When to rehash?
Answer:
As soon as the table is half full
When an insertion fails
When the table reaches a certain load factor
Note,Usually there should have been N/2 insertions
before rehash,so O(N) rehash only adds a
constant cost to each insertion.
However,in an interactive system,the
unfortunate user whose insertion caused a
rehash could see a slowdown.
HashTable Rehash (HashTable H )
{
int i,OldSize;Cell *OldCells;
OldCells=H->TheCells;OldSize=H->TableSize;
H=InitializeTable(2*OldSize); //get a new empty table
for(i=0;i<OldSize;i++)
if(OldCells[i].Info==Legitimate);
Insert(OldCells[i].Element,H);
free(OldCells);
return H;
}