Introduction to Algorithms
6.046J/18.401J/SMA5503
Lecture 7
Prof. Charles E. Leiserson
Introduction to Algorithms Day 11 L7.2 © 2001 by Charles E. Leiserson
Symbol-table problem
Symbol table T holding n records:
[Figure: a record x, with a key field key[x] and other fields containing satellite data.]
Operations on T:
• INSERT(T, x)
• DELETE(T, x)
• SEARCH(T, k)
How should the data structure T be organized?
Direct-access table
IDEA: Suppose that the set of keys is K ⊆ {0, 1, …, m–1}, and keys are distinct. Set up an array T[0 . . m–1]:
    T[k] = x    if k ∈ K and key[x] = k,
    T[k] = NIL  otherwise.
Then, operations take Θ(1) time.
Problem: The range of keys can be large:
• 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys),
• character strings (even larger!).
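As a concrete sketch (not part of the slides; the class and method names are ours), a direct-access table takes only a few lines, assuming distinct integer keys drawn from {0, 1, …, m–1}:

```python
# A minimal direct-access table sketch. Every operation indexes the
# array directly, so INSERT, DELETE, and SEARCH all take Theta(1) time.

class DirectAccessTable:
    def __init__(self, m):
        self.slots = [None] * m      # T[0 .. m-1], all NIL initially

    def insert(self, key, record):
        self.slots[key] = record     # T[key] = x

    def delete(self, key):
        self.slots[key] = None       # T[key] = NIL

    def search(self, key):
        return self.slots[key]       # record x, or None if absent

T = DirectAccessTable(16)
T.insert(5, "record-5")
assert T.search(5) == "record-5"
T.delete(5)
assert T.search(5) is None
```

The array must have one slot per possible key, which is exactly why the large key ranges above make this scheme impractical.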
Hash functions
Solution: Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}.
As each key is inserted, h maps it to a slot of T.
[Figure: keys k₁, …, k₅ of the set K ⊆ U map to slots h(k₁), h(k₄), h(k₂), h(k₃) of the table T[0 . . m–1]; in the figure, h(k₅) lands on an already occupied slot.]
When a record to be inserted maps to an already occupied slot in T, a collision occurs.
Resolving collisions by chaining
• Records in the same slot are linked into a list.
[Figure: since h(49) = h(86) = h(52) = i, slot i of T points to the chain 49 → 86 → 52.]
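Chaining can be sketched in a few lines; this is an illustrative implementation (class and method names are ours), with each chain stored as a Python list in its slot:

```python
# Collision resolution by chaining: each slot of T holds a list of
# (key, record) pairs. The hash function h is passed in as a parameter.

class ChainedHashTable:
    def __init__(self, m, h):
        self.m = m
        self.h = h
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def insert(self, key, record):
        self.slots[self.h(key) % self.m].append((key, record))

    def search(self, key):
        # Walk the chain in the key's slot.
        for k, rec in self.slots[self.h(key) % self.m]:
            if k == key:
                return rec
        return None                           # empty or exhausted chain

T = ChainedHashTable(7, lambda k: k)
T.insert(10, "a")
T.insert(17, "b")        # 10 mod 7 == 17 mod 7 == 3: a collision, chained
assert T.search(10) == "a"
assert T.search(17) == "b"
assert T.search(24) is None
```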
Analysis of chaining
We make the assumption of simple uniform hashing:
• Each key k ∈ K is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.
Let n be the number of keys in the table, and let m be the number of slots.
Define the load factor of T to be
    α = n/m = average number of keys per slot.
Search cost
Expected time to search for a record with a given key = Θ(1 + α):
• Θ(1) to apply the hash function and access the slot,
• Θ(α) to search the list.
Expected search time = Θ(1) if α = O(1), or equivalently, if n = O(m).
Choosing a hash function
The assumption of simple uniform hashing
is hard to guarantee, but several common
techniques tend to work well in practice as
long as their deficiencies can be avoided.
Desiderata:
• A good hash function should distribute the keys uniformly into the slots of the table.
• Regularity in the key distribution should not affect this uniformity.
Division method
Assume all keys are integers, and define
h(k) = k mod m.
Extreme deficiency: If m = 2^r, then the hash doesn't even depend on all the bits of k:
• If k = 1011000111011010₂ and r = 6, then h(k) = 011010₂.
Deficiency: Don’t pick an m that has a small
divisor d. A preponderance of keys that are
congruent modulo d can adversely affect
uniformity.
Division method (continued)
h(k) = k mod m.
Pick m to be a prime not too close to a power
of 2 or 10 and not otherwise used prominently
in the computing environment.
Annoyance:
• Sometimes, making the table size a prime is inconvenient.
But, this method is popular, although the next
method we’ll see is usually superior.
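A quick sketch of the division method (the function name is ours, and the prime 701 below is an arbitrary illustrative choice):

```python
# Division method: h(k) = k mod m.

def h_div(k, m):
    return k % m

# Extreme deficiency: with m = 2**6, the hash keeps only the low
# r = 6 bits of k, exactly as in the slide's example.
k = 0b1011000111011010
assert h_div(k, 2**6) == 0b011010

# A prime m not close to a power of 2 or 10 (e.g. 701) depends on
# all the bits of k rather than just the low-order ones.
assert h_div(k, 701) == 666
```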
Multiplication method
Assume that all keys are integers, m = 2^r, and our computer has w-bit words. Define
    h(k) = (A·k mod 2^w) rsh (w – r),
where rsh is the "bitwise right-shift" operator and A is an odd integer in the range 2^(w–1) < A < 2^w.
• Don't pick A too close to 2^w.
• Multiplication modulo 2^w is fast.
• The rsh operator is fast.
Multiplication method example
h(k) = (A·k mod 2^w) rsh (w – r)
Suppose that m = 8 = 2^3 and that our computer has w = 7-bit words:
      1011001₂ = A
    × 1101011₂ = k
    10010100110011₂
h(k) is the r = 3 high-order bits of the low-order w = 7 bits of the product: h(k) = 011₂.
[Figure: a "modular wheel" with labeled positions 0, 4, 17, 26, 35, illustrating where the multiples A, 2A, 3A, … fall modulo 2^w.]
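The example can be checked directly; this sketch (the function name h_mul is ours) reproduces the slide's numbers:

```python
# Multiplication method: h(k) = (A*k mod 2^w) rsh (w - r), with
# w = 7-bit words, m = 8 = 2**3, and the slide's values of A and k.

def h_mul(k, A, w, r):
    return ((A * k) % (1 << w)) >> (w - r)

A = 0b1011001                        # odd, with 2^(w-1) < A < 2^w
k = 0b1101011
assert A * k == 0b10010100110011     # the full product from the slide
assert h_mul(k, A, w=7, r=3) == 0b011   # h(k) = 3
```

Both the mod and the shift are single machine operations when m and the word size are powers of 2, which is the method's main speed advantage over division by a prime.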
Dot-product method
Randomized strategy:
Let m be prime. Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}. That is, let k = ⟨k₀, k₁, …, kᵣ⟩, where 0 ≤ kᵢ < m.
Pick a = ⟨a₀, a₁, …, aᵣ⟩, where each aᵢ is chosen randomly from {0, 1, …, m–1}.
Define
    hₐ(k) = (∑_{i=0}^{r} aᵢkᵢ) mod m.
• Excellent in practice, but expensive to compute.
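A sketch of the dot-product method (names are ours); keys are assumed to fit in r + 1 base-m digits:

```python
import random

# Dot-product method: m prime, key split into r + 1 base-m digits,
# hashed against a randomly chosen coefficient vector a.

def make_dot_hash(m, r):
    a = [random.randrange(m) for _ in range(r + 1)]       # a_0 .. a_r
    def h(k):
        digits = [(k // m**i) % m for i in range(r + 1)]  # k_0 .. k_r
        return sum(ai * ki for ai, ki in zip(a, digits)) % m
    return h

h = make_dot_hash(m=97, r=3)
assert 0 <= h(123456) < 97
assert h(123456) == h(123456)     # deterministic once a is fixed
```

The randomness lives entirely in the one-time choice of a; after that the function is fixed, which is what makes the strategy analyzable.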
Resolving collisions by open addressing
No storage is used outside of the hash table itself.
• Insertion systematically probes the table until an empty slot is found.
• The hash function depends on both the key and the probe number:
    h : U × {0, 1, …, m–1} → {0, 1, …, m–1}.
• The probe sequence ⟨h(k,0), h(k,1), …, h(k,m–1)⟩ should be a permutation of {0, 1, …, m–1}.
• The table may fill up, and deletion is difficult (but not impossible).
Example of open addressing
Insert key k = 496 into a table T[0 . . m–1] that already contains 586, 133, 204, and 481:
0. Probe h(496,0): collision, the slot is occupied.
1. Probe h(496,1): collision again.
2. Probe h(496,2): an empty slot, so 496 is inserted there.
Search for key k = 496:
Probes h(496,0), h(496,1), and h(496,2) retrace the same sequence and find 496.
Search uses the same probe sequence, terminating successfully if it finds the key and unsuccessfully if it encounters an empty slot.
Probing strategies
Linear probing:
Given an ordinary hash function h′(k), linear
probing uses the hash function
h(k,i) = (h′(k) + i) mod m.
This method, though simple, suffers from primary
clustering, where long runs of occupied slots build
up, increasing the average search time. Moreover,
the long runs of occupied slots tend to get longer.
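Linear probing can be sketched as follows (function names are ours); the example deliberately inserts keys that all hash to the same slot, so a run of occupied slots builds up:

```python
# Linear probing: h(k, i) = (h1(k) + i) mod m. Insertion probes until
# an empty slot is found; search stops at the key or at an empty slot.

def linear_probe_insert(T, key, h1):
    m = len(T)
    for i in range(m):
        j = (h1(key) + i) % m
        if T[j] is None:
            T[j] = key
            return j
    raise RuntimeError("table is full")

def linear_probe_search(T, key, h1):
    m = len(T)
    for i in range(m):
        j = (h1(key) + i) % m
        if T[j] is None:
            return None              # empty slot: unsuccessful search
        if T[j] == key:
            return j                 # found: return the slot index
    return None

T = [None] * 8
h1 = lambda k: k % 8
for k in (0, 8, 16):                 # all three hash to slot 0
    linear_probe_insert(T, k, h1)
assert T[:3] == [0, 8, 16]           # primary clustering: a growing run
assert linear_probe_search(T, 16, h1) == 2
```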
Probing strategies
Double hashing
Given two ordinary hash functions h₁(k) and h₂(k), double hashing uses the hash function
    h(k,i) = (h₁(k) + i·h₂(k)) mod m.
This method generally produces excellent results, but h₂(k) must be relatively prime to m. One way is to make m a power of 2 and design h₂(k) to produce only odd numbers.
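A sketch of double hashing (names are ours) using the power-of-2 trick: forcing h₂(k) to be odd makes it relatively prime to m, so the probe sequence visits every slot:

```python
# Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m, with m a power
# of 2 and the step h2(k) forced odd so gcd(step, m) = 1.

def double_hash_probes(key, m, h1, h2):
    step = h2(key) | 1                  # setting the low bit makes it odd
    return [(h1(key) + i * step) % m for i in range(m)]

m = 8
probes = double_hash_probes(42, m,
                            h1=lambda k: k % m,
                            h2=lambda k: (k // m) % m)
assert sorted(probes) == list(range(m))   # a permutation of {0,...,m-1}
```

Because the step is relatively prime to m, the sequence ⟨h(k,0), …, h(k,m–1)⟩ is a full permutation, as open addressing requires.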
Analysis of open addressing
We make the assumption of uniform hashing:
• Each key is equally likely to have any one of the m! permutations as its probe sequence.
Theorem. Given an open-addressed hash
table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful
search is at most 1/(1–α).
Proof of the theorem
Proof.
• At least one probe is always necessary.
• With probability n/m, the first probe hits an occupied slot, and a second probe is necessary.
• With probability (n–1)/(m–1), the second probe hits an occupied slot, and a third probe is necessary.
• With probability (n–2)/(m–2), the third probe hits an occupied slot, etc.
Observe that
    (n–i)/(m–i) < n/m = α
for i = 1, 2, …, n.
Proof (continued)
Therefore, the expected number of probes is
    1 + (n/m)(1 + ((n–1)/(m–1))(1 + ((n–2)/(m–2))( ⋯ (1 + 1/(m–n+1)) ⋯ )))
    ≤ 1 + α(1 + α(1 + α( ⋯ (1 + α) ⋯ )))
    ≤ 1 + α + α^2 + α^3 + ⋯
    = ∑_{i=0}^{∞} α^i
    = 1/(1–α).
The textbook has a more rigorous proof.
Implications of the theorem
• If α is constant, then accessing an open-addressed hash table takes constant time.
• If the table is half full, then the expected number of probes is 1/(1–0.5) = 2.
• If the table is 90% full, then the expected number of probes is 1/(1–0.9) = 10.