Introduction to Algorithms
6.046J/18.401J/SMA5503
Lecture 8
Prof. Charles E. Leiserson
Introduction to Algorithms Day 12 L8.2? 2001 by Charles E. Leiserson
A weakness of hashing
Problem: For any hash function h, a set
of keys exists that can cause the average
access time of a hash table to skyrocket.
IDEA: Choose the hash function at random,
independently of the keys.
? Even if an adversary can see your code,
he or she cannot find a bad set of keys,
since he or she doesn’t know exactly
which hash function will be chosen.
? An adversary can pick all keys from
{k ∈ U : h(k) = i} for some slot i.
Introduction to Algorithms Day 12 L8.3? 2001 by Charles E. Leiserson
Universal hashing
Definition. Let U be a universe of keys, and
let H be a finite collection of hash functions,
each mapping U to {0, 1, …, m–1}. We say
H is universal if for all x, y ∈ U, where x ≠ y,
we have |{h ∈H: h(x) = h(y)}| = |H|/m.
That is, the chance
of a collision
between x and y is
1/m if we choose h
randomly from H.
H
{h : h(x) = h(y)}
|H|
m
Introduction to Algorithms Day 12 L8.4? 2001 by Charles E. Leiserson
Universality is good
Theorem. Let h be a hash function chosen
(uniformly) at random from a universal set H
of hash functions. Suppose h is used to hash
n arbitrary keys into the m slots of a table T.
Then, for a given key x, we have
E[#collisions with x] < n/m.
Introduction to Algorithms Day 12 L8.5? 2001 by Charles E. Leiserson
Proof of theorem
Proof. Let C
x
be the random variable denoting
the total number of collisions of keys in T with
x, and let
c
xy
=
1 if h(x) = h(y),
0 otherwise.
Note: E[c
xy
] = 1/m and
∑
?∈
=
}{xTy
xyx
cC .
Introduction to Algorithms Day 12 L8.6? 2001 by Charles E. Leiserson
Proof (continued)
?
?
?
?
?
?
?
?
=
∑
?∈ }{
][
xTy
xyx
cECE ? Take expectation
of both sides.
Introduction to Algorithms Day 12 L8.7? 2001 by Charles E. Leiserson
Proof (continued)
∑
∑
?∈
?∈
=
?
?
?
?
?
?
?
?
=
}{
}{
][
][
xTy
xy
xTy
xyx
cE
cECE
? Linearity of
expectation.
? Take expectation
of both sides.
Introduction to Algorithms Day 12 L8.8? 2001 by Charles E. Leiserson
Proof (continued)
∑
∑
∑
?∈
?∈
?∈
=
=
?
?
?
?
?
?
?
?
=
}{
}{
}{
/1
][
][
xTy
xTy
xy
xTy
xyx
m
cE
cECE
? E[c
xy
] = 1/m.
? Linearity of
expectation.
? Take expectation
of both sides.
Introduction to Algorithms Day 12 L8.9? 2001 by Charles E. Leiserson
Proof (continued)
m
n
m
cE
cECE
xTy
xTy
xy
xTy
xyx
1
/1
][
][
}{
}{
}{
?
=
=
=
?
?
?
?
?
?
?
?
=
∑
∑
∑
?∈
?∈
?∈
? Take expectation
of both sides.
? Linearity of
expectation.
? E[c
xy
] = 1/m.
? Algebra..
Introduction to Algorithms Day 12 L8.10? 2001 by Charles E. Leiserson
REMEMBER
THIS!
Constructing a set of
universal hash functions
Let m be prime. Decompose key k into r + 1
digits, each with value in the set {0, 1, …, m–1}.
That is, let k = ?k
0
, k
1
, …, k
r
?, where 0 ≤ k
i
< m.
Randomized strategy:
Pick a = ?a
0
, a
1
, …, a
r
? where each a
i
is chosen
randomly from {0, 1, …, m–1}.
mkakh
r
i
iia
mod)(
0
∑
=
=Define .
How big is H = {h
a
}? |H| = m
r + 1
.
Dot product,
modulo m
Introduction to Algorithms Day 12 L8.11? 2001 by Charles E. Leiserson
Universality of dot-product
hash functions
Theorem. The set H = {h
a
} is universal.
Proof. Suppose that x = ?x
0
, x
1
, …, x
r
? and y =
?y
0
, y
1
, …, y
r
? be distinct keys. Thus, they differ
in at least one digit position, wlog position 0.
For how many h
a
∈Hdo x and y collide?
)(mod
00
myaxa
r
i
ii
r
i
ii
∑∑
==
≡ .
We must have h
a
(x) = h
a
(y), which implies that
Introduction to Algorithms Day 12 L8.12? 2001 by Charles E. Leiserson
Proof (continued)
Equivalently, we have
)(mod0)(
0
myxa
r
i
iii
≡?
∑
=
or
)(mod0)()(
1
000
myxayxa
r
i
iii
≡?+?
∑
=
)(mod)()(
1
000
myxayxa
r
i
iii
∑
=
??≡?
which implies that
,
.
Introduction to Algorithms Day 12 L8.13? 2001 by Charles E. Leiserson
Fact from number theory
Theorem. Let m be prime. For any z ∈Z
m
such that z ≠ 0, there exists a unique z
–1
∈Z
m
such that
z · z
–1
≡ 1 (mod m).
Example: m = 7.
z
z
–1
1 2 3 4 5 6
1 4 5 2 3 6
Introduction to Algorithms Day 12 L8.14? 2001 by Charles E. Leiserson
Back to the proof
)(mod)()(
1
000
myxayxa
r
i
iii
∑
=
??≡?
We have
and since x
0
≠ y
0
, an inverse (x
0
– y
0
)
–1
must exist,
which implies that
,
)(mod)()(
1
00
1
0
myxyxaa
r
i
iii
?
=
??
?
?
?
?
?
?
?
?
??≡
∑
.
Thus, for any choices of a
1
, a
2
, …, a
r
, exactly
one choice of a
0
causes x and y to collide.
Introduction to Algorithms Day 12 L8.15? 2001 by Charles E. Leiserson
Proof (completed)
Q. How many h
a
’s cause x and y to collide?
A. There are m choices for each of a
1
, a
2
, …, a
r
,
but once these are chosen, exactly one choice
for a
0
causes x and y to collide, namely
myxyxaa
r
i
iii
mod)()(
1
00
1
0
?
?
?
?
?
?
?
?
??
?
?
?
?
?
?
?
?
??=
?
=
∑
.
Thus, the number of h
a
’s that cause x and y
to collide is m
r
·1 = m
r
= |H|/m.
Introduction to Algorithms Day 12 L8.16? 2001 by Charles E. Leiserson
Perfect hashing
Given a set of n keys, construct a static hash
table of size m = O(n) such that SEARCH takes
Θ(1) time in the worst case.
IDEA: Two-
level scheme
with universal
hashing at
both levels.
No collisions
at level 2!
40
40
37
37
22
22
0
1
2
3
4
5
6
26
26
ma 012345678
14
14
27
27
S
4
S
6
S
1
4
4
31
31
1
1
00
00
9
9
86
86
T
h
31
(14) = h
31
(27) = 1
Introduction to Algorithms Day 12 L8.17? 2001 by Charles E. Leiserson
Collisions at level 2
Theorem. Let H be a class of universal hash
functions for a table of size m = n
2
. Then, if we
use a random h ∈H to hash n keys into the table,
the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the
probability that 2 given keys in the table collide
under h is 1/m = 1/n
2
. Since there are pairs
of keys that can possibly collide, the expected
number of collisions is
( )
2
n
2
11
2
)1(
1
2
22
<?
?
=?
?
?
?
?
?
?
n
nn
n
n
.
Introduction to Algorithms Day 12 L8.18? 2001 by Charles E. Leiserson
No collisions at level 2
Corollary. The probability of no collisions
is at least 1/2.
Thus, just by testing random hash functions
in H, we’ll quickly find one that works.
Proof. Markov’s inequality says that for any
nonnegative random variable X, we have
Pr{X ≥ t} ≤ E[X]/t.
Applying this inequality with t = 1, we find
that the probability of 1 or more collisions is
at most 1/2.
Introduction to Algorithms Day 12 L8.19? 2001 by Charles E. Leiserson
Analysis of storage
For the level-1 hash table T, choose m = n, and
let n
i
be random variable for the number of keys
that hash to slot i in T. By using n
i
2
slots for the
level-2 hash table S
i
, the expected total storage
required for the two-level scheme is therefore
() )(
1
0
2
nnE
m
i
i
Θ=
?
?
?
?
?
?
Θ
∑
?
=
,
since the analysis is identical to the analysis from
recitation of the expected running time of bucket
sort. (For a probability bound, apply Markov.)