Introduction to Algorithms
6.046J/18.401J/SMA5503
Lecture 6
Prof. Erik Demaine
Introduction to Algorithms Day 9 L6.2? 2001 by Charles E. Leiserson
Order statistics
Select the ith smallest of n elements (the
element with rank i).
? i = 1: minimum;
? i = n: maximum;
? i = ?(n+1)/2? or ?(n+1)/2?: median.
Naive algorithm: Sort and index ith element.
Worst-case running time = Θ(n lg n) + Θ(1)
= Θ(n lg n),
using merge sort or heapsort (not quicksort).
Introduction to Algorithms Day 9 L6.3? 2001 by Charles E. Leiserson
Randomized divide-and-
conquer algorithm
RAND-SELECT(A, p, q, i) ? ith smallest of A[p ..q]
if p = q then return A[p]
r ← RAND-PARTITION(A, p, q)
k ← r – p + 1 ? k = rank(A[r])
if i = k then return A[r]
if i < k
then return RAND-SELECT(A, p, r – 1, i )
else return RAND-SELECT(A, r + 1, q, i – k )
≤ A[r]
≤ [r]
≥ A[r]
≥ [r]
rpq
k
Introduction to Algorithms Day 9 L6.4? 2001 by Charles E. Leiserson
Example
pivot
i = 7
6
6
10
10
13
13
5
5
8
8
3
3
2
2
11
11
k = 4
Select the 7 – 4 = 3rd smallest recursively.
Select the i = 7th smallest:
2
2
5
5
3
3
6
6
8
8
13
13
10
10
11
11
Partition:
Introduction to Algorithms Day 9 L6.5? 2001 by Charles E. Leiserson
Intuition for analysis
Lucky:
1
01log
9/10
== nn
CASE 3
T(n)= T(9n/10) + Θ(n)
= Θ(n)
Unlucky:
T(n)= T(n –1) + Θ(n)
= Θ(n
2
)
arithmetic series
Worse than sorting!
(All our analyses today assume that all elements
are distinct.)
Introduction to Algorithms Day 9 L6.6? 2001 by Charles E. Leiserson
Analysis of expected time
Let T(n) = the random variable for the running
time of RAND-SELECT on an input of size n,
assuming random numbers are independent.
For k = 0, 1, …, n–1, define the indicator
random variable
X
k
=
1 if PARTITION generates a k : n–k–1 split,
0 otherwise.
The analysis follows that of randomized
quicksort, but it’s a little different.
Introduction to Algorithms Day 9 L6.7? 2001 by Charles E. Leiserson
Analysis (continued)
T(n) =
T(max{0, n–1}) + Θ(n) if 0:n–1 split,
T(max{1, n–2}) + Θ(n) if 1:n–2 split,
M
T(max{n–1, 0}) + Θ(n) if n–1 : 0 split,
()
∑
?
=
Θ+??=
1
0
)(})1,(max{
n
k
k
nknkTX .
To obtain an upper bound, assume that the ith
element always falls in the larger side of the
partition:
Introduction to Algorithms Day 9 L6.8? 2001 by Charles E. Leiserson
Calculating expectation
()
?
?
?
?
?
?
Θ+??=
∑
?
=
1
0
)(})1,(max{)]([
n
k
k
nknkTXEnTE
Take expectations of both sides.
Introduction to Algorithms Day 9 L6.9? 2001 by Charles E. Leiserson
Calculating expectation
()
[]
∑
∑
?
=
?
=
Θ+??=
?
?
?
?
?
?
Θ+??=
1
0
1
0
)(})1,(max{
)(})1,(max{)]([
n
k
k
n
k
k
nknkTXE
nknkTXEnTE
Linearity of expectation.
Introduction to Algorithms Day 9 L6.10? 2001 by Charles E. Leiserson
Calculating expectation
()
[]
[][ ]
∑
∑
∑
?
=
?
=
?
=
Θ+???=
Θ+??=
?
?
?
?
?
?
Θ+??=
1
0
1
0
1
0
)(})1,(max{
)(})1,(max{
)(})1,(max{)]([
n
k
k
n
k
k
n
k
k
nknkTEXE
nknkTXE
nknkTXEnTE
Independence of X
k
from other random
choices.
Introduction to Algorithms Day 9 L6.11? 2001 by Charles E. Leiserson
Calculating expectation
()
[]
[][ ]
[]
∑∑
∑
∑
∑
?
=
?
=
?
=
?
=
?
=
Θ+??=
Θ+???=
Θ+??=
?
?
?
?
?
?
Θ+??=
1
0
1
0
1
0
1
0
1
0
)(
1
})1,(max{
1
)(})1,(max{
)(})1,(max{
)(})1,(max{)]([
n
k
n
k
n
k
k
n
k
k
n
k
k
n
n
knkTE
n
nknkTEXE
nknkTXE
nknkTXEnTE
Linearity of expectation; E[X
k
] = 1/n .
Introduction to Algorithms Day 9 L6.12? 2001 by Charles E. Leiserson
Calculating expectation
()
[]
[][ ]
[]
[]
? ?
)()(
2
)(
1
})1,(max{
1
)(})1,(max{
)(})1,(max{
)(})1,(max{)]([
1
2/
1
0
1
0
1
0
1
0
1
0
nkTE
n
n
n
knkTE
n
nknkTEXE
nknkTXE
nknkTXEnTE
n
nk
n
k
n
k
n
k
k
n
k
k
n
k
k
Θ+≤
Θ+??=
Θ+???=
Θ+??=
?
?
?
?
?
?
Θ+??=
∑
∑∑
∑
∑
∑
?
=
?
=
?
=
?
=
?
=
?
=
Upper terms
appear twice.
Introduction to Algorithms Day 9 L6.13? 2001 by Charles E. Leiserson
Hairy recurrence
[]
? ?
)()(
2
)]([
1
2/
nkTE
n
nTE
n
nk
Θ+=
∑
?
=
Prove: E[T(n)] ≤ cn for constant c > 0 .
Use fact:
? ?
2
1
2/
8
3
nk
n
nk
∑
?
=
≤
(exercise).
? The constant c can be chosen large enough
so that E[T(n)] ≤ cn for the base cases.
(But not quite as hairy as the quicksort one.)
Introduction to Algorithms Day 9 L6.14? 2001 by Charles E. Leiserson
Substitution method
[]
? ?
)(
2
)(
1
2/
nck
n
nTE
n
nk
Θ+≤
∑
?
=
Substitute inductive hypothesis.
Introduction to Algorithms Day 9 L6.15? 2001 by Charles E. Leiserson
Substitution method
[]
??
)(
8
32
)(
2
)(
2
1
2/
nn
n
c
nck
n
nTE
n
nk
Θ+
?
?
?
?
?
?
≤
Θ+≤
∑
?
=
Use fact.
Introduction to Algorithms Day 9 L6.16? 2001 by Charles E. Leiserson
Substitution method
Express as desired – residual.
[]
??
?
?
?
?
?
?
Θ??=
Θ+
?
?
?
?
?
?
≤
Θ+≤
∑
?
=
)(
4
)(
8
32
)(
2
)(
2
1
2/
n
cn
cn
nn
n
c
nck
n
nTE
n
nk
Introduction to Algorithms Day 9 L6.17? 2001 by Charles E. Leiserson
Substitution method
[]
??
cn
n
cn
cn
nn
n
c
nck
n
nTE
n
nk
≤
?
?
?
?
?
?
Θ??=
Θ+
?
?
?
?
?
?
≤
Θ+≤
∑
?
=
)(
4
)(
8
32
)(
2
)(
2
1
2/
if c is chosen large enough so
that cn/4 dominates the Θ(n).
,
Introduction to Algorithms Day 9 L6.18? 2001 by Charles E. Leiserson
Summary of randomized
order-statistic selection
? Works fast: linear expected time.
? Excellent algorithm in practice.
? But, the worst case is very bad: Θ(n
2
).
Q. Is there an algorithm that runs in linear
time in the worst case?
IDEA: Generate a good pivot recursively.
A. Yes, due to Blum, Floyd, Pratt, Rivest,
and Tarjan [1973].
Introduction to Algorithms Day 9 L6.19? 2001 by Charles E. Leiserson
Worst-case linear-time order
statistics
if i = k then return x
elseif i < k
then recursively SELECT the ith
smallest element in the lower part
else recursively SELECT the (i–k)th
smallest element in the upper part
SELECT(i, n)
1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ?n/5?
group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4.
Same as
RAND-
SELECT
Introduction to Algorithms Day 9 L6.20? 2001 by Charles E. Leiserson
Choosing the pivot
Introduction to Algorithms Day 9 L6.21? 2001 by Charles E. Leiserson
Choosing the pivot
1. Divide the n elements into groups of 5.
Introduction to Algorithms Day 9 L6.22? 2001 by Charles E. Leiserson
Choosing the pivot
lesser
greater
1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.
Introduction to Algorithms Day 9 L6.23? 2001 by Charles E. Leiserson
Choosing the pivot
lesser
greater
1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ?n/5?
group medians to be the pivot.
x
Introduction to Algorithms Day 9 L6.24? 2001 by Charles E. Leiserson
Analysis
lesser
greater
x
At least half the group medians are ≤ x, which
is at least ??n/5? /2? = ?n/10? group medians.
Introduction to Algorithms Day 9 L6.25? 2001 by Charles E. Leiserson
Analysis
lesser
greater
x
At least half the group medians are ≤ x, which
is at least ??n/5? /2? = ?n/10? group medians.
? Therefore, at least 3?n/10? elements are ≤ x.
(Assume all elements are distinct.)
Introduction to Algorithms Day 9 L6.26? 2001 by Charles E. Leiserson
Analysis
lesser
greater
x
At least half the group medians are ≤ x, which
is at least ??n/5? /2? = ?n/10? group medians.
? Therefore, at least 3?n/10? elements are ≤ x.
? Similarly, at least 3?n/10? elements are ≥ x.
(Assume all elements are distinct.)
Introduction to Algorithms Day 9 L6.27? 2001 by Charles E. Leiserson
Minor simplification
? For n ≥ 50, we have 3?n/10?≥n/4.
? Therefore, for n ≥ 50 the recursive call to
SELECT in Step 4 is executed recursively
on ≤ 3n/4 elements.
? Thus, the recurrence for running time
can assume that Step 4 takes time
T(3n/4) in the worst case.
? For n < 50, we know that the worst-case
time is T(n) = Θ(1).
Introduction to Algorithms Day 9 L6.28? 2001 by Charles E. Leiserson
Developing the recurrence
if i = k then return x
elseif i < k
then recursively SELECT the ith
smallest element in the lower part
else recursively SELECT the (i–k)th
smallest element in the upper part
SELECT(i, n)
1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ?n/5?
group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4.
T(n)
Θ(n)
T(n/5)
Θ(n)
T(3n/4)
Introduction to Algorithms Day 9 L6.29? 2001 by Charles E. Leiserson
Solving the recurrence
)(
4
3
5
1
)( nnTnTnT Θ+
?
?
?
?
?
?
+
?
?
?
?
?
?
=
if c is chosen large enough to handle both the
Θ(n) and the initial conditions.
cn
ncncn
ncn
ncncnnT
≤
?
?
?
?
?
?
Θ??=
Θ+=
Θ++≤
)(
20
1
)(
20
19
)(
4
3
5
1
)(
,
Substitution:
T(n) ≤ cn
Introduction to Algorithms Day 9 L6.30? 2001 by Charles E. Leiserson
Conclusions
? Since the work at each level of recursion
is a constant fraction (19/20) smaller, the
work per level is a geometric series
dominated by the linear work at the root.
? In practice, this algorithm runs slowly,
because the constant in front of n is large.
? The randomized algorithm is far more
practical.
Exercise: Why not divide into groups of 3?