Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 6 Prof. Erik Demaine Introduction to Algorithms Day 9 L6.2? 2001 by Charles E. Leiserson Order statistics Select the ith smallest of n elements (the element with rank i). ? i = 1: minimum; ? i = n: maximum; ? i = ?(n+1)/2? or ?(n+1)/2?: median. Naive algorithm: Sort and index ith element. Worst-case running time = Θ(n lg n) + Θ(1) = Θ(n lg n), using merge sort or heapsort (not quicksort). Introduction to Algorithms Day 9 L6.3? 2001 by Charles E. Leiserson Randomized divide-and- conquer algorithm RAND-SELECT(A, p, q, i) ? ith smallest of A[p ..q] if p = q then return A[p] r ← RAND-PARTITION(A, p, q) k ← r – p + 1 ? k = rank(A[r]) if i = k then return A[r] if i < k then return RAND-SELECT(A, p, r – 1, i ) else return RAND-SELECT(A, r + 1, q, i – k ) ≤ A[r] ≤ [r] ≥ A[r] ≥ [r] rpq k Introduction to Algorithms Day 9 L6.4? 2001 by Charles E. Leiserson Example pivot i = 7 6 6 10 10 13 13 5 5 8 8 3 3 2 2 11 11 k = 4 Select the 7 – 4 = 3rd smallest recursively. Select the i = 7th smallest: 2 2 5 5 3 3 6 6 8 8 13 13 10 10 11 11 Partition: Introduction to Algorithms Day 9 L6.5? 2001 by Charles E. Leiserson Intuition for analysis Lucky: 1 01log 9/10 == nn CASE 3 T(n)= T(9n/10) + Θ(n) = Θ(n) Unlucky: T(n)= T(n –1) + Θ(n) = Θ(n 2 ) arithmetic series Worse than sorting! (All our analyses today assume that all elements are distinct.) Introduction to Algorithms Day 9 L6.6? 2001 by Charles E. Leiserson Analysis of expected time Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent. For k = 0, 1, …, n–1, define the indicator random variable X k = 1 if PARTITION generates a k : n–k–1 split, 0 otherwise. The analysis follows that of randomized quicksort, but it’s a little different. Introduction to Algorithms Day 9 L6.7? 2001 by Charles E. Leiserson Analysis (continued) T(n) = T(max{0, n–1}) + Θ(n) if 0:n–1 split, T(max{1, n–2}) + Θ(n) if 1:n–2 split, M T(max{n–1, 0}) + Θ(n) if n–1 : 0 split, () ∑ ? = Θ+??= 1 0 )(})1,(max{ n k k nknkTX . To obtain an upper bound, assume that the ith element always falls in the larger side of the partition: Introduction to Algorithms Day 9 L6.8? 2001 by Charles E. Leiserson Calculating expectation () ? ? ? ? ? ? Θ+??= ∑ ? = 1 0 )(})1,(max{)]([ n k k nknkTXEnTE Take expectations of both sides. Introduction to Algorithms Day 9 L6.9? 2001 by Charles E. Leiserson Calculating expectation () [] ∑ ∑ ? = ? = Θ+??= ? ? ? ? ? ? Θ+??= 1 0 1 0 )(})1,(max{ )(})1,(max{)]([ n k k n k k nknkTXE nknkTXEnTE Linearity of expectation. Introduction to Algorithms Day 9 L6.10? 2001 by Charles E. Leiserson Calculating expectation () [] [][ ] ∑ ∑ ∑ ? = ? = ? = Θ+???= Θ+??= ? ? ? ? ? ? Θ+??= 1 0 1 0 1 0 )(})1,(max{ )(})1,(max{ )(})1,(max{)]([ n k k n k k n k k nknkTEXE nknkTXE nknkTXEnTE Independence of X k from other random choices. Introduction to Algorithms Day 9 L6.11? 2001 by Charles E. Leiserson Calculating expectation () [] [][ ] [] ∑∑ ∑ ∑ ∑ ? = ? = ? = ? = ? = Θ+??= Θ+???= Θ+??= ? ? ? ? ? ? Θ+??= 1 0 1 0 1 0 1 0 1 0 )( 1 })1,(max{ 1 )(})1,(max{ )(})1,(max{ )(})1,(max{)]([ n k n k n k k n k k n k k n n knkTE n nknkTEXE nknkTXE nknkTXEnTE Linearity of expectation; E[X k ] = 1/n . Introduction to Algorithms Day 9 L6.12? 2001 by Charles E. Leiserson Calculating expectation () [] [][ ] [] [] ? ? )()( 2 )( 1 })1,(max{ 1 )(})1,(max{ )(})1,(max{ )(})1,(max{)]([ 1 2/ 1 0 1 0 1 0 1 0 1 0 nkTE n n n knkTE n nknkTEXE nknkTXE nknkTXEnTE n nk n k n k n k k n k k n k k Θ+≤ Θ+??= Θ+???= Θ+??= ? ? ? ? ? ? Θ+??= ∑ ∑∑ ∑ ∑ ∑ ? = ? = ? = ? = ? = ? = Upper terms appear twice. Introduction to Algorithms Day 9 L6.13? 2001 by Charles E. Leiserson Hairy recurrence [] ? ? )()( 2 )]([ 1 2/ nkTE n nTE n nk Θ+= ∑ ? = Prove: E[T(n)] ≤ cn for constant c > 0 . Use fact: ? ? 2 1 2/ 8 3 nk n nk ∑ ? = ≤ (exercise). ? The constant c can be chosen large enough so that E[T(n)] ≤ cn for the base cases. (But not quite as hairy as the quicksort one.) Introduction to Algorithms Day 9 L6.14? 2001 by Charles E. Leiserson Substitution method [] ? ? )( 2 )( 1 2/ nck n nTE n nk Θ+≤ ∑ ? = Substitute inductive hypothesis. Introduction to Algorithms Day 9 L6.15? 2001 by Charles E. Leiserson Substitution method [] ?? )( 8 32 )( 2 )( 2 1 2/ nn n c nck n nTE n nk Θ+ ? ? ? ? ? ? ≤ Θ+≤ ∑ ? = Use fact. Introduction to Algorithms Day 9 L6.16? 2001 by Charles E. Leiserson Substitution method Express as desired – residual. [] ?? ? ? ? ? ? ? Θ??= Θ+ ? ? ? ? ? ? ≤ Θ+≤ ∑ ? = )( 4 )( 8 32 )( 2 )( 2 1 2/ n cn cn nn n c nck n nTE n nk Introduction to Algorithms Day 9 L6.17? 2001 by Charles E. Leiserson Substitution method [] ?? cn n cn cn nn n c nck n nTE n nk ≤ ? ? ? ? ? ? Θ??= Θ+ ? ? ? ? ? ? ≤ Θ+≤ ∑ ? = )( 4 )( 8 32 )( 2 )( 2 1 2/ if c is chosen large enough so that cn/4 dominates the Θ(n). , Introduction to Algorithms Day 9 L6.18? 2001 by Charles E. Leiserson Summary of randomized order-statistic selection ? Works fast: linear expected time. ? Excellent algorithm in practice. ? But, the worst case is very bad: Θ(n 2 ). Q. Is there an algorithm that runs in linear time in the worst case? IDEA: Generate a good pivot recursively. A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973]. Introduction to Algorithms Day 9 L6.19? 2001 by Charles E. Leiserson Worst-case linear-time order statistics if i = k then return x elseif i < k then recursively SELECT the ith smallest element in the lower part else recursively SELECT the (i–k)th smallest element in the upper part SELECT(i, n) 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote. 2. Recursively SELECT the median x of the ?n/5? group medians to be the pivot. 3. Partition around the pivot x. Let k = rank(x). 4. Same as RAND- SELECT Introduction to Algorithms Day 9 L6.20? 2001 by Charles E. Leiserson Choosing the pivot Introduction to Algorithms Day 9 L6.21? 2001 by Charles E. Leiserson Choosing the pivot 1. Divide the n elements into groups of 5. Introduction to Algorithms Day 9 L6.22? 2001 by Charles E. Leiserson Choosing the pivot lesser greater 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote. Introduction to Algorithms Day 9 L6.23? 2001 by Charles E. Leiserson Choosing the pivot lesser greater 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote. 2. Recursively SELECT the median x of the ?n/5? group medians to be the pivot. x Introduction to Algorithms Day 9 L6.24? 2001 by Charles E. Leiserson Analysis lesser greater x At least half the group medians are ≤ x, which is at least ??n/5? /2? = ?n/10? group medians. Introduction to Algorithms Day 9 L6.25? 2001 by Charles E. Leiserson Analysis lesser greater x At least half the group medians are ≤ x, which is at least ??n/5? /2? = ?n/10? group medians. ? Therefore, at least 3?n/10? elements are ≤ x. (Assume all elements are distinct.) Introduction to Algorithms Day 9 L6.26? 2001 by Charles E. Leiserson Analysis lesser greater x At least half the group medians are ≤ x, which is at least ??n/5? /2? = ?n/10? group medians. ? Therefore, at least 3?n/10? elements are ≤ x. ? Similarly, at least 3?n/10? elements are ≥ x. (Assume all elements are distinct.) Introduction to Algorithms Day 9 L6.27? 2001 by Charles E. Leiserson Minor simplification ? For n ≥ 50, we have 3?n/10?≥n/4. ? Therefore, for n ≥ 50 the recursive call to SELECT in Step 4 is executed recursively on ≤ 3n/4 elements. ? Thus, the recurrence for running time can assume that Step 4 takes time T(3n/4) in the worst case. ? For n < 50, we know that the worst-case time is T(n) = Θ(1). Introduction to Algorithms Day 9 L6.28? 2001 by Charles E. Leiserson Developing the recurrence if i = k then return x elseif i < k then recursively SELECT the ith smallest element in the lower part else recursively SELECT the (i–k)th smallest element in the upper part SELECT(i, n) 1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote. 2. Recursively SELECT the median x of the ?n/5? group medians to be the pivot. 3. Partition around the pivot x. Let k = rank(x). 4. T(n) Θ(n) T(n/5) Θ(n) T(3n/4) Introduction to Algorithms Day 9 L6.29? 2001 by Charles E. Leiserson Solving the recurrence )( 4 3 5 1 )( nnTnTnT Θ+ ? ? ? ? ? ? + ? ? ? ? ? ? = if c is chosen large enough to handle both the Θ(n) and the initial conditions. cn ncncn ncn ncncnnT ≤ ? ? ? ? ? ? Θ??= Θ+= Θ++≤ )( 20 1 )( 20 19 )( 4 3 5 1 )( , Substitution: T(n) ≤ cn Introduction to Algorithms Day 9 L6.30? 2001 by Charles E. Leiserson Conclusions ? Since the work at each level of recursion is a constant fraction (19/20) smaller, the work per level is a geometric series dominated by the linear work at the root. ? In practice, this algorithm runs slowly, because the constant in front of n is large. ? The randomized algorithm is far more practical. Exercise: Why not divide into groups of 3?