7-3 Alternative quicksort analysis
An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to RANDOMIZED-QUICKSORT, rather than on the number of comparisons performed.
a. Argue that, given an array of size n, the probability that any particular element is chosen as the pivot is 1/n. Use this to define indicator random variables
Xi=I{ith smallest element is chosen as the pivot}.
What is E[Xi]?
b. Let T(n) be a random variable denoting the running time of quicksort on an array of size n. Argue that
E[T(n)]=E[q=1∑nXq(T(q−1)+T(n−q)+Θ(n))].(7.5)
c. Show that we can rewrite equation (7.5) as
E[T(n)]=n2q=2∑n−1E[T(q)]+Θ(n).(7.6)
d. Show that
k=2∑n−1klgk≤21n2lgn−81n2.(7.7)
(Hint: Split the summation into two parts, one for k=2,3,…,⌈n/2⌉−1 and one for k=⌈n/2⌉,…,n−1.)
e. Using the bound from equation (7.7), show that the recurrence in equation (7.6) has the solution E[T(n)]=Θ(nlgn). (Hint: Show, by substitution, that E[T(n)]≤anlgn for sufficiently large n and for some positive constant a.)
a. Since the pivot is selected as a random element in the array, which has size n, the probabilities of any particular element being selected are all equal, and add to one, so, are all n1. As such, E[Xi]=Pr{i smallest is picked}=n1.
b. We can apply linearity of expectation over all of the events Xi. Suppose we have a particular Xi be true, then, we will have one of the sub arrays be length i−1, and the other be n−i, and will of course still need linear time to run the partition procedure. This corresponds exactly to the summand in equation (7.5).
c.
E[q=1∑nXq(T(q−1)+T(n−q)+Θ(n))]=q=1∑nE[Xq(T(q−1)+T(n−q)+Θ(n))]=q=1∑n(T(q−1)+T(n−q)+Θ(n))/n=Θ(n)+n1q=1∑n(T(q−1)+T(n−1))=Θ(n)+n1(q=1∑nT(q−1)+q=1∑nT(n−q))=Θ(n)+n1(q=1∑nT(q−1)+q=1∑nT(q−1))=Θ(n)+n2q=1∑nT(q−1)=Θ(n)+n2q=0∑n−1T(q)=Θ(n)+n2q=2∑n−1T(q).
d. We will prove this inequality in a different way than suggested by the hint. If we let f(k)=klgk treated as a continuous function, then f′(k)=lgk+1. Note now that the summation written out is the left hand approximation of the integral of f(k) from 2 to n with step size 1. By integration by parts, the anti-derivative of klgk is
lg21(2k2lnk−4k2).
So, plugging in the bounds and subtracting, we get 2n2lgn−4ln2n2−1. Since f has a positive derivative over the entire interval that the integral is being evaluated over, the left hand rule provides a underapproximation of the integral, so, we have that
k=2∑n−1klgk≤2n2lgn−4ln2n2−1≤2n2lgn−8n2,
where the last inequality uses the fact that ln2>1/2.
e. Assume by induction that T(q)≤qlg(q)+Θ(n). Combining (7.6) and (7.7), we have
E[T(n)]=n2q=2∑n−1E[T(q)]+Θ(n)≤n2q=2∑n−1(qlgq+Θ(n))+Θ(n)≤n2q=2∑n−1qlgq+n2Θ(n)+Θ(n)≤n2(21n2lgn−81n2)+Θ(n)=nlgn−41n+Θ(n)=nlgn+Θ(n).