7-3 Alternative quicksort analysis

An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to RANDOMIZED-QUICKSORT\text{RANDOMIZED-QUICKSORT}, rather than on the number of comparisons performed.

a. Argue that, given an array of size nn, the probability that any particular element is chosen as the pivot is 1/n1 / n. Use this to define indicator random variables

Xi=I{ith smallest element is chosen as the pivot}.X_i = I\{i\text{th smallest element is chosen as the pivot}\}.

What is E[Xi]\text E[X_i]?

b. Let T(n)T(n) be a random variable denoting the running time of quicksort on an array of size nn. Argue that

E[T(n)]=E[q=1nXq(T(q1)+T(nq)+Θ(n))].(7.5)\text E[T(n)] = \text E\bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n))\bigg]. \tag{7.5}

c. Show that we can rewrite equation (7.5)\text{(7.5)} as

E[T(n)]=2nq=2n1E[T(q)]+Θ(n).(7.6)\text E[T(n)] = \frac{2}{n}\sum_{q = 2}^{n - 1}\text E[T(q)] + \Theta(n). \tag{7.6}

d. Show that

k=2n1klgk12n2lgn18n2.(7.7)\sum_{k = 2}^{n - 1}k\lg k \le \frac{1}{2}n^2\lg n - \frac{1}{8}n^2. \tag{7.7}

(Hint:\textit{Hint:} Split the summation into two parts, one for k=2,3,,n/21k = 2, 3, \ldots, \lceil n / 2 \rceil - 1 and one for k=n/2,,n1k = \lceil n / 2 \rceil, \ldots, n - 1.)

e. Using the bound from equation (7.7)\text{(7.7)}, show that the recurrence in equation (7.6)\text{(7.6)} has the solution E[T(n)]=Θ(nlgn)\text E[T(n)] = \Theta(n\lg n). (Hint:\textit{Hint:} Show, by substitution, that E[T(n)]anlgn\text E[T(n)] \le an\lg n for sufficiently large nn and for some positive constant aa.)

a. Since the pivot is selected as a random element in the array, which has size nn, the probabilities of any particular element being selected are all equal, and add to one, so, are all 1n\frac{1}{n}. As such, E[Xi]=Pr{i smallest is picked}=1n\text E[X_i] = \Pr\{i \text{ smallest is picked}\} = \frac{1}{n}.

b. We can apply linearity of expectation over all of the events XiX_i. Suppose we have a particular XiX_i be true, then, we will have one of the sub arrays be length i1i - 1, and the other be nin - i, and will of course still need linear time to run the partition procedure. This corresponds exactly to the summand in equation (7.5)\text{(7.5)}.

c.

E[q=1nXq(T(q1)+T(nq)+Θ(n))]=q=1nE[Xq(T(q1)+T(nq)+Θ(n))]=q=1n(T(q1)+T(nq)+Θ(n))/n=Θ(n)+1nq=1n(T(q1)+T(n1))=Θ(n)+1n(q=1nT(q1)+q=1nT(nq))=Θ(n)+1n(q=1nT(q1)+q=1nT(q1))=Θ(n)+2nq=1nT(q1)=Θ(n)+2nq=0n1T(q)=Θ(n)+2nq=2n1T(q). \begin{aligned} & \text E\Bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n)) \Bigg] \\ & = \sum_{q = 1}^n \text E[X_q(T(q - 1) + T(n - q) + \Theta(n))] \\ & = \sum_{q = 1}^n(T(q - 1) + T(n - q) + \Theta(n))/n \\ & = \Theta(n) + \frac{1}{n} \sum_{q = 1}^n(T(q - 1)+T(n - 1)) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (n - q) \Big) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (q - 1) \Big) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 1}^n T(q - 1) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 0}^{n - 1} T(q) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 2}^{n - 1} T(q). \end{aligned}

d. We will prove this inequality in a different way than suggested by the hint. If we let f(k)=klgkf(k) = k\lg k treated as a continuous function, then f(k)=lgk+1f'(k) = \lg k + 1. Note now that the summation written out is the left hand approximation of the integral of f(k)f(k) from 22 to nn with step size 11. By integration by parts, the anti-derivative of klgkk\lg k is

1lg2(k22lnkk24).\frac{1}{\lg 2}(\frac{k^2}{2}\ln k-\frac{k^2}{4}).

So, plugging in the bounds and subtracting, we get n2lgn2n24ln21\frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1. Since ff has a positive derivative over the entire interval that the integral is being evaluated over, the left hand rule provides a underapproximation of the integral, so, we have that

k=2n1klgkn2lgn2n24ln21n2lgn2n28, \begin{aligned} \sum_{k = 2}^{n - 1} k\lg k & \le \frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1 \\ & \le \frac{n^2\lg n}{2} - \frac{n^2}{8}, \end{aligned}

where the last inequality uses the fact that ln2>1/2\ln 2 > 1 / 2.

e. Assume by induction that T(q)qlg(q)+Θ(n)T(q) \le q \lg(q) + \Theta(n). Combining (7.6)\text{(7.6)} and (7.7)\text{(7.7)}, we have

E[T(n)]=2nq=2n1E[T(q)]+Θ(n)2nq=2n1(qlgq+Θ(n))+Θ(n)2nq=2n1qlgq+2nΘ(n)+Θ(n)2n(12n2lgn18n2)+Θ(n)=nlgn14n+Θ(n)=nlgn+Θ(n). \begin{aligned} \text E[T(n)] & = \frac{2}{n} \sum_{q = 2}^{n - 1} \text E[T(q)] + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}(q\lg q + \Theta(n)) + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}q\lg q + \frac{2}{n}\Theta(n) + \Theta(n) \\ & \le \frac{2}{n}(\frac{1}{2}n^2\lg n - \frac{1}{8}n^2) + \Theta(n) \\ & = n\lg n -\frac{1}{4}n + \Theta(n) \\ & = n\lg n+\Theta(n). \end{aligned}