7-3 Alternative quicksort analysis

An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to $\text{RANDOMIZED-QUICKSORT}$ , rather than on the number of comparisons performed.

a. Argue that, given an array of size $n$ , the probability that any particular element is chosen as the pivot is $1 / n$ . Use this to define indicator random variables

$X_i = I\{i\text{th smallest element is chosen as the pivot}\}.$

What is $\text E[X_i]$ ?

b. Let $T(n)$ be a random variable denoting the running time of quicksort on an array of size $n$ . Argue that

$\text E[T(n)] = \text E\bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n))\bigg]. \tag{7.5}$

c. Show that we can rewrite equation $\text{(7.5)}$ as

$\text E[T(n)] = \frac{2}{n}\sum_{q = 2}^{n - 1}\text E[T(q)] + \Theta(n). \tag{7.6}$

d. Show that

$\sum_{k = 2}^{n - 1}k\lg k \le \frac{1}{2}n^2\lg n - \frac{1}{8}n^2. \tag{7.7}$

( $\textit{Hint:}$ Split the summation into two parts, one for $k = 2, 3, \ldots, \lceil n / 2 \rceil - 1$ and one for $k = \lceil n / 2 \rceil, \ldots, n - 1$ .)

e. Using the bound from equation $\text{(7.7)}$ , show that the recurrence in equation $\text{(7.6)}$ has the solution $\text E[T(n)] = \Theta(n\lg n)$ . ( $\textit{Hint:}$ Show, by substitution, that $\text E[T(n)] \le an\lg n$ for sufficiently large $n$ and for some positive constant $a$ .)

a. Since the pivot is selected as a random element in the array, which has size $n$ , the probabilities of any particular element being selected are all equal, and add to one, so, are all $\frac{1}{n}$ . As such, $\text E[X_i] = \Pr\{i \text{ smallest is picked}\} = \frac{1}{n}$ .

b. We can apply linearity of expectation over all of the events $X_i$ . Suppose we have a particular $X_i$ be true, then, we will have one of the sub arrays be length $i - 1$ , and the other be $n - i$ , and will of course still need linear time to run the partition procedure. This corresponds exactly to the summand in equation $\text{(7.5)}$ .

c.

$\begin{aligned} & \text E\Bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n)) \Bigg] \\ & = \sum_{q = 1}^n \text E[X_q(T(q - 1) + T(n - q) + \Theta(n))] \\ & = \sum_{q = 1}^n(T(q - 1) + T(n - q) + \Theta(n))/n \\ & = \Theta(n) + \frac{1}{n} \sum_{q = 1}^n(T(q - 1)+T(n - 1)) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (n - q) \Big) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (q - 1) \Big) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 1}^n T(q - 1) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 0}^{n - 1} T(q) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 2}^{n - 1} T(q). \end{aligned}$

d. We will prove this inequality in a different way than suggested by the hint. If we let $f(k) = k\lg k$ treated as a continuous function, then $f'(k) = \lg k + 1$ . Note now that the summation written out is the left hand approximation of the integral of $f(k)$ from $2$ to $n$ with step size $1$ . By integration by parts, the anti-derivative of $k\lg k$ is

$\frac{1}{\lg 2}(\frac{k^2}{2}\ln k-\frac{k^2}{4}).$

So, plugging in the bounds and subtracting, we get $\frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1$ . Since $f$ has a positive derivative over the entire interval that the integral is being evaluated over, the left hand rule provides a underapproximation of the integral, so, we have that

$\begin{aligned} \sum_{k = 2}^{n - 1} k\lg k & \le \frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1 \\ & \le \frac{n^2\lg n}{2} - \frac{n^2}{8}, \end{aligned}$

where the last inequality uses the fact that $\ln 2 > 1 / 2$ .

e. Assume by induction that $T(q) \le q \lg(q) + \Theta(n)$ . Combining $\text{(7.6)}$ and $\text{(7.7)}$ , we have

$\begin{aligned} \text E[T(n)] & = \frac{2}{n} \sum_{q = 2}^{n - 1} \text E[T(q)] + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}(q\lg q + \Theta(n)) + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}q\lg q + \frac{2}{n}\Theta(n) + \Theta(n) \\ & \le \frac{2}{n}(\frac{1}{2}n^2\lg n - \frac{1}{8}n^2) + \Theta(n) \\ & = n\lg n -\frac{1}{4}n + \Theta(n) \\ & = n\lg n+\Theta(n). \end{aligned}$