10-3 Searching a sorted compact list
Exercise 10.3-4 asked how we might maintain an $n$-element list compactly in the first $n$ positions of an array. We shall assume that all keys are distinct and that the compact list is also sorted, that is, $key[i] < key[next[i]]$ for all $i = 1, 2, \ldots, n$ such that $next[i] \ne \text{NIL}$. We will also assume that we have a variable $L$ that contains the index of the first element on the list. Under these assumptions, you will show that we can use the following randomized algorithm to search the list in $O(\sqrt n)$ expected time.
cpp COMPACT-LIST-SEARCH(L, n, k) i = L while i != NIL and key[i] < k j = RANDOM(1, n) if key[i] < key[j] and key[j] ≤ k i = j if key[i] == k return i i = next[i] if i == NIL or key[i] > k return NIL else return i
If we ignore lines 3–7 of the procedure, we have an ordinary algorithm for searching a sorted linked list, in which index $i$ points to each position of the list in turn. The search terminates once the index $i$ "falls off" the end of the list or once $key[i] \ge k$. In the latter case, if $key[i] = k$, clearly we have found a key with the value $k$. If, however, $key[i] > k$, then we will never find a key with the value $k$, and so terminating the search was the right thing to do.
Lines 3–7 attempt to skip ahead to a randomly chosen position $j$. Such a skip benefits us if $key[j]$ is larger than $key[i]$ and no larger than $k$; in such a case, $j$ marks a position in the list that $i$ would have to reach during an ordinary list search. Because the list is compact, we know that any choice of $j$ between $1$ and $n$ indexes some object in the list rather than a slot on the free list.
Instead of analyzing the performance of $\text{COMPACT-LIST-SEARCH}$ directly, we shall analyze a related algorithm, $\text{COMPACT-LIST-SEARCH}'$, which executes two separate loops. This algorithm takes an additional parameter $t$ which determines an upper bound on the number of iterations of the first loop.
cpp COMPACT-LIST-SEARCH'(L, n, k, t) i = L for q = 1 to t j = RANDOM(1, n) if key[i] < key[j] and key[j] ≤ k i = j if key[i] == k return i while i != NIL and key[i] < k i = next[i] if i == NIL or key[i] > k return NIL else return i
To compare the execution of the algorithms $\text{COMPACT-LIST-SEARCH}(L, n, k)$ and $\text{COMPACT-LIST-SEARCH}'(L, n, k, t)$, assume that the sequence of integers returned by the calls of $\text{RANDOM}(1, n)$ is the same for both algorithms.
a. Suppose that $\text{COMPACT-LIST-SEARCH}(L, n, k)$ takes $t$ iterations of the while loop of lines 2–8. Argue that $\text{COMPACT-LIST-SEARCH}'(L, n, k, t)$ returns the same answer and that the total number of iterations of both the for and while loops within $\text{COMPACT-LIST-SEARCH}'$ is at least $t$.
In the call $\text{COMPACT-LIST-SEARCH}'(L, n, k, t)$, let $X_t$ be the random variable that describes the distance in the linked list (that is, through the chain of $next$ pointers) from position $i$ to the desired key $k$ after $t$ iterations of the for loop of lines 2–7 have occurred.
b. Argue that the expected running time of $\text{COMPACT-LIST-SEARCH}'(L, n, k, t)$ is $O(t + \text E[X_t])$.
c. Show that $\text E[X_t] \le \sum_{r = 1}^n (1 - r / n)^t$. ($\textit{Hint:}$ Use equation $\text{(C.25)}$.)
d. Show that $\sum_{r = 0}^{n - 1} r^t \le n^{t + 1} / (t + 1)$.
e. Prove that $\text E[X_t] \le n / (t + 1)$.
f. Show that $\text{COMPACT-LIST-SEARCH}'(L, n, k, t)$ runs in $O(t + n / t)$ expected time.
g. Conclude that $\text{COMPACT-LIST-SEARCH}$ runs in $O(\sqrt n)$ expected time.
h. Why do we assume that all keys are distinct in $\text{COMPACT-LIST-SEARCH}$? Argue that random skips do not necessarily help asymptotically when the list contains repeated key values.
a. If the original version of the algorithm takes only $t$ iterations, then, we have that it was only at most t random skips though the list to get to the desired value, since each iteration of the original while loop is a possible random jump followed by a normal step through the linked list.
b. The for loop on lines 2–7 will get run exactly $t$ times, each of which is constant runtime. After that, the while loop on lines 8–9 will be run exactly $X_t$ times. So, the total runtime is $O(t + \text E[X_t])$.
c. Using equation $\text{C.25}$, we have that $\text E[X_t] = \sum_{i = 1}^\infty \Pr\{X_t \ge i\}$. So, we need to show that $\Pr\{X_t \ge i\} \le (1 - i / n)^t$. This can be seen because having $X_t$ being greater than $i$ means that each random choice will result in an element that is either at least $i$ steps before the desired element, or is after the desired element. There are $n - i$ such elements, out of the total $n$ elements that we were pricking from. So, for a single one of the choices to be from such a range, we have a probability of $(n - i) / n = (1 - i / n)$. Since each of the selections was independent, the total probability that all of them were is $(1 - i / n)^t$, as desired. Lastly, we can note that since the linked list has length $n$, the probability that $X_t$ is greater than $n$ is equal to zero.
d. Since we have that $t > 0$, we know that the function $f(x) = x^t$ is increasing, so, that means that $\lfloor x \rfloor^t \le f(x)$. So,
$$\sum_{r = 0}^{n - 1} r^t = \int_0^n \lfloor r \rfloor^t dr \le \int_0^n f(r)dr = \frac{n^{t + 1}}{t + 1}.$$
e.
$$ \begin{aligned} \text E[X_t] & \le \sum_{r = 1}^n (1 - r / n)^t & \text{from part (c)} \\ & = \sum_{r = 1}^n \frac{(n - r)^t}{n^t} \\ & = \frac{1}{n^t} \sum_{r = 1}^n (n - r)^t, \end{aligned} $$
and
$$ \begin{aligned} \sum_{r = 1}^n (n - r)^t & = (n - 1)^t + (n - 2)^t + \cdots + 1^t + 0^t \\ & = \sum_{r = 0}^{n - 1} r^t. \end{aligned} $$
So,
$$ \begin{aligned} \text E[X_t] & = \frac{1}{n^t} \sum_{r = 0}^{n - 1} r^t \\ & \le \frac{1}{n^t} \cdot \frac{n^{t + 1}}{t + 1} & \text{from part (d)} \\ & = \frac{n}{t + 1}. \end{aligned} $$
f. We just put together parts (b) and (e) to get that it runs in time $O(t + n / (t + 1))$. But, this is the same as $O(t + n / t)$.
g. Since we have that for any number of iterations $t$ that the first algorithm takes to find its answer, the second algorithm will return it in time $O(t + n / t)$. In particular, if we just have that $t = \sqrt n$. The second algorithm takes time only $O(\sqrt n)$. This means that tihe first list search algorithm is $O(\sqrt n)$ as well.
h. If we don't have distinct key values, then, we may randomly select an element that is further along than we had been before, but not jump to it because it has the same key as what we were currently at. The analysis will break when we try to bound the probability that $X_t \ge i$.