8-1 Probabilistic lower bounds on comparison sorting
In this problem, we prove a probabilistic lower bound on the running time of any deterministic or randomized comparison sort on distinct input elements. We begin by examining a deterministic comparison sort with decision tree . We assume that every permutation of 's inputs is equally likely.
a. Suppose that each leaf of is labeled with the probability that it is reached given a random input. Prove that exactly leaves are labeled and that the rest are labeled .
b. Let denote the external path length of a decision tree ; that is, is the sum of the depths of all the leaves of . Let be a decision tree with leaves, and let and be the left and right subtrees of . Show that .
c. Let be the minimum value of over all decision trees with leaves. Show that . ( Consider a decision tree with leaves that achieves the minimum. Let be the number of leaves in and the number of leaves in .)
d. Prove that for a given value of and in the range , the function is minimized at . Conclude that .
e. Prove that , and conclude that the average-case time to sort elements is .
Now, consider a randomized comparison sort . We can extend the decision-tree model to handle randomization by incorporating two kinds of nodes: ordinary comparison nodes and "randomization" nodes. A randomization node models a random choice of the form made by algorithm ; the node has children, each of which is equally likely to be chosen during an execution of the algorithm.
f. Show that for any randomized comparison sort , there exists a deterministic comparison sort whose expected number of comparisons is no more than those made by .
a. There are possible permutations of the input array because the input elements are all distinct. Since each is equally likely, the distribution is uniformly supported on this set. So, each occurs with probability and corresponds to a different leaf because the program needs to be able to distinguish between them.
b. The depths of particular elements of or are all one less than their depths when considered elements of . In particular, this is true for the leaves of the two subtrees. Also, form a partition of all the leaves of . Therefore, if we let denote the leaves of ,
c. Suppose we have a with leaves so that . Let be the number of leaves in . Then, . However, we can pick and to minimize the external path length.
d. We treat as a continuous variable, and take a derivative to find critical points. The given expression has the following as a derivative with respect to
which is when we have . Therefore, , .
Since we are picking the two subtrees to be roughly equal size, the total depth will be order , with each level contributing , so the total external path length is at least .
e. Since before we that a tree with leaves needs to have external length , and that a sorting tree needs at least trees, a sorting tree must have external tree length at least . Since the average case run time is the depth of a leaf weighted by the probability of that leaf being the one that occurs, we have that the run time is at least .
f. Since the expected runtime is the average over all possible results from the random bits, if every possible fixing of the randomness resulted in a higher runtime, the average would have to be higher as well.