7-6 Fuzzy sorting of intervals

Consider the problem in which we do not know the numbers exactly. Instead, for each number, we know an interval on the real line to which it belongs. That is, we are given $n$ closed intervals of the form $[a_i, b_i]$ , where $a_i \le b_i$ . We wish to fuzzy-sort these intervals, i.e., to produce a permutation $\langle i_1, i_2, \ldots, i_n \rangle$ of the intervals such that for $j = 1, 2, \ldots, n$ , there exists $c_j \in [a_{i_j}, b_{i_j}]$ satisfying $c_1 \le c_2 \le \cdots \le c_n$ .

a. Design a randomized algorithm for fuzzy-sorting $n$ intervals. Your algorithm should have the general structure of an algorithm that quicksorts the left endpoints (the $a_i$ values), but it should take advantage of overlapping intervals to improve the running time. (As the intervals overlap more and more, the problem of fuzzy-sorting the intervals becoes progressively easier. Your algorithm should take advantage of such overlapping, to the extend that it exists.)

b. Argue that your algorithm runs in expected time $\Theta(n\lg n)$ in general, but runs in expected time $\Theta(n)$ when all of the intervals overlap (i.e., when there exists a value $x$ such that $x \in [a_i, b_i]$ for all $i$ ). Your algorithm should not be checking for this case explicitly; rather, its performance should naturally improve as the amount of overlap increases.

a. With randomly selected left endpoint for the pivot, we could trivially perform fuzzy sorting by quicksorting the left endpoints, $a_i$ 's. This would achieve the worst-case expected running time of $\Theta(n\lg n)$ . We definitely can do better by exploit the characteristic that we don't have to sort overlapping intervals. That is, for two overlapping intervals, $[a_i, b_i]$ and $[a_j, b_j]$ . In such situations, we can always choose $\{c_i, c_j\}$ (within the intersection of these intervals) such that $c_i \le c_j$ or $c_j \le c_i$ .

Since overlapping intervals do not require sorting, we can improve the expected running time by modifying quicksort to identify overlaps:

FIND-INTERSECTION(A, p, r)
    rand = RANDOM(p, r)
    exchange A[rand] with A[r]
    a = A[r].a
    b = A[r].b
    for i = p to r - 1
        if A[i].a ≤ b and A[i].b ≥ a
            if A[i].a > a
                a = A[i].a
            if A[i].b < b
                b = A[i].b
    return (a, b)

On lines 2 through 3 of $\text{FIND-INTERSECTION}$ , we select a random pivot interval as the initial region of overlap $[a ,b]$ . There are two situations:

If the intervals are all disjoint, then the estimated region of overlap will be this randomly-selected interval;
otherwise, on lines 6 through 11, we loop through all intervals in arrays $A$ (except the endpoint which is the initial pivot interval). At each iteration, we determine if the current interval overlaps the current estimated region of overlap. If it does, we update the estimated region of overlap as $[a, b] = [a_i, b_i] \cap [a, b]$ .

$\text{FIND-INTERSECTION}$ has a worst-case running time $\Theta(n)$ since we evaluate the intersection from index $1$ to $A.length$ of the array.

We can extend the $\text{QUICKSORT}$ to allow fuzzy sorting using $\text{FIND-INTERSECTION}$ .

First, partition the input array into "left", "middle", and "right" subarrays. The "middle" subarray elements overlap the interval $[a, b]$ found by $\text{FIND-INTERSECTION}$ . As a result, they can appear in any order in the output.

We recursively call $\text{FUZZY-SORT}$ on the "left" and "right" subarrays to produce a fuzzy sorted array in-place. The following pseudocode implements these basic operations. One can run $\text{FUZZY-SORT}(A, 1, A.length)$ to fuzzy-sort an array.

The first and last elements in a subarray are indexed by $p$ and $r$ , respectively. The index of the first and last intervals in the "middle" region are indexed by $q$ and $t$ , respectively.

FUZZY-SORT(A, p, r)
    if p < r
        (a, b) = FIND-INTERSECTION(A, p, r)
        t = PARTITION-RIGHT(A, a, p, r)
        q = PARTITION-LEFT(A, b, p, t)
        FUZZY-SORT(A, p, q - 1)
        FUZZY-SORT(A, t + 1, r)

We need to determine how to partition the input arrays into "left", "middle", and "right" subarrays in-place.

First, we $\text{PARTITION-RIGHT}$ the entire array from $p$ to $r$ using a pivot value equal to the left endpoint $a$ found by $\text{FIND-INTERSECTION}$ , such that $a_i \le a$ .

Then, we $\text{PARTITION-LEFT}$ the subarray from $p$ to $t$ using a pivot value equal to the right endpoint $b$ found by $\text{FIND-INTERSECTION}$ , such that $b_i < b$ .

PARTITION-RIGHT(A, a, p, r)
    i = p - 1
    for j = p to r - 1
        if A[j].a ≤ a
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i + 1] with A[r]
    return i + 1

PARTITION-LEFT(A, b, p, t)
    i = p - 1
    for j = p to t - 1
        if A[j].b < b
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i + 1] with A[t]
    return i + 1

The $\text{FUZZY-SORT}$ is similar to the randomized quicksort presented in the textbook. In fact, $\text{PARTITION-RIGHT}$ and $\text{PARTITION-LEFT}$ are nearly identical to the $\text{PARTITION}$ procedure on page 171. The primary difference is the value of the pivot used to sort the intervals.

b. We expect $\text{FUZZY-SORT}$ to have a worst-case running time $\Theta(n\lg n)$ for a set of input intervals which do not overlap each other. First, notice that lines 2 through 3 of $\text{FIND-INTERSECTION}$ select a random interval as the initial pivot interval. Recall that if the intervals are disjoint, then $[a, b]$ will simply be this initial interval.

Since for this example there are no overlaps, the "middle" region created by lines 4 and 5 of $\text{FUZZY-SORT}$ will only contain the initially-selected interval. In general, line 3 is $\Theta(n)$ . Fortunately, since the pivot interval $[a, b]$ is randomly-selected, the expected sizes of the "left" and "right" subarrays are both $\left\lfloor \frac{n}{2} \right\rfloor$ . In conclusion, the reccurrence of the running time is

$\begin{aligned} T(n) & \le 2T(n / 2) + \Theta(n) \\ & = \Theta(n\lg n). \end{aligned}$

The $\text{FIND-INTERSECTION}$ will always return a non-empty region of overlap $[a, b]$ containing $x$ if the intervals all overlap at $x$ . For this situation, every interval will be within the "middle" region since the "left" and "right" subarrays will be empty, lines 6 and 7 of $\text{FUZZY-SORT}$ are $\Theta(1)$ . As a result, there is no recursion and the running time of $\text{FUZZY-SORT}$ is determined by the $\Theta(n)$ running time required to find the region of overlap. Therfore, if the input intervals all overlap at a point, then the expected worst-case running time is $\Theta(n)$ .