7-6 Fuzzy sorting of intervals

Consider the problem in which we do not know the numbers exactly. Instead, for each number, we know an interval on the real line to which it belongs. That is, we are given nn closed intervals of the form [ai,bi][a_i, b_i], where aibia_i \le b_i. We wish to fuzzy-sort these intervals, i.e., to produce a permutation i1,i2,,in\langle i_1, i_2, \ldots, i_n \rangle of the intervals such that for j=1,2,,nj = 1, 2, \ldots, n, there exists cj[aij,bij]c_j \in [a_{i_j}, b_{i_j}] satisfying c1c2cnc_1 \le c_2 \le \cdots \le c_n.

a. Design a randomized algorithm for fuzzy-sorting nn intervals. Your algorithm should have the general structure of an algorithm that quicksorts the left endpoints (the aia_i values), but it should take advantage of overlapping intervals to improve the running time. (As the intervals overlap more and more, the problem of fuzzy-sorting the intervals becoes progressively easier. Your algorithm should take advantage of such overlapping, to the extend that it exists.)

b. Argue that your algorithm runs in expected time Θ(nlgn)\Theta(n\lg n) in general, but runs in expected time Θ(n)\Theta(n) when all of the intervals overlap (i.e., when there exists a value xx such that x[ai,bi]x \in [a_i, b_i] for all ii). Your algorithm should not be checking for this case explicitly; rather, its performance should naturally improve as the amount of overlap increases.

a. With randomly selected left endpoint for the pivot, we could trivially perform fuzzy sorting by quicksorting the left endpoints, aia_i's. This would achieve the worst-case expected running time of Θ(nlgn)\Theta(n\lg n). We definitely can do better by exploit the characteristic that we don't have to sort overlapping intervals. That is, for two overlapping intervals, [ai,bi][a_i, b_i] and [aj,bj][a_j, b_j]. In such situations, we can always choose {ci,cj}\{c_i, c_j\} (within the intersection of these intervals) such that cicjc_i \le c_j or cjcic_j \le c_i.

Since overlapping intervals do not require sorting, we can improve the expected running time by modifying quicksort to identify overlaps:

FIND-INTERSECTION(A, p, r)
    rand = RANDOM(p, r)
    exchange A[rand] with A[r]
    a = A[r].a
    b = A[r].b
    for i = p to r - 1
        if A[i].a ≤ b and A[i].b ≥ a
            if A[i].a > a
                a = A[i].a
            if A[i].b < b
                b = A[i].b
    return (a, b)

On lines 2 through 3 of FIND-INTERSECTION\text{FIND-INTERSECTION}, we select a random pivot interval as the initial region of overlap [a,b][a ,b]. There are two situations:

  • If the intervals are all disjoint, then the estimated region of overlap will be this randomly-selected interval;
  • otherwise, on lines 6 through 11, we loop through all intervals in arrays AA (except the endpoint which is the initial pivot interval). At each iteration, we determine if the current interval overlaps the current estimated region of overlap. If it does, we update the estimated region of overlap as [a,b]=[ai,bi][a,b][a, b] = [a_i, b_i] \cap [a, b].

FIND-INTERSECTION\text{FIND-INTERSECTION} has a worst-case running time Θ(n)\Theta(n) since we evaluate the intersection from index 11 to A.lengthA.length of the array.

We can extend the QUICKSORT\text{QUICKSORT} to allow fuzzy sorting using FIND-INTERSECTION\text{FIND-INTERSECTION}.

First, partition the input array into "left", "middle", and "right" subarrays. The "middle" subarray elements overlap the interval [a,b][a, b] found by FIND-INTERSECTION\text{FIND-INTERSECTION}. As a result, they can appear in any order in the output.

We recursively call FUZZY-SORT\text{FUZZY-SORT} on the "left" and "right" subarrays to produce a fuzzy sorted array in-place. The following pseudocode implements these basic operations. One can run FUZZY-SORT(A,1,A.length)\text{FUZZY-SORT}(A, 1, A.length) to fuzzy-sort an array.

The first and last elements in a subarray are indexed by pp and rr, respectively. The index of the first and last intervals in the "middle" region are indexed by qq and tt, respectively.

FUZZY-SORT(A, p, r)
    if p < r
        (a, b) = FIND-INTERSECTION(A, p, r)
        t = PARTITION-RIGHT(A, a, p, r)
        q = PARTITION-LEFT(A, b, p, t)
        FUZZY-SORT(A, p, q - 1)
        FUZZY-SORT(A, t + 1, r)

We need to determine how to partition the input arrays into "left", "middle", and "right" subarrays in-place.

First, we PARTITION-RIGHT\text{PARTITION-RIGHT} the entire array from pp to rr using a pivot value equal to the left endpoint aa found by FIND-INTERSECTION\text{FIND-INTERSECTION}, such that aiaa_i \le a.

Then, we PARTITION-LEFT\text{PARTITION-LEFT} the subarray from pp to tt using a pivot value equal to the right endpoint bb found by FIND-INTERSECTION\text{FIND-INTERSECTION}, such that bi<bb_i < b.

PARTITION-RIGHT(A, a, p, r)
    i = p - 1
    for j = p to r - 1
        if A[j].a ≤ a
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i + 1] with A[r]
    return i + 1
PARTITION-LEFT(A, b, p, t)
    i = p - 1
    for j = p to t - 1
        if A[j].b < b
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i + 1] with A[t]
    return i + 1

The FUZZY-SORT\text{FUZZY-SORT} is similar to the randomized quicksort presented in the textbook. In fact, PARTITION-RIGHT\text{PARTITION-RIGHT} and PARTITION-LEFT\text{PARTITION-LEFT} are nearly identical to the PARTITION\text{PARTITION} procedure on page 171. The primary difference is the value of the pivot used to sort the intervals.

b. We expect FUZZY-SORT\text{FUZZY-SORT} to have a worst-case running time Θ(nlgn)\Theta(n\lg n) for a set of input intervals which do not overlap each other. First, notice that lines 2 through 3 of FIND-INTERSECTION\text{FIND-INTERSECTION} select a random interval as the initial pivot interval. Recall that if the intervals are disjoint, then [a,b][a, b] will simply be this initial interval.

Since for this example there are no overlaps, the "middle" region created by lines 4 and 5 of FUZZY-SORT\text{FUZZY-SORT} will only contain the initially-selected interval. In general, line 3 is Θ(n)\Theta(n). Fortunately, since the pivot interval [a,b][a, b] is randomly-selected, the expected sizes of the "left" and "right" subarrays are both n2\left\lfloor \frac{n}{2} \right\rfloor. In conclusion, the reccurrence of the running time is

T(n)2T(n/2)+Θ(n)=Θ(nlgn). \begin{aligned} T(n) & \le 2T(n / 2) + \Theta(n) \\ & = \Theta(n\lg n). \end{aligned}

The FIND-INTERSECTION\text{FIND-INTERSECTION} will always return a non-empty region of overlap [a,b][a, b] containing xx if the intervals all overlap at xx. For this situation, every interval will be within the "middle" region since the "left" and "right" subarrays will be empty, lines 6 and 7 of FUZZY-SORT\text{FUZZY-SORT} are Θ(1)\Theta(1). As a result, there is no recursion and the running time of FUZZY-SORT\text{FUZZY-SORT} is determined by the Θ(n)\Theta(n) running time required to find the region of overlap. Therfore, if the input intervals all overlap at a point, then the expected worst-case running time is Θ(n)\Theta(n).