7-6 Fuzzy sorting of intervals
Consider the problem in which we do not know the numbers exactly. Instead, for each number, we know an interval on the real line to which it belongs. That is, we are given closed intervals of the form , where . We wish to fuzzy-sort these intervals, i.e., to produce a permutation of the intervals such that for , there exists satisfying .
a. Design a randomized algorithm for fuzzy-sorting intervals. Your algorithm should have the general structure of an algorithm that quicksorts the left endpoints (the values), but it should take advantage of overlapping intervals to improve the running time. (As the intervals overlap more and more, the problem of fuzzy-sorting the intervals becoes progressively easier. Your algorithm should take advantage of such overlapping, to the extend that it exists.)
b. Argue that your algorithm runs in expected time in general, but runs in expected time when all of the intervals overlap (i.e., when there exists a value such that for all ). Your algorithm should not be checking for this case explicitly; rather, its performance should naturally improve as the amount of overlap increases.
a. With randomly selected left endpoint for the pivot, we could trivially perform fuzzy sorting by quicksorting the left endpoints, 's. This would achieve the worst-case expected running time of . We definitely can do better by exploit the characteristic that we don't have to sort overlapping intervals. That is, for two overlapping intervals, and . In such situations, we can always choose (within the intersection of these intervals) such that or .
Since overlapping intervals do not require sorting, we can improve the expected running time by modifying quicksort to identify overlaps:
FIND-INTERSECTION(A, p, r)
rand = RANDOM(p, r)
exchange A[rand] with A[r]
a = A[r].a
b = A[r].b
for i = p to r - 1
if A[i].a ≤ b and A[i].b ≥ a
if A[i].a > a
a = A[i].a
if A[i].b < b
b = A[i].b
return (a, b)
On lines 2 through 3 of , we select a random pivot interval as the initial region of overlap . There are two situations:
- If the intervals are all disjoint, then the estimated region of overlap will be this randomly-selected interval;
- otherwise, on lines 6 through 11, we loop through all intervals in arrays (except the endpoint which is the initial pivot interval). At each iteration, we determine if the current interval overlaps the current estimated region of overlap. If it does, we update the estimated region of overlap as .
has a worst-case running time since we evaluate the intersection from index to of the array.
We can extend the to allow fuzzy sorting using .
First, partition the input array into "left", "middle", and "right" subarrays. The "middle" subarray elements overlap the interval found by . As a result, they can appear in any order in the output.
We recursively call on the "left" and "right" subarrays to produce a fuzzy sorted array in-place. The following pseudocode implements these basic operations. One can run to fuzzy-sort an array.
The first and last elements in a subarray are indexed by and , respectively. The index of the first and last intervals in the "middle" region are indexed by and , respectively.
FUZZY-SORT(A, p, r)
if p < r
(a, b) = FIND-INTERSECTION(A, p, r)
t = PARTITION-RIGHT(A, a, p, r)
q = PARTITION-LEFT(A, b, p, t)
FUZZY-SORT(A, p, q - 1)
FUZZY-SORT(A, t + 1, r)
We need to determine how to partition the input arrays into "left", "middle", and "right" subarrays in-place.
First, we the entire array from to using a pivot value equal to the left endpoint found by , such that .
Then, we the subarray from to using a pivot value equal to the right endpoint found by , such that .
PARTITION-RIGHT(A, a, p, r)
i = p - 1
for j = p to r - 1
if A[j].a ≤ a
i = i + 1
exchange A[i] with A[j]
exchange A[i + 1] with A[r]
return i + 1
PARTITION-LEFT(A, b, p, t)
i = p - 1
for j = p to t - 1
if A[j].b < b
i = i + 1
exchange A[i] with A[j]
exchange A[i + 1] with A[t]
return i + 1
The is similar to the randomized quicksort presented in the textbook. In fact, and are nearly identical to the procedure on page 171. The primary difference is the value of the pivot used to sort the intervals.
b. We expect to have a worst-case running time for a set of input intervals which do not overlap each other. First, notice that lines 2 through 3 of select a random interval as the initial pivot interval. Recall that if the intervals are disjoint, then will simply be this initial interval.
Since for this example there are no overlaps, the "middle" region created by lines 4 and 5 of will only contain the initially-selected interval. In general, line 3 is . Fortunately, since the pivot interval is randomly-selected, the expected sizes of the "left" and "right" subarrays are both . In conclusion, the reccurrence of the running time is
The will always return a non-empty region of overlap containing if the intervals all overlap at . For this situation, every interval will be within the "middle" region since the "left" and "right" subarrays will be empty, lines 6 and 7 of are . As a result, there is no recursion and the running time of is determined by the running time required to find the region of overlap. Therfore, if the input intervals all overlap at a point, then the expected worst-case running time is .