21-1 Off-line minimum

The off-line minimum problem asks us to maintain a dynamic set $T$ of elements from the domain $\{1, 2, \ldots, n\}$ under the operations $\text{INSERT}$ and $\text{EXTRACT-MIN}$. We are given a sequence $S$ of $n$ $\text{INSERT}$ and $m$ $\text{EXTRACT-MIN}$ calls, where each key in $\{1, 2, \ldots, n\}$ is inserted exactly once. We wish to determine which key is returned by each $\text{EXTRACT-MIN}$ call. Specifically, we wish to fill in an array $extracted[1..m]$, where for $i = 1, 2, \ldots, m$, $extracted[i]$ is the key returned by the $i$th $\text{EXTRACT-MIN}$ call. The problem is "off-line" in the sense that we are allowed to process the entire sequence $S$ before determining any of the returned keys.

a. In the following instance of the off-line minimum problem, each operation $\text{INSERT}(i)$ is represented by the value of $i$ and each $\text{EXTRACT-MIN}$ is represented by the letter $\text E$:

$$4, 8, \text E, 3, \text E, 9, 2, 6, \text E, \text E, \text E, 1, 7, \text E, 5.$$

Fill in the correct values in the extracted array.

To develop an algorithm for this problem, we break the sequence $S$ into homogeneous subsequences. That is, we represent $S$ by

$$\text I_1, \text E, \text I_2, \text E, \text I_3, \ldots, \text I_m,\text E, \text I_{m + 1},$$

where each $\text E$ represents a single $\text{EXTRACT-MIN}$ call and each $\text{I}_j$ represents a (possibly empty) sequence of $\text{INSERT}$ calls. For each subsequence $\text{I}_j$ , we initially place the keys inserted by these operations into a set $K_j$, which is empty if $\text{I}_j$ is empty. We then do the following:

cpp OFF-LINE-MINIMUM(m, n) for i = 1 to n determine j such that i ∈ K[j] if j != m + 1 extracted[j] = i let l be the smallest value greater than j for which set K[l] exists K[l] = K[j] ∪ K[l], destroying K[j] return extracted

b. Argue that the array extracted returned by $\text{OFF-LINE-MINIMUM}$ is correct.

c. Describe how to implement $\text{OFF-LINE-MINIMUM}$ efficiently with a disjoint-set data structure. Give a tight bound on the worst-case running time of your implementation.

a.

$$ \begin{array}{|c|c|} \hline index & value \\ \hline 1 & 4 \\ 2 & 3 \\ 3 & 2 \\ 4 & 6 \\ 5 & 8 \\ 6 & 1 \\ \hline \end{array} $$

b. As we run the for loop, we are picking off the smallest of the possible elements to be removed, knowing for sure that it will be removed by the next unused $\text{EXTRACT-MIN}$ operation. Then, since that $\text{EXTRACT-MIN}$ operation is used up, we can pretend that it no longer exists, and combine the set of things that were inserted by that segment with those inserted by the next, since we know that the $\text{EXTRACT-MIN}$ operation that had separated the two is now used up. Since we proceed to figure out what the various extract operations do one at a time, by the time we are done, we have figured them all out.

c. We let each of the sets be represented by a disjoint set structure. To union them (as on line 6) just call $\text{UNION}$. Checking that they exist is just a matter of keeping track of a linked list of which ones exist(needed for line 5), initially containing all of them, but then, when deleting the set on line 6, we delete it from the linked list that we were maintaining. The only other interaction with the sets that we have to worry about is on line 2, which just amounts to a call of $\text{FIND-SET}(j)$. Since line 2 takes amortized time $\alpha(n)$ and we call it exactly $n$ times, then, since the rest of the for loop only takes constant time, the total runtime is $O(n\alpha(n))$.