23-2 Minimum spanning tree in sparse graphs
For a very sparse connected graph $G = (V, E)$, we can further improve upon the $O(E + V\lg V)$ running time of Prim's algorithm with Fibonacci heaps by preprocessing $G$ to decrease the number of vertices before running Prim's algorithm. In particular, we choose, for each vertex $u$, the minimum-weight edge $(u, v)$ incident on $u$, and we put $(u, v)$ into the minimum spanning tree under construction. We then contract all chosen edges (see Section B.4). Rather than contracting these edges one at a time, we first identify sets of vertices that are united into the same new vertex. Then we create the graph that would have resulted from contracting these edges one at a time, but we do so by "renaming" edges according to the sets into which their endpoints were placed. Several edges from the original graph may be renamed the same as each other. In such a case, only one edge results, and its weight is the minimum of the weights of the corresponding original edges.
Initially, we set the minimum spanning tree $T$ being constructed to be empty, and for each edge $(u, v) \in E$, we initialize the attributes $(u, v).orig = (u, v)$ and $(u, v).c = w(u, v)$. We use the $orig$ attribute to reference the edge from the initial graph that is associated with an edge in the contracted graph. The $c$ attribute holds the weight of an edge, and as edges are contracted, we update it according to the above scheme for choosing edge weights. The procedure $\text{MST-REDUCE}$ takes inputs $G$ and $T$, and it returns a contracted graph $G'$ with updated attributes $orig'$ and $c'$. The procedure also accumulates edges of $G$ into the minimum spanning tree $T$.
cpp MST-REDUCE(G, T) for each v ∈ G.V v.mark = false MAKE-SET(v) for each u ∈ G.V if u.mark == false choose v ∈ G.Adj[u] such that (u, v).c is minimized UNION(u, v) T = T ∪ {(u, v).orig} u.mark = v.mark = true G'.V = {FIND-SET(v): v ∈ G.V} G'.E = Ø for each (x, y) ∈ G.E u = FIND-SET(x) v = FIND-SET(y) if (u, v) ∉ G'.E G'.E = G'.E ∪ {(u, v)} (u, v).orig' = (x, y).orig (u, v).c' = (x, y).c else if (x, y).c < (u, v).c' (u, v).orig' = (x, y).orig (u, v).c' = (x, y).c construct adjacency lists G'.Adj for G' return G' and T
a. Let $T$ be the set of edges returned by $\text{MST-REDUCE}$, and let $A$ be the minimum spanning tree of the graph $G'$ formed by the call $\text{MST-PRIM}(G', c', r)$, where $c'$ is the weight attribute on the edges of $G'.E$ and $r$ is any vertex in $G'.V$. Prove that $T \cup \{(x,y).orig': (x, y) \in A\}$ is a minimum spanning tree of $G$.
b. Argue that $|G'.V| \le |V| / 2$.
c. Show how to implement $\text{MST-REDUCE}$ so that it runs in $O(E)$ time. ($\textit{Hint:}$ Use simple data structures.)
d. Suppose that we run $k$ phases of $\text{MST-REDUCE}$, using the output $G'$ produced by one phase as the input $G$ to the next phase and accumulating edges in $T$. Argue that the overall running time of the $k$ phases is $O(kE)$.
e. Suppose that after running $k$ phases of $\text{MST-REDUCE}$, as in part (d), we run Prim's algorithm by calling $\text{MST-PRIM}(G', c', r)$, where $G'$, with weight attribute $c'$, is returned by the last phase and $r$ is any vertex in $G'.V$. Show how to pick $k$ so that the overall running time is $O(E\lg\lg V)$. Argue that your choice of $k$ minimizes the overall asymptotic running time.
f. For what values of $|E|$ (in terms of $|V|$) does Prim's algorithm with preprocessing asymptotically beat Prim's algorithm without preprocessing?
a. We'll show that the edges added at each step are safe. Consider an unmarked vertex $u$. Set $S = \{u\}$ and let $A$ be the set of edges in the tree so far. Then the cut respects $A$, and the next edge we add is a light edge, so it is safe for $A$. Thus, every edge in $T$ before we run Prim's algorithm is safe for $T$. Any edge that Prim's would normally add at this point would have to connect two of the trees already created, and it would be chosen as minimal. Moreover, we choose exactly one between any two trees. Thus, the fact that we only have the smallest edges available to us is not a problem. The resulting tree must be minimal.
b. We argue by induction on the number of vertices in $G$. We'll assume that $|V| > 1$, since otherwise $\text{MST-REDUCE}$ will encounter an error on line 6 because there is no way to choose $v$. Let $|V| = 2$. Since $G$ is connected, there must be an edge between $u$ and $v$, and it is trivially of minimum weight. They are joined, and $|G'.V| = 1 = |V| / 2$.
Suppose the claim holds for $|V| = n$. Let $G$ be a connected graph on $n + 1$ vertices. Then $G'.V \le n / 2$ prior to the final vertex $v$ being examined in the for-loop of line 4. If $v$ is marked then we're done, and if $v$ isn't marked then we'll connect it to some other vertex, which must be marked since $v$ is the last to be processed.
Either way, $v$ can't contribute an additional vertex to $G'.V$. so
$$|G'.V| \le n / 2 \le (n + 1) / 2.$$
c. Rather than using the disjoint set structures of chapter 21, we can simply use an array to keep track of which component a vertex is in. Let $A$ be an array of length $|V|$ such that $A[u] = v$ if $v = \text{FIND-SET}(u)$. Then $\text{FIND-SET}(u)$ can now be replaced with $A[u]$ and $\text{UNION}(u, v)$ can be replaced by $A[v] = A[u]$. Since these operations run in constant time, the runtime is $O(E)$.
d. The number of edges in the output is monotonically decreasing, so each call is $O(E)$. Thus, $k$ calls take $O(kE)$ time.
e. The runtime of Prim's algorithm is $O(E + V\lg V)$. Each time we run $\text{MST-REDUCE}$, we cut the number of vertices at least in half. Thus, after $k$ calls, the number of vertices is at most $|V| / 2^k$. We need to minimize
$$E + V / 2^k\lg(V / 2^k) + kE = E + \frac{V\lg V}{2^k} - \frac{Vk}{2^k} + kE$$
with respect to $k$. If we choose $k = \lg\lg V$ then we achieve the overall running time of $O(E\lg\lg V)$ as desired.
To see that this value of $k$ minimizes, note that the $\frac{Vk}{2^k}$ term is always less than the $kE$ term since $E \ge V$. As $k$ decreases, the contribution of $kE$ decreases, and the contribution of $\frac{V\lg V}{2^k}$ increases. Thus, we need to find the value of $k$ which makes them approximately equal in the worst case, when $E = V$. To do this, we set $\frac{\lg V}{2^k} = k$. Solving this exactly would involve the Lambert W function, but the nicest elementary function which gets close is $k = \lg\lg V$.
f. We simply set up the inequality
$$E\lg\lg V < E + V\lg V$$
to find that we need
$$E < \frac{V\lg V}{\lg\lg V-1} = O(\frac{V\lg V}{\lg\lg V}).$$