27-2 Saving temporary space in matrix multiplication

The $\text{P-MATRIX-MULTIPLY-RECURSIVE}$ procedure has the disadvantage that it must allocate a temporary matrix $T$ of size $n \times n$ , which can adversely affect the constants hidden by the $\Theta$ -notation. The $\text{P-MATRIX-MULTIPLY-RECURSIVE}$ procedure does have high parallelism, however. For example, ignoring the constants in the $\Theta$ -notation, the parallelism for multiplying $1000 \times 1000$ matrices comes to approximately $1000^3 / 10^2 = 10^7$ , since $\lg 1000 \approx 10$ . Most parallel computers have far fewer than 10 million processors.

a. Describe a recursive multithreaded algorithm that eliminates the need for the temporary matrix $T$ at the cost of increasing the span to $\Theta(n)$ . ( $\textit{Hint:}$ Compute $C = C + AB$ following the general strategy of $\text{P-MATRIX-MULTIPLY-RECURSIVE}$ , but initialize $C$ in parallel and insert a sync in a judiciously chosen location.)

b. Give and solve recurrences for the work and span of your implementation.

c. Analyze the parallelism of your implementation. Ignoring the constants in the $\Theta$ -notation, estimate the parallelism on $1000 \times 1000$ matrices. Compare with the parallelism of $\text{P-MATRIX-MULTIPLY-RECURSIVE}$ .

(Removed)