27-2 Saving temporary space in matrix multiplication

The P-MATRIX-MULTIPLY-RECURSIVE\text{P-MATRIX-MULTIPLY-RECURSIVE} procedure has the disadvantage that it must allocate a temporary matrix TT of size n×nn \times n, which can adversely affect the constants hidden by the Θ\Theta-notation. The P-MATRIX-MULTIPLY-RECURSIVE\text{P-MATRIX-MULTIPLY-RECURSIVE} procedure does have high parallelism, however. For example, ignoring the constants in the Θ\Theta-notation, the parallelism for multiplying 1000×10001000 \times 1000 matrices comes to approximately 10003/102=1071000^3 / 10^2 = 10^7, since lg100010\lg 1000 \approx 10. Most parallel computers have far fewer than 10 million processors.

a. Describe a recursive multithreaded algorithm that eliminates the need for the temporary matrix TT at the cost of increasing the span to Θ(n)\Theta(n). (Hint:\textit{Hint:} Compute C=C+ABC = C + AB following the general strategy of P-MATRIX-MULTIPLY-RECURSIVE\text{P-MATRIX-MULTIPLY-RECURSIVE}, but initialize CC in parallel and insert a sync in a judiciously chosen location.)

b. Give and solve recurrences for the work and span of your implementation.

c. Analyze the parallelism of your implementation. Ignoring the constants in the Θ\Theta-notation, estimate the parallelism on 1000×10001000 \times 1000 matrices. Compare with the parallelism of P-MATRIX-MULTIPLY-RECURSIVE\text{P-MATRIX-MULTIPLY-RECURSIVE}.

(Removed)