27-2 Saving temporary space in matrix multiplication
The procedure has the disadvantage that it must allocate a temporary matrix of size , which can adversely affect the constants hidden by the -notation. The procedure does have high parallelism, however. For example, ignoring the constants in the -notation, the parallelism for multiplying matrices comes to approximately , since . Most parallel computers have far fewer than 10 million processors.
a. Describe a recursive multithreaded algorithm that eliminates the need for the temporary matrix at the cost of increasing the span to . ( Compute following the general strategy of , but initialize in parallel and insert a sync in a judiciously chosen location.)
b. Give and solve recurrences for the work and span of your implementation.
c. Analyze the parallelism of your implementation. Ignoring the constants in the -notation, estimate the parallelism on matrices. Compare with the parallelism of .
(Removed)