27-4 Multithreading reductions and prefix computations
A -reduction of an array , where is an associative operator, is the value
The following procedure computes the -reduction of a subarray serially.
cpp REDUCE(x, i, j) y = x[i] for k = i + 1 to j y = y โ x[k] return y
a. Use nested parallelism to implement a multithreaded algorithm , which performs the same function with work and span. Analyze your algorithm.
A related problem is that of computing a -prefix computation, sometimes called a -scan, on an array , where is once again an associative operator. The -scan produces the array given by
that is, all prefixes of the array "summed" using operator. The following serial procedure performs a -prefix computation:
cpp SCAN(x) n = x.length let y[1..n] be a new array y[1] = x[1] for i = 2 to n y[i] = y[i - 1] โ x[i] return y
Unfortunately, multithreading is not straightforward. For example, changing the for loop to a parallel for loop would create races, since each iteration of the loop body depends on the previous iteration. The following procedure performs the -prefix computation in parallel, albeit inefficiently.
cpp P-SCAN-1(x) n = x.length let y[1..n] be a new array P-SCAN-1-AUX(x, y, 1, n) return y
cpp P-SCAN-1-AUX(x, y, i, j) parallel for l = i to j y[l] = P-REDUCE(x, 1, l)
b. Analyze the work, span, and parallelism of .
By using nested parallelism, we can obtain a more efficient -prefix computation:
cpp P-SCAN-2(x) n = x.length let y[1..n] be a new array P-SCAN-2-AUX(x, y, 1, n) return y
cpp P-SCAN-2-AUX(x, y, i, j) if i == j y[i] = x[i] else k = floor((i + j) / 2) spawn P-SCAN-2-AUX(x, y, i, k) P-SCAN-2-AUX(x, y, k + 1, j) sync parallel for l = k + 1 to j y[l] = y[k] โ y[l]
c. Argue that is correct, and analyze its work, span, and parallelism.
We can improve on both and by performing the -prefix computation in two distinct passes over the data. On the first pass, we gather the terms for various contiguous subarrays of into a temporary array , and on the second pass we use the terms in to compute the final result . The following pseudocode implements this strategy, but certain expressions have been omitted:
cpp P-SCAN-3(x) n = x.length let y[1..n] and t[1..n] be new arrays y[1] = x[1] if n > 1 P-SCAN-UP(x, t, 2, n) P-SCAN-DOWN(x[1], x, t, y, 2, n) return y
cpp P-SCAN-UP(x, t, i, j) if i == j return x[i] else k = floor((i + j) / 2) t[k] = spawn P-SCAN-UP(x, t, i, k) right = P-SCAN-UP(x, t, k + 1, j) sync return ____ // fill in the blank
cpp P-SCAN-DOWN(v, x, t, y, i, j) if i == j y[i] = v โ x[i] else k = floor((i + j) / 2) spawn P-SCAN-DOWN(____, x, t, y, i, k) // fill in the blank P-SCAN-DOWN(____, x, t, y, k + 1, j) // fill in the blank sync
d. Fill in the three missing expressions in line 8 of and lines 5 and 6 of . Argue that with expressions you supplied, is correct. ( Prove that the value passed to satisfies .)
e. Analyze the work, span, and parallelism of .
(Removed)