27-1 Implementing parallel loops using nested parallelism
Consider the following multithreaded algorithm for performing pairwise addition on -element arrays and , storing the sums in :
cpp SUM-ARRAYS(A, B, C) parallel for i = 1 to A.length C[i] = A[i] + B[i]
a. Rewrite the parallel loop in using nested parallelism (spawn and sync) in the manner of . Analyze the parallelism of your implementation.
Consider the following alternative implementation of the parallel loop, which contains a value to be specified:
cpp SUM-ARRAYS'(A, B, C) n = A.length grain-size = ? // to be determined r = ceil(n / grain-size) for k = 0 to r - 1 spawn ADD-SUBARRAY(A, B, C, k * grain-size + 1, min((k + 1) * grain-size, n)) sync
cpp ADD-SUBARRAY(A, B, C, i, j) for k = i to j C[k] = A[k] + B[k]
b. Suppose that we set . What is the parallelism of this implementation?
c. Give a formula for the span of in terms of and . Derive the best value for grain-size to maximize parallelism.
a. See the algorithm . The parallelism is since it's work is and the span is .
b. If grainsize is , this means that each call of just sums a single pair of numbers. This means that since the for loop on line 4 will run times, both the span and work will be . So, the parallelism is just .
SUM-ARRAYS(A, B, C)
n = floor(A.length / 2)
if n == 0
C[1] = A[1] + B[1]
else
spawn SUM-ARRAYS(A[1..n], B[1..n], C[1..n])
SUM-ARRAYS(A[n + 1..A.length], B[n + 1..A..length], C[n + 1..A.length])
c. Let be the grainsize. The runtime of the function that spawns all the other functions is . The runtime of any particular spawned task is . So, we want to minimize
To do this we pull out our freshman calculus hat and take a derivative, we have
To solve this, we set . This minimizes the quantity and makes the span . Resulting in a parallelism of .