27-1 Implementing parallel loops using nested parallelism

Consider the following multithreaded algorithm for performing pairwise addition on nn-element arrays A[1..n]A[1..n] and B[1..n]B[1..n], storing the sums in C[1..n]C[1..n]:

cpp SUM-ARRAYS(A, B, C) parallel for i = 1 to A.length C[i] = A[i] + B[i]

a. Rewrite the parallel loop in SUM-ARRAYS\text{SUM-ARRAYS} using nested parallelism (spawn and sync) in the manner of MAT-VEC-MAIN-LOOP\text{MAT-VEC-MAIN-LOOP}. Analyze the parallelism of your implementation.

Consider the following alternative implementation of the parallel loop, which contains a value grain-sizegrain\text-size to be specified:

cpp SUM-ARRAYS'(A, B, C) n = A.length grain-size = ? // to be determined r = ceil(n / grain-size) for k = 0 to r - 1 spawn ADD-SUBARRAY(A, B, C, k * grain-size + 1, min((k + 1) * grain-size, n)) sync

cpp ADD-SUBARRAY(A, B, C, i, j) for k = i to j C[k] = A[k] + B[k]

b. Suppose that we set grain-size=1grain\text -size = 1. What is the parallelism of this implementation?

c. Give a formula for the span of SUM-ARRAYS\text{SUM-ARRAYS}' in terms of nn and grain-sizegrain\text-size. Derive the best value for grain-size to maximize parallelism.

a. See the algorithm SUM-ARRAYS(A,B,C)\text{SUM-ARRAYS}(A, B, C). The parallelism is O(n)O(n) since it's work is nlgnn\lg n and the span is lgn\lg n.

b. If grainsize is 11, this means that each call of ADD-SUBARRAY\text{ADD-SUBARRAY} just sums a single pair of numbers. This means that since the for loop on line 4 will run nn times, both the span and work will be O(n)O(n). So, the parallelism is just O(1)O(1).

SUM-ARRAYS(A, B, C)
    n = floor(A.length / 2)
    if n == 0
        C[1] = A[1] + B[1]
    else
        spawn SUM-ARRAYS(A[1..n], B[1..n], C[1..n])
        SUM-ARRAYS(A[n + 1..A.length], B[n + 1..A..length], C[n + 1..A.length])

c. Let gg be the grainsize. The runtime of the function that spawns all the other functions is ng\left\lceil \frac{n}{g} \right\rceil. The runtime of any particular spawned task is gg. So, we want to minimize

ng+g.\frac{n}{g} + g.

To do this we pull out our freshman calculus hat and take a derivative, we have

0=1ng2.0 = 1 − \frac{n}{g^2}.

To solve this, we set g=ng = \sqrt n. This minimizes the quantity and makes the span O(n/g+g)=O(n)O(n / g + g) = O(\sqrt n). Resulting in a parallelism of O(n)O(\sqrt n).