For loops can be analyzed to determine their data dependencies and apply transformations to assist in their parallelization. The commands support array indices that are polynomials with rational coefficients. Here is an example of how a nested for loop can be transformed so that the inner loop can be parallelized:
The following procedure has dependencies across both the and the loop indices and cannot be easily parallelized in this form:
The dependencies in this loop cross both the and dimension, as can be seen in the equation of its dependence cone:
| (1.1) |
Applying a transformation to the loop will change the relationship between the data dependencies:
| (1.2) |
This transformed loop no longer has a relationship between the dimensions its distance vectors:
| (1.3) |
The data dependencies in the loop body only depend on previously computed elements. The pattern for updating the array in the original and transformed loops are shown below. The inner loop of the transformed program corresponds to the anti-diagonal lines of the array updates. These updated operations can be performed simultaneously since they only refer to previously updated array entries (those entries that point toward a particular array entry).
|
|
Original loop's array updates
|
Transformed loop's array updates
|
Legend
Red dots: Data points in array
Blue arrows: Read/write dependency between array entries
Green dotted lines: Order in which elements of the array are updated
|
|
|
The set of array indices for all values of the loop's index variables can also be computed:
| (1.4) |
| (1.5) |