Basic Parallelization Facilities

COINS provides the following facilities for parallelization.

Loop Analyzer

The loop analyzer uses the results of control flow and data flow analysis to examine the memory areas accessed in each loop and to determine whether the areas accessed by different iterations of the loop can overlap. The analyzer judges the following loops as parallelizable.

The analysis proceeds from the innermost loop to the outermost loop. The subprograms and loops to be analyzed can be specified by a pragma.
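To make the idea of overlapping access areas concrete, here is a small C sketch. It is not taken from COINS; the function names are hypothetical, and whether COINS accepts a particular loop depends on its own criteria. In the first loop no iteration touches an element that another iteration writes; in the second, each iteration reads what the previous one wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Iteration i writes only a[i] and reads only b[i] and c[i].
   The restrict qualifier states the assumption that a does not alias b or c,
   so the access areas of different iterations never overlap. */
void add_arrays(double *restrict a, const double *b, const double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);  /* defensive check only */
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Iteration i reads a[i - 1], which iteration i - 1 has just written.
   The access areas of successive iterations overlap, so the iterations
   cannot be reordered or run concurrently as written. */
void prefix_sum(double *a, size_t n)
{
    assert(a != NULL);  /* defensive check only */
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```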

Loop Normalization

A loop is converted to a for-loop whose loop index takes the values 0, 1, 2, ..., in the form

    for (i = 0; i < n; i = i + 1) { .... }

where i is the loop index and n is the iteration count.

All induction variables (integer variables that increase or decrease by a fixed amount in each iteration) are detected, and their values are computed from the loop index. Array element expressions are rewritten to match the new representation of the induction variables.
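The following C sketch shows one loop before and after normalization. It is written by hand for illustration and is not COINS output; the function names and the variable k are hypothetical, and a positive step is assumed. The original index i and the second induction variable p are both expressed in terms of the normalized index k, and the subscript a[p] becomes a[2 * k].

```c
#include <assert.h>

/* Original form: i starts at lo and advances by step; p is a second
   induction variable that advances by 2 in every iteration. */
void fill_original(int *a, int lo, int hi, int step)
{
    assert(step > 0);  /* assumption of this sketch */
    int p = 0;
    for (int i = lo; i < hi; i += step) {
        a[p] = i;
        p += 2;
    }
}

/* Normalized form: k takes the values 0, 1, 2, ..., n - 1.
   The old index is recovered as lo + k * step and p as 2 * k. */
void fill_normalized(int *a, int lo, int hi, int step)
{
    assert(step > 0);  /* assumption of this sketch */
    int n = (hi > lo) ? (hi - lo + step - 1) / step : 0;  /* iteration count */
    for (int k = 0; k < n; k = k + 1) {
        a[2 * k] = lo + k * step;
    }
}
```

Once every value used in the body is a function of k alone, each iteration can be evaluated without knowing how many iterations ran before it.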

OpenMP program generation

For normalized do-all loops, OpenMP directives are attached so that the loops can be executed in parallel when the program is compiled with an OpenMP compiler. For loops that could not be parallelized, the reason for the failure is indicated in a comment. The resulting program is generated as a C program with OpenMP directives and comments by the HIR-to-C translator of COINS.
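The generated program might look roughly like the sketch below. This is a hand-written illustration, not actual COINS output: the function names are hypothetical, and the wording of the failure comment is an assumption. The directive itself, `#pragma omp parallel for`, is standard OpenMP.

```c
#include <assert.h>

/* Do-all loop: iterations are independent, so a directive is attached. */
void scale(double *a, const double *b, double s, int n)
{
    int i;
    assert(n >= 0);  /* defensive check only */
#pragma omp parallel for
    for (i = 0; i < n; i = i + 1) {
        a[i] = s * b[i];
    }
}

/* Not parallelized: a[i + 1] is computed from a[i], which the previous
   iteration wrote. (Hypothetical comment wording.) */
void decay(double *a, int n)
{
    int i;
    assert(n >= 0);  /* defensive check only */
    for (i = 0; i < n - 1; i = i + 1) {
        a[i + 1] = a[i] * 0.5;
    }
}
```

A compiler without OpenMP support ignores the pragma and runs the loop sequentially; with GCC, OpenMP is enabled by the -fopenmp option.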

Machine code generation for parallel execution

If machine code generation for parallel execution is specified in the compile command, each do-all loop selected by pragma is transformed into a subprogram that is executed as a slave thread, and the original program is changed so that it is executed as the master thread, without using OpenMP. The program can then be executed in parallel by linking it with the run-time routine that controls parallel execution.
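The transformation can be pictured with the C sketch below. It uses POSIX threads as a stand-in for the COINS run-time routine, whose interface is not described here; the names loop_body_worker, scale_parallel, struct chunk, and NUM_WORKERS are all hypothetical, as is the even split of the iteration space.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4  /* hypothetical fixed thread count */

/* Arguments for one worker: its slice [begin, end) of the iteration space. */
struct chunk {
    double *a;
    const double *b;
    int begin;
    int end;
};

/* The loop body, outlined into a subprogram that a worker thread executes. */
static void *loop_body_worker(void *arg)
{
    struct chunk *c = arg;
    for (int i = c->begin; i < c->end; i = i + 1)
        c->a[i] = 2.0 * c->b[i];
    return NULL;
}

/* Master side. Replaces the original loop
       for (i = 0; i < n; i = i + 1) a[i] = 2.0 * b[i];           */
void scale_parallel(double *a, const double *b, int n)
{
    pthread_t tid[NUM_WORKERS];
    struct chunk chunks[NUM_WORKERS];
    int created[NUM_WORKERS];

    assert(n >= 0);
    for (int t = 0; t < NUM_WORKERS; t++) {
        chunks[t].a = a;
        chunks[t].b = b;
        chunks[t].begin = (int)((long long)n * t / NUM_WORKERS);
        chunks[t].end = (int)((long long)n * (t + 1) / NUM_WORKERS);
        created[t] = (pthread_create(&tid[t], NULL,
                                     loop_body_worker, &chunks[t]) == 0);
        if (!created[t])
            loop_body_worker(&chunks[t]);  /* run the slice in the master */
    }
    /* The master waits until every worker has finished its slice. */
    for (int t = 0; t < NUM_WORKERS; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
    }
}
```

On systems where the thread library is separate from the C library, the sketch needs to be compiled and linked with the -pthread option.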

Note

For more detail, see Parallelization for HIR.