6. Parallelization for HIR
6.1. Loop Parallelizer
The "coins.lparallel" package analyzes a program for parallelizable loops
and outputs C code annotated with the applicable OpenMP pragma directives.
It does not perform the parallelization itself, but the driver can call an
external OpenMP compiler.
6.1.1. Usage
Use the driver coins.lparallel.LoopPara instead of coins.driver.Driver.
To output a C file with OpenMP directives, type
java coins.lparallel.LoopPara -coins:hir2c foo.c
The resulting C file is named foo-loop.c.
To output an executable (requires an OpenMP compiler named 'omcc'), type
java coins.lparallel.LoopPara foo.c
This simply passes the C output described above to the OpenMP compiler.
(Other configuration, such as environment variables, may be needed to
execute the resulting code in parallel: see your OpenMP compiler manual.)
The driver LoopPara also supports the HIR optimization options supported by
the driver coins.aflow.FlowOpt. For example,
java coins.lparallel.LoopPara -coins:hirOpt=cpf,hir2c foo.c
This may allow some code that would otherwise not be parallelizable to be
parallelized.
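As an illustration of the kind of loop this analysis targets, the
hand-written sketch below shows a C loop whose iterations are independent,
annotated with an OpenMP 'parallel for' directive. It is not actual LoopPara
output: the function name 'scale_add' and its parameters are assumptions made
for this example, and the directives LoopPara really emits may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for 0 <= i < n.
   Each iteration reads and writes only element i, so there is no
   loop-carried dependence and the iterations may run in parallel. */
void scale_add(double a, const double *x, double *y, size_t n)
{
    long i;  /* signed loop index, accepted by all OpenMP versions */

    assert(x != NULL && y != NULL);

#pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenMP support ignores the pragma and runs the loop
sequentially, so the result is the same either way.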
6.2. Coarse Grain Parallelizing Module
6.2.1. OVERVIEW
6.2.1.1. Design Concept
The coarse-grain parallelizing module was built to realize CoCo, a
coarse-grain parallelizing compiler written in Java. CoCo is a research
product and is still at an early stage as a parallelizing compiler, so it
has many constraints on practical use, as described later. Implementing
CoCo as an automatic parallelizing compiler revealed many important issues
that must be solved before coarse-grain parallelizing compilers become
practical. The coarse-grain parallelizing module is part of the COINS
infrastructure, so its components are available as a set of parts for
coarse-grain parallelization.
CoCo analyzes an input C program and transforms it into a macro
(coarse-grain) task graph with data and control flow dependences. CoCo
then parallelizes the macro tasks by using OpenMP directives for SMP
machines. The analysis and transformation are carried out on Coins HIR
(High-level Intermediate Representation). CoCo generates a parallel
program in HIR that contains the OpenMP directives as comments. The
HIR-to-C translator translates this HIR program into a C program with
OpenMP directives. Finally, the C program is compiled by the Omni-OpenMP
compiler and executed in parallel on an SMP machine.
Macro tasks correspond to basic blocks, loops, and/or subroutines. After an
input C program is divided into macro tasks, the execution starting
condition of each macro task is analyzed. The execution starting condition
indicates whether the macro task can be executed at a given time.
A runtime macro task scheduler evaluates the execution starting condition
of each macro task at execution time and dynamically assigns each
executable task to a lightly loaded processor of the SMP machine.
The coarse-grain parallelizing module is a tool set that provides the
following functions:
- Divides an input C program into macro tasks based on basic blocks,
- Analyzes the execution starting condition of each macro task,
- Embeds OpenMP directives for parallel execution at the macro task
level,
- Dynamically schedules each macro task onto a processor of an SMP
machine.
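To make the notion of a macro task concrete, the hand-annotated sketch
below marks how a small C fragment might be divided. The division shown
(MT1 to MT4) and the name 'macro_task_example' are assumptions for
illustration, not output of the module. The code is written as an ordinary
function so that it can be called from a test, whereas the module itself
works on the main function.

```c
#include <assert.h>

#define MAX_N 64

/* Hypothetical input fragment with an assumed macro task division. */
int macro_task_example(int n)
{
    int a[MAX_N], b[MAX_N];
    int i, j, sum_a, sum_b;

    assert(n >= 0 && n <= MAX_N);

    /* MT1: basic block (initialization). */
    sum_a = 0;
    sum_b = 0;

    /* MT2: loop. Data dependent on MT1 through sum_a. */
    for (i = 0; i < n; i++) {
        a[i] = i;
        sum_a += a[i];
    }

    /* MT3: loop. Data dependent on MT1 through sum_b, but independent
       of MT2, so MT2 and MT3 could run on different processors. */
    for (j = 0; j < n; j++) {
        b[j] = 2 * j;
        sum_b += b[j];
    }

    /* MT4: basic block. Data dependent on both MT2 and MT3. */
    return sum_a + sum_b;
}
```

In this sketch the two loops use separate index variables on purpose: a
shared index would introduce a dependence between MT2 and MT3.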
6.2.1.2. Data Structures
The coarse-grain parallelizing module uses a macro flow graph model.
Nodes of the graph correspond to macro tasks. The edges between nodes are
of two types, representing control flow dependences and data flow
dependences.
An execution starting condition is represented as a boolean expression.
Its operators are 'logical AND' and 'logical OR'. Its operands are the
following conditions:
- Whether the macro tasks on which the task is data dependent have been
executed or decided not to be executed,
- Whether the control flow dependence leading to the macro task has been
decided.
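A minimal sketch of how such a condition could be evaluated is given below.
It assumes a task MT4 that uses data from MT2 and MT3 and is reached only
when a preceding branch selects it. The type 'MtState' and the functions
'data_dep_resolved' and 'mt4_can_start' are hypothetical names introduced
for this example and are not taken from the module's source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical run-time states of a macro task. */
typedef enum {
    MT_PENDING,   /* nothing decided yet */
    MT_FINISHED,  /* has been executed */
    MT_SKIPPED    /* decided not to be executed */
} MtState;

/* A data dependence on a predecessor is resolved once the predecessor
   has been executed or has been decided not to be executed. */
static bool data_dep_resolved(MtState pred)
{
    switch (pred) {
    case MT_FINISHED:
    case MT_SKIPPED:
        return true;
    case MT_PENDING:
        return false;
    }
    assert(!"unknown macro task state");
    return false;
}

/* Assumed execution starting condition for MT4:
       (control flow decided toward MT4)
       AND (MT2 resolved) AND (MT3 resolved) */
bool mt4_can_start(bool branch_to_mt4, MtState mt2, MtState mt3)
{
    return branch_to_mt4
        && data_dep_resolved(mt2)
        && data_dep_resolved(mt3);
}
```

For example, 'mt4_can_start(true, MT_FINISHED, MT_SKIPPED)' holds, while the
condition stays false as long as MT2 or MT3 is still pending.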
6.2.1.3. Scheduler
The runtime macro task scheduler is attached separately from the main part
of an output program. If the input program is named 'xxx.c', the scheduler,
written in C, is placed in a file named 'xxx-sch.c' in the same directory.
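The sketch below illustrates the general idea of dynamic macro task
scheduling with OpenMP. It is not the content of a generated 'xxx-sch.c':
the task table, the dependence matrix, and all names ('mt_state', 'mt_deps',
'schedule_all', and so on) are assumptions for illustration. Each thread
repeatedly picks a task whose predecessors have finished, runs it, and
marks it finished.

```c
#include <assert.h>

#define NUM_TASKS 4

enum { MT_WAITING, MT_RUNNING, MT_DONE };

static int mt_state[NUM_TASKS];
static int mt_result[NUM_TASKS];

/* mt_deps[t][p] != 0 means task t must wait for task p (assumed graph:
   tasks 1 and 2 depend on task 0, task 3 depends on tasks 1 and 2). */
static const int mt_deps[NUM_TASKS][NUM_TASKS] = {
    {0, 0, 0, 0},
    {1, 0, 0, 0},
    {1, 0, 0, 0},
    {0, 1, 1, 0},
};

/* Simplified execution starting condition: all predecessors are done. */
static int mt_is_ready(int t)
{
    int p;
    if (mt_state[t] != MT_WAITING) return 0;
    for (p = 0; p < NUM_TASKS; p++)
        if (mt_deps[t][p] && mt_state[p] != MT_DONE) return 0;
    return 1;
}

/* Bodies of the hypothetical macro tasks. */
static void mt_run(int t)
{
    switch (t) {
    case 0: mt_result[0] = 10; break;
    case 1: mt_result[1] = mt_result[0] + 1; break;
    case 2: mt_result[2] = mt_result[0] * 2; break;
    case 3: mt_result[3] = mt_result[1] + mt_result[2]; break;
    default: assert(!"unknown macro task"); break;
    }
}

int schedule_all(void)
{
    int t, finished = 0;
    for (t = 0; t < NUM_TASKS; t++) mt_state[t] = MT_WAITING;

#pragma omp parallel
    {
        for (;;) {
            int k, picked = -1, all_done;
#pragma omp critical(mt_sched)
            {
                all_done = (finished == NUM_TASKS);
                for (k = 0; !all_done && k < NUM_TASKS; k++) {
                    if (mt_is_ready(k)) {
                        mt_state[k] = MT_RUNNING;
                        picked = k;
                        break;
                    }
                }
            }
            if (all_done) break;
            if (picked < 0) continue;  /* nothing ready yet: poll again */
            mt_run(picked);
#pragma omp critical(mt_sched)
            {
                mt_state[picked] = MT_DONE;
                finished++;
            }
        }
    }
    return mt_result[3];
}
```

All accesses to the shared scheduling state are placed inside the same named
critical section, which keeps the sketch simple at the cost of serializing
task selection. Compiled without OpenMP, the pragmas are ignored and the
tasks run one after another in a valid order.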
6.2.2. CONSTRAINTS OF CURRENT IMPLEMENTATION
The current version of the coarse-grain parallelizing module, CoCo, has the
following constraints:
- The module parallelizes only the main function. When an input program
has several functions, the module ignores the other functions.
- To execute a program in parallel efficiently, the module should adjust
task granularity, for example by 'loop unrolling'. The module does not do
this yet.
- A loop in a program is translated into a single macro task. The module
recognizes as loops only those written with HIR reserved words such as
'while' and 'for'. Other kinds of loops are not translated into macro
tasks.
- The module identifies a macro task as an exit task only if the task has
no successors or includes a return statement. Other macro tasks, for
example those that call 'exit()', are not recognized as exit tasks.
- When some macro tasks have no dependence on each other, their execution
order may differ from the order of sequential execution.
6.2.3. HOW TO USE
CoCo inserts OpenMP directives into an HIR program as comments for
coarse-grain parallelization. CoCo uses the 'hir2c' module, a translator
from HIR to C, because the back end of Coins does not support OpenMP
directives yet. After hir2c generates a coarse-grain parallel C program,
you must compile it with an OpenMP compiler in order to execute it in
parallel on an SMP machine.
To obtain a coarse-grain parallel program, proceed as follows:
- Compile 'xxx.c' with the Coins C compiler, specifying the option
'-coins:mdf,hir2c=opt'.
> java -classpath ./classes Driver -coins:mdf,hir2c=opt xxx.c
- Compile the generated program together with the runtime scheduler using
Omni-OpenMP 'omcc'.
> omcc xxx-hir-opt.c xxx-sch.c
6.2.4. OPTIONS
There are several compile-time options for the coarse-grain parallelizing
module. For other options of the Coins Compiler Driver, see
2. How to use the Compiler Driver or 3. How to use C Compiler.
- -coins:trace=MDF.xxxx
Outputs trace information of this module for debugging. Specify the trace
level xxxx as follows:
2000 : Output general debug information of the module.
- -coins:hir2c=opt
Translates HIR into a C program after optimizing.
- -coins:stopafterhir2c
Stops compilation of each compile unit just after 'hir2c' generates the C
program.
- -coins:mdf
Enables this module.