6. Parallelization for HIR

6.1. Loop Parallelizer

The "coins.lparallel" package analyzes a program for parallelizable loops and outputs C code annotated with the applicable OpenMP pragma directives. It does not perform the parallelization itself, but its driver can invoke an external OpenMP compiler.
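
To illustrate the distinction the analysis draws, the sketch below contrasts a loop whose iterations are independent with one that has a loop-carried dependence. The functions scale() and prefix_sum() are hypothetical examples, and '#pragma omp parallel for' is the standard OpenMP work-sharing directive; the exact form of the directives that LoopPara writes may differ.

```c
#include <assert.h>

/* Independent iterations: iteration i writes only a[i] and reads only
   b[i], so a loop parallelizer may annotate the loop with an OpenMP
   work-sharing directive (hypothetical example). */
void scale(double *a, const double *b, double k, int n)
{
    int i;
    assert((a != 0 && b != 0) || n == 0);
#pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried flow dependence: iteration i reads a[i - 1], which is
   written by iteration i - 1. Running iterations concurrently could
   change the result, so this loop is left without a directive. */
void prefix_sum(double *a, int n)
{
    int i;
    assert(a != 0 || n == 0);
    for (i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

When compiled without OpenMP support, the pragma is ignored and both functions run sequentially with the same results.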

6.1.1. Usage

Use the driver coins.lparallel.LoopPara instead of coins.driver.Driver.

To output a C file with OpenMP directives, type

    java coins.lparallel.LoopPara -coins:hir2c foo.c
This writes the annotated C code to a file named foo-loop.c.

To produce an executable (this requires an OpenMP compiler named 'omcc'), type

    java coins.lparallel.LoopPara foo.c
This passes the C output described above to the OpenMP compiler. Running the resulting code in parallel may require further configuration, such as environment variables; see your OpenMP compiler's manual.

The LoopPara driver also accepts the HIR optimization options supported by the coins.aflow.FlowOpt driver. For example,

    java coins.lparallel.LoopPara -coins:hirOpt=cpf,hir2c foo.c
This may allow code that is otherwise not parallelizable to be parallelized.
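
As an illustration, assuming that 'cpf' denotes constant propagation and folding, the source-level sketch below shows how such an optimization can expose parallelism. The functions before_cpf() and after_cpf() are hypothetical; the real transformation is performed on HIR, and the directives LoopPara emits may look different.

```c
#include <assert.h>

/* Before propagation: k is a variable, so a dependence test that treats
   it as unknown must assume that a[i + k] (written) may overlap a[j]
   (read) for some other iteration j, and keep the loop sequential. */
void before_cpf(double *a, int n)
{
    int i, k;
    assert(a != 0 || n == 0);
    k = 0;
    for (i = 0; i < n; i++)
        a[i + k] = a[i] * 2.0;
}

/* After propagating k = 0 and folding i + 0 into i, each iteration
   reads and writes only a[i], so the iterations are independent and an
   OpenMP directive can be attached. */
void after_cpf(double *a, int n)
{
    int i;
    assert(a != 0 || n == 0);
#pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = a[i] * 2.0;
}
```

Both functions compute the same result; only the second form lets a dependence test prove that the iterations do not interfere.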

6.2. Coarse Grain Parallelizing Module

6.2.1. OVERVIEW

6.2.1.1. Design Concept
The coarse-grain parallelizing module was built to realize CoCo, a coarse-grain parallelizing compiler written in Java. CoCo is a research product and is still at an early stage as a parallelizing compiler, so it has many constraints for practical use, as described later. Implementing CoCo as an automatic parallelizing compiler revealed many important issues that practical coarse-grain parallelizing compilers must solve. The coarse-grain parallelizing module is part of the COINS infrastructure, so its components are available as a set of building blocks for coarse-grain parallelization.

CoCo analyzes an input C program and transforms it into a macro (coarse-grain) task graph with data and control flow dependences. CoCo then parallelizes the macro tasks using OpenMP directives for SMP machines. This analysis and transformation are carried out on Coins HIR (High-level Intermediate Representation). CoCo generates a parallel program in HIR that contains OpenMP directives as comments. The HIR-to-C translator converts this HIR program into a C program with OpenMP directives. Finally, the C program is compiled by the Omni-OpenMP compiler and executed in parallel on an SMP machine.

Macro tasks correspond to basic blocks, loops, and/or subroutines. After an input C program is divided into macro tasks, the execution starting condition of each macro task is analyzed. The execution starting condition determines whether a macro task can start executing at a given point in time. At run time, a macro task scheduler evaluates the execution starting condition of each macro task and dynamically assigns each executable macro task to a lightly loaded processor of the SMP machine.
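
To make the notion of macro tasks concrete, the comments in the following C function mark one plausible division into macro tasks (MT1 to MT4). The function example() and this particular partition are hypothetical illustrations, not output produced by CoCo.

```c
#include <assert.h>

static int square(int x)
{
    return x * x;
}

/* Comments mark a hypothetical division into macro tasks (MT). */
int example(const int *a, int n)
{
    int i, sum, bias;

    assert(a != 0 || n == 0);

    /* MT1: basic block ending in a conditional branch. */
    sum = 0;
    bias = 0;
    if (n > 4)
        /* MT2: basic block executed only when the branch is taken. */
        bias = 1;

    /* MT3: the whole loop forms a single macro task. */
    for (i = 0; i < n; i++)
        sum += a[i];

    /* MT4: basic block that calls the subroutine square(). */
    return square(sum) + bias;
}
```

MT2 and MT3 access different variables, so they have no dependence on each other and could run in parallel once MT1 has finished. MT4 uses both bias and sum, so it may start only after MT3 has finished and MT2 has either finished or been skipped because the branch in MT1 was not taken.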

The coarse-grain parallelizing module is a tool set that provides the following functions:

  1. Divides an input C program into macro tasks based on basic blocks.
  2. Analyzes the execution starting condition of each macro task.
  3. Embeds OpenMP directives for parallel execution at the macro task level.
  4. Dynamically schedules each macro task onto a processor of an SMP machine.

6.2.1.2. Data Structures
The coarse-grain parallelizing module uses a macro flow graph model, in which nodes correspond to macro tasks. Edges between nodes are of two types, representing control flow dependences and data flow dependences.

An execution starting condition is represented as a Boolean expression whose operators are 'logical AND' and 'logical OR'. Its operands are conditions of the following two kinds:

  1. A macro task on which this macro task is data dependent has been executed, or has been decided not to be executed.
  2. The control flow dependence leading to this macro task has been decided.

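
A minimal sketch of how such a condition could be represented and evaluated, assuming a small expression tree in C. The names mt_state, esc, and esc_holds() are hypothetical and not part of the Coins API, and the control operand assumes that "decided" means the branch controlling the macro task has selected it.

```c
#include <assert.h>

/* Hypothetical runtime state of each macro task. */
enum mt_state { MT_PENDING, MT_FINISHED, MT_SKIPPED };

/* Node kinds of an execution starting condition (ESC) expression. */
enum esc_kind { ESC_DATA, ESC_CTRL, ESC_AND, ESC_OR };

struct esc {
    enum esc_kind kind;
    int task;     /* ESC_DATA: predecessor; ESC_CTRL: branching task */
    int target;   /* ESC_CTRL: successor the branch must select */
    const struct esc *lhs, *rhs;  /* ESC_AND / ESC_OR operands */
};

/* state[t]: progress of macro task t.
   taken[t]: successor selected by the branch ending task t, or -1 if
   that branch has not been decided yet. */
int esc_holds(const struct esc *e, const enum mt_state state[],
              const int taken[])
{
    assert(e != 0);
    switch (e->kind) {
    case ESC_DATA:
        /* Operand kind 1: the predecessor has executed or will not. */
        return state[e->task] != MT_PENDING;
    case ESC_CTRL:
        /* Operand kind 2: the controlling branch has selected this task. */
        return taken[e->task] == e->target;
    case ESC_AND:
        return esc_holds(e->lhs, state, taken)
            && esc_holds(e->rhs, state, taken);
    case ESC_OR:
        return esc_holds(e->lhs, state, taken)
            || esc_holds(e->rhs, state, taken);
    }
    assert(!"unknown ESC node kind");
    return 0;
}
```

For example, a macro task that uses values produced by tasks 2 and 3 would get the condition DATA(2) AND DATA(3); it becomes true once task 3 has finished and task 2 has either finished or been marked as skipped.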
6.2.1.3. Scheduler
The runtime macro task scheduler is attached to the main part of the output program as a separate file. If the input program is named 'xxx.c', the scheduler, written in C, is placed in a file named 'xxx-sch.c' in the same directory.
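
The scheduler's basic policy, assigning each executable macro task to a lightly loaded processor, can be simulated sequentially as greedy list scheduling. This is a hypothetical sketch, not the contents of xxx-sch.c: mt_graph and simulate_schedule() are invented names, each execution starting condition is simplified to "all data predecessors have finished", task costs are assumed known in advance, and a processor's load is measured as the time at which it becomes free.

```c
#include <assert.h>

#define MAX_TASKS 16
#define MAX_PROCS 8

/* Simplified, hypothetical input: bit p of pred[t] set means that
   macro task t depends on macro task p. */
struct mt_graph {
    int ntasks;
    unsigned pred[MAX_TASKS];
    int cost[MAX_TASKS];   /* estimated execution time of each task */
};

/* Takes the lowest-numbered task whose predecessors have all been
   scheduled, places it on the processor that becomes free earliest,
   and starts it once that processor and all predecessors are done.
   Records each task's processor in proc[] and returns the finish time
   of the last task, or -1 if the graph contains a cycle. */
int simulate_schedule(const struct mt_graph *g, int nprocs, int proc[])
{
    int free_at[MAX_PROCS] = {0};
    int finish[MAX_TASKS] = {0};
    unsigned scheduled = 0;
    int n, end = 0;

    assert(g->ntasks <= MAX_TASKS);
    assert(nprocs > 0 && nprocs <= MAX_PROCS);

    for (n = 0; n < g->ntasks; n++) {
        int t, p, q, best = 0, picked = -1, start;

        /* Find a task whose (simplified) starting condition holds. */
        for (t = 0; t < g->ntasks; t++)
            if (!(scheduled & (1u << t)) && (g->pred[t] & ~scheduled) == 0) {
                picked = t;
                break;
            }
        if (picked < 0)
            return -1;

        /* Choose the most lightly loaded processor. */
        for (p = 1; p < nprocs; p++)
            if (free_at[p] < free_at[best])
                best = p;

        /* Start when the processor is free and all predecessors finished. */
        start = free_at[best];
        for (q = 0; q < g->ntasks; q++)
            if ((g->pred[picked] & (1u << q)) && finish[q] > start)
                start = finish[q];

        finish[picked] = start + g->cost[picked];
        free_at[best] = finish[picked];
        proc[picked] = best;
        scheduled |= 1u << picked;
        if (finish[picked] > end)
            end = finish[picked];
    }
    return end;
}
```

For a graph in which tasks 1 and 2 both depend on task 0 and task 3 depends on both, two processors let tasks 1 and 2 run side by side, so the total time is shorter than on one processor.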

6.2.2. CONSTRAINTS OF CURRENT IMPLEMENTATION

The current version of the coarse-grain parallelizing module, CoCo, has the following constraints:

  1. The module parallelizes only the main function. When an input program has several functions, the module ignores all functions other than main.
  2. To execute a program in parallel efficiently, the module should adjust the granularity of tasks, for example by loop unrolling. The module does not do this yet.
  3. Each loop in a program is translated into a single macro task. The module recognizes as loops only those written with reserved words such as 'while' and 'for' in HIR. Other kinds of loops are not translated into macro tasks.
  4. The module identifies a macro task as an exit macro task only if the task has no successors or contains a return statement. Other macro tasks, such as those that call the 'exit()' function, are not recognized as exit macro tasks.
  5. When macro tasks have no dependences on one another, they may execute in an order different from that of sequential execution.
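
Constraint 5 can be illustrated with two blocks that are assumed to have been judged independent. OpenMP sections are used here only to run them concurrently; run_independent_tasks() is a hypothetical function, and CoCo's generated code may be structured differently.

```c
#include <assert.h>
#include <stdio.h>

/* Two hypothetical macro tasks with no data or control dependence
   between them: each writes its own variable. */
void run_independent_tasks(int *x, int *y)
{
    assert(x != NULL && y != NULL);
#pragma omp parallel sections
    {
#pragma omp section
        {   /* Macro task A */
            *x = 10;
            printf("task A done\n");
        }
#pragma omp section
        {   /* Macro task B */
            *y = 20;
            printf("task B done\n");
        }
    }
}
```

The values stored in x and y are the same in every run, but when the sections run in parallel the two lines may be printed in either order, which differs from the fixed order of sequential execution.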

6.2.3. HOW TO USE

For coarse-grain parallelization, CoCo inserts OpenMP directives into the HIR program as comments. Because the Coins back end does not yet support OpenMP directives, CoCo uses the 'hir2c' module, which translates HIR into a C program. After hir2c generates the coarse-grain parallel C program, you must compile it with an OpenMP compiler to execute it in parallel on an SMP machine.

To obtain a coarse-grain parallel program, proceed as follows:

  1. Compile 'xxx.c' with the Coins C compiler, specifying the option '-coins:mdf,hir2c=opt' (without spaces):

         > java -classpath ./classes coins.driver.Driver -coins:mdf,hir2c=opt xxx.c

  2. Compile the generated program 'xxx-hir-opt.c' together with the runtime scheduler 'xxx-sch.c' using the Omni-OpenMP compiler 'omcc':

         > omcc xxx-hir-opt.c xxx-sch.c

6.2.4. OPTIONS

The coarse-grain parallelizing module has several compile-time options. For other options of the Coins compiler driver, see "2. How to use the Compiler Driver" or "3. How to use C Compiler".

-coins:trace=MDF.xxxx
    Output trace information of this module for debugging; xxxx specifies the trace level:
        2000 : Output general debug information of the module.
-coins:hir2c=opt
    Translate HIR into a C program after optimization.
-coins:stopafterhir2c
    Stop compiling each compile unit immediately after 'hir2c' generates the C program.
-coins:mdf
    Use this module.