8. SSA Optimization for LIR

8.1. OVERVIEW

8.1.1. Introduction

The SSA module includes optimizers on the Static Single Assignment form (SSA form). SSA is a representation where index is attached to every variable so that every definition of each variable in a program becomes unique.
At a joining point of the control flow graph where two or more different definitions of a variable reach, a hypothetical function called a phi-function is inserted so that these multiple definitions are merged.
Data flow analysis and optimization for sequential execution can be compacted using SSA form. The SSA module is invoked on LIR.

8.1.2. Translation to SSA form

The translation pass of the SSA module translates normal LIR into LIR-level SSA form.
Translation into SSA form generally consists of two phases, insertion of phi-functions and the renaming of variables. Three types of SSA form are well known: minimal SSA, semi-pruned SSA and pruned SSA. The algorithms for translating into these forms are almost the same, only the live variable analysis being different. The SSA module can translate into these types of SSA form. See section 8.2.

8.1.3. Back translation from SSA form

The back translation pass of the SSA module translates LIR-level SSA form into normal LIR.
Generally, in the back translation from SSA form, the task of a phi-function is divided into the predecessor basic blocks. Therefore, in most cases, the translation inserts some copy statements for variables used in the phi-function into the predecessor blocks of the basic block where the function resides, and then deletes the phi-function. The SSA module uses the method proposed by Briggs and the three methods proposed by Sreedhar to back translate from SSA form. See section 8.2.

8.1.4. Optimization

Many optimizations based on SSA form are implemented, such as Common Subexpression Elimination, Constant Propagation, Copy Propagation, Dead Code Elimination and loop-related optimizations. The SSA module also includes tools for optimization, such as Split Critical Edge.

The "Demand-driven partial redundancy elimination based on global value numbering" is an efficient method of eliminating common sub-expressions and partially redundant expressions. It hoists loop-invariant expressions out of loop.

In coins-1.5 and later versions, advanced optimization named "Scalar replacement by demand-driven partial redundancy elimination based on global value numbering" ("Scalar replacement by demand-driven PDE" in brief) is available.
It replaces memory accesses to register accesses. If it can be applied within a heavy loop, execution speed will be increased remarkably.
Let us explain the method using an example. In the inner loop of
  for (i=0; i<M; i=i+1) {
   for (j=1; j<N; j=j+1) {
    a[j+1]=a[j-1]+b[i];
   }
   c[i]=a[N];
  }
execution will proceed as follos:
  a[2]=a[0]+b[i];
  a[3]=a[1]+b[i];
  a[4]=a[2]+b[i];
  a[5]=a[3]+b[i];
  a[6]=a[4]+b[i];
  ....
If registers r0, r1, r2, r3 are available, then the above inner loop can be executed as follows:
  r0=b[i];
  r1=a[0]+r0;
  a[2]=r1;
  r2=a[1]+r0;
  a[3]=r2;
  r3=r1+r0;
  a[4]=r3;
  r1=r2+r0;
  a[5]=r1;
  r2=r3+r0;
  a[6]=r2;
  ....
This transformation replaces memory accesses to elements of array a and array b to register accesses.
For such transformation, it is necessary to recognize value identity of varaibles over loop iterations. In the above example, the computation of a[j+1] uses the value of a[j-1] that is computed 2 iterations earlier.
The optimization "Scalar replacement by demand-driven partial redundancy elimination based on global value numbering" ("Scalar replacement by demand driven PDE" in brief) does such transformation to replace mamoey accesses to register accesses. If such transformation is applied to heavy loops, execution speed will be increased.

In order to increase the effectiveness of such transformation, an option "presrhir" for HIR transformation is provided. When the option
  hirOpt=presrhir
is specified, above loop will be expanded to such a loop as
  for(i=0; i<M;i=i+1) {
    j=0; v1=1; v3=2;
    for(; j<N-v1; j=j+v3) {
      a[j+1]=a[j-1]+b[i]
      a[j+2]=a[j]+b[i];
    }
    for(; j<N; j=j+1) {
      a[j+1]=a[j-1]+b[i];
    }
    c[i]=a[N];
  }
and then the scalar replacement by demand-driven PDE will be applied according to SSA options. An example of option specifications is
  hirOpt=presrhir,
  ssa-opt=prun/divex3/esplt/cstp/presr/cse/cstp/hli/osr/hli/cstp/eqp/dce/srd3

The SSA optimizer options added for this purpose are
  • divex3 :  Divide expression into Three-Address Code
       (for Scalar replacement by demand-driven PDE).
  • eqp :  Demand-driven partial redundancy elimination without copy propagation
       (for decreasing required number of registers).
  • presr :  Do scalar replacement by demand-driven PRE.

  • You can specify the optimizers, tools and their order of application using the SSA options. See section 8.2.

    8.1.5. Details

    For details of the SSA optimization module, see [reference 1].

    8.2. SSA OPTIONS

    There are several compile time options for the SSA pass. For any other options of the COINS Compiler Driver, see 2. How to use the Compiler Driver or 3. How to use C Compiler.

    -coins:lir-opt=...
    
    can also be used instead of
    -coins:ssa-opt=...
    
    (When ''lir-opt'' and ''ssa-opt'' are both written, only ''lir-opt'' is effective.)
    -coins:ssa-opt=<optimizations-in-SSA-form>/<optimizations-in-normal-form>/<optimizations-in-SSA-form>

    Use SSA pass. This is necessary for using the SSA module. There are several optimizations in this module. To invoke the optimization, you should specify the optimizers with this option. Specified optimizers are invoked from left to right.

    Both <optimizations-in-SSA-form> and <optimizations-in-normal-form> are optional with the separating "/".
    note: <optimizations-in-normal-form> is available after coins-1.4.4.2.

    <optimizations-in-SSA-form> is of the form "xxx/yyy/.../zzz". First, as `xxx' you MUST specify to which kind of SSA form you like to translate, such as minimal, semi-pruned or pruned. And then, as `yyy' you can specify the optimizers which the SSA module invokes. You can specify the same optimizer two times, three times, and so on. Only optimizations that you specify are performed in that order. Finally, as `zzz' you MUST specify how to back translate from SSA form.

    These options are defined as follows:

    note:

    1. The option lcm may take long compilation time.
    2. It is not allowed to specify multiple uses of cntbb.
    3. In the x86_64-mac architecture cntbb options are not allowed because of their use of calling fprintf function.
    4. Cntbb doesn't count the following LIR instructions: PHI instruction, PROLOGUE instruction, EPILOGUE instruction, and LIR instructions used for counting instructions.
    5. Results of "cntbb" option are written down on files <Filename>.<Functionname>.cnt in the working directory where <Filename> and <Functionname> are the file name and function name of the source program respectively. Cntbb uses a working directory which is indicated with the COINS option ``tmpdir''. If the option is not specified, ``/tmp'' is used as the working directory.
    6. The counting instructions option counts the number of executed LIR instructions with compilation twice and execution once. At the first compilation, the information which is referenced at the second compilation is written into the file file_list.dat in the working directory. So before the first compilation the file should be removed, if it exists. To remove the working file file_list.dat and results of "cntbb", coins.ssa.ProApp is available.
    7. To show the counted result of LIR instructions, use the following command:
          java -cp ./classes coins.ssa.ProApp -t xxxx
      
      where, ``-t'' option specifies xxx as the working directory. If ``-t'' option is not given, ``/tmp'' directory is used.
      To erase the file_list.dat in the working directory, use the following command:
          java -cp ./classes coins.ssa.ProApp -t xxxx -n
      
    8. The options stlin, inslin, rmlin, shlin are used to set line number to each LIR instructions. There is another option ``-coins:debuginfo'' that gives source line number to some of LIR instructions. There are many LIR instructions that are difficult to be made correspondence with source line number, and the source line information is lost during SSA optimization. The above options for LIR line numbering give a sequence number for each LIR instructions and this number is preserved in the process of SSA optimization. The source line number is a positive integer, however, the LIR line number is a negative integer. You may see the correspondence of the LIR line number and the source line number for some LIR instructions. As an example, give a sequence of ssa-options
          -coins:ssa-opt=stlin/shlin/prun/shlin/cse/shlin/srd3/shlin 
      
      then you may see the movement of instructions for some LIR instructions in the process of SSA optimization.
    9. See "Optimization in Static Single Assignment Form - External Specification" (in Japanese) for details.

    <optimizations-in-normal-form> is of the form "www/...". You can specify as `www' the following options.

    Example (<optimizations-in-SSA-form>) : If you specify the option

    -coins:ssa-opt=prun/cstp/cse/srd3
    
    the SSA module performs the following in that order:
    1. make pruned SSA form
    2. invoke constant folding and propagation with conditional branches
    3. invoke common subexpression elimination
    4. back translate using Sreedhar's Method III

    Example (<optimizations-in-normal-form>/<optimizations-in-SSA-form>)

      -coins:ssa-opt=divex2/esplt/pdeqp/prun/divex/cse/cstp/hli/osr/hli/cstp/cpyp/preqp/cstp/rpe/dce/srd3
    
    "divex2/esplt/pdeqp" is <optimizations-in-normal-form> and the rest is <optimizations-in-SSA-form>.

    Following is an example that specifies the options ddpde and glia.

      -coins:ssa-opt=divex2/esplt/pdeqp/ddpde/expde/prun/divex/cse/cstp/hli/osr/hli/cstp/cpyp/preqp/cstp/rpe/glia/dce/srd3
    
    -coins:ssa-no-change-loop
    Before the optimizations, the SSA module changes the structure of the loops as follows, by default. This is for making effective loop optimization.
    1. merge the several loops that have the same header block
    2. insert the preheader
    3. change the loop structure from `while' type to `do-while' type (precisely `if-do-while'). The `while' type is a loop such that the header and exit block of the loop are the same block.
    The above is performed by default. If you DO NOT want to do that, specify this option.
    -coins:ssa-no-copy-folding
    During the translation to the SSA form, the SSA module removes and propagates the copy-assign statements such as `x=y', by default.

    If you DO NOT want to do that, specify this option.

    -coins:ssa-no-redundant-phi-elimination
    The SSA module eliminates redundant phi-functions after the translation to the SSA form, by default. A phi-function is redundant if:
    1. the names of the target and the arguments of the phi-function are all the same as follows:
        x1=phi(x1,x1,x1)
    2. the names of the arguments of the phi-function are all the same as follows:
       x1=phi(y1,y1,y1)
    3. there are also other cases as follows:
      x1=phi(y1,y1,x1)   or   x1=phi(y1,y1,BOTTOM)
    In the first case, the SSA module just eliminates the phi-function. In the second case, the SSA module eliminates the phi-function and replaces uses of `x1' by `y1' in the statements which are evaluated after the phi-function.
    For further details, see [reference 1].

    If you DO NOT want to do that, specify this option.

    -coins:ssa-no-sreedhar-coalescing
    During the back translation from SSA form by Sreedhar's method, the SSA module coalesces copy-related variables in SSA form, by default. This coalescing is proposed by Sreedhar and is called the SSA-based coalescing. This coalescing module is usually unified with Sreedhar's algorithm for back translation from SSA form. But for researchers' convenience, the SSA module can avoid it.

    If you DO NOT want to do SSA based coalescing, specify this option.

    -coins:ssa-with-chaitin-coalescing
    Perform coalescing proposed by Chaitin after the back translation to normal form. This coalesces copy-related variables whose live ranges do not interfere each other. In general, after the back translation from SSA form, there may be some copy assignment statements in the program. Some copy assignment statements only change the names of variables, that is, they are useless. Coalescing these variables eliminates the useless copy assignment statements. This optimization is done in normal form LIR after the back translation from SSA form.

    If you WANT to do that, specify this option.
    (This coalescing can be specified after any back translation method. But it may have no effect after the back translation by Sreedhar's Method III since that method does not insert copy assignment statements which can be coalesced by Chaitin's coalescing.)

    -coins:ssa-no-memory-analysis
    When Common Subexpression Elimination and/or Global Value Numbering and Partial Redundancy Elimination with Efficient Question Propagation are specified, the SSA module makes a simple alias analysis of memory accesses, by treating the whole memory as a single entity. (cf. section 8 of [reference 1])

    If you DO NOT want to do that, specify this option.

    -coins:ssa-no-replace-by-exp
    Just before the back translation from SSA form, the SSA module finds the local variables, which are not "live out" from the current basic block and are used only once in the current basic block, and replaces the variables by the expressions which define the variables. (cf. section 5.4.6 "preprocessing for temporary variables" of [reference 1])

    If you DO NOT want to do that, specify this option.

    -coins:trace=SSA.xxxx
    To output the trace information of this SSA module for debugging, specify the trace level as follows:
            1     : Output only the message that the SSA module is invoked
            100   : Output the agenda of the SSA module
            1500  : Output two kinds of information:
                    (a) The inserted phi-functions when the SSA module translates 
                       normal LIR into SSA form.
                    (b) The inserted copy assign statements when the SSA module 
                       back translates SSA form into normal LIR.
            2000  : Output general debug information of all optimizers in the SSA 
                    module
            10000 : Output much information about Sreedhar's Method III
    
    The trace information includes the levels less than or equal to what you specified. If you specify
    trace=SSA.1500
    then the SSA module outputs information related to the level `1', `100' and `1500'.
    -coins:ssa-opt=.../dump/...
    For compiler developers, the SSA module provides the option `dump' for debugging. This option should be specified within `ssa-opt'. When this option is specified, the SSA module outputs the current LIR into the standard output. For example, if the option is specified as follows:
     -coins:ssa-opt=prun/dump/srd3/dump
    the SSA module outputs the LIR (1) after translation into the pruned SSA form, and (2) after back translation from SSA form.
    -coins:regalloc=oca
    The register allocation algorithm, ''Optimistic Register Coalescing'' can be selected by this option. When this option is not specified, the default register allocation algorithm ''Iterated Register Coalescing'' is used. This option is available after coins-1.4.4.4.

    References

    [1] "Optimization in SSA form - external specification (pdf)" which is available on the web page of COINS SSA
    [2] Munehiro Takimoto, Demand-driven Partial Dead Code Elimination, IPSJ Transactions on Programming, Vol.5,No.1, 2012, pp.1-8.