1. How to use COINS compiler infrastructure

Contents

1.1. Getting Overview of the Compiler Infrastructure

The COINS compiler infrastructure is composed of

At present, the compiler infrastructure contains two parsers, for C and Fortran77, and 6 code generators, for SPARC, x86, x86_64, ARM, MIPS, SH-4, PowerPC, Alpha, MicroBlaze, and Thumb. Thus, it has 2*10 = 20 compilers.

Compilation with debug-print (trace) option will show how the compiling process proceeds and how a source program is transformed into intermediate representations (HIR and LIR) and then into a target machine code. It is the shortest way of understanding how the compiler infrastructure works and understanding HIR, LIR, and symbol table that are used as interfaces between compiler components. The debug-print option will be a great help in debugging the compiler when addition/modification is attempted.

Please try to compile several short programs specifying debug-print option in such a way as

java -classpath ./classes coins.driver.Driver -S -coins:trace=HIR.1/LIR.1/Sym.1 sample.c

(See 2. How to use the Compiler Driver and 3. How to use C Compiler.)

1.2. Structure of the COINS Compiler Infrastructure

1.2.1. Overall Structure

The COINS compiler infrastructure is composed of 3 major parts. The front-end part translates the source program into HIR (High-level Intermediate Representation). The middle part converts HIR into LIR (Low-level Intermediate Representation) and also does optimization and parallelization transformations on HIR or LIR. The back-end part generates the assembler code of the target machine from LIR.

Additional management part contains miscellaneous functions such as compiler control, HIR management, and symbol management.

The overall structure of COINS is shown in Fig.1-1.


Fig.1-1 Overall Structure of COINS

The concrete representation of HIR is a tree. An HIR tree is an abstract representation of a source program. It reflects logical structure of the source program such as subprogram, block, statement, expression, compound variable, simple variable, constant, and so on. Every language constructs in HIR has a type such as int, float, vector, struct, union, and so on. HIR is designed so as to be able to represent programs of various languages such as C, Fortran, Pascal.

LIR is an abstract representation of machine language program. In LIR, operations are decomposed into elementary operations such as SET, JUMP, CALL, etc. with simple or compound operands. The operands may represent memory, register, or expression. Data type of LIR corresponds to data type handled in machines such as integer, float of some bit length.

LIR is a language that can represents not only operations but also the entire program providing features to describe a module (a collection of functions), memory area, and data. Semantics of LIR is defined rigorously according to the denotational semantics to avoid misunderstanding.

The compiler infrastructure is entirely written in Java.

1.2.2. Front-end characteristics

The front-end part translates source languages such as C and Fortran into HIR. HIR has been designed so that high level language concepts can be represented and optimizations and parallelizations can be done on that level in (almost) language independent and machine independent manner. It has been designed by abstracting currently used procedural languages.

Language specific parts are concentrated in the front-end part so that the middle part (optimizers, parallelizers, etc.) and the back-end part (code generators, machine level optimizers, etc.) can be built language independently.

A C language analyzer and a Fortran 77 analyzer have already been provided to generate HIR from C and Fortran, repectively. In order to make a compiler for other language, it is sufficient to build a language analyzer that translates the language into HIR, except for some minor adjustment for a few modules.

Examples of making new compilers are shown in Developing new compilers with COINS

In generating HIR, it is required to understand the overview of HIR. The HIR manager (HIR) and symbol table manager (Sym) are described in interface modules such as HIR0.java (plain HIR interface), HIR.java (full HIR interface), Sym0.java (plain Symbol table interface), Sym.java (full Symbol table interface).
Their implementation modules are HIR_Impl.java and SymImpl.java.
Detailed specifications of access methods are described in the interface modules so that it is not necessary to see implementation modules except for some special cases.

In order to make or modify the front-end part, it is necessary to read the interfaces

and then their sub-interfaces, if you cannot get sufficient information. Some important sub-interfaces to be read are

Usage of methods is usually described in upper interfaces so that it is not necessary to read lower interfaces. Methods are interrelated and there may be restrictions in invoking them. In upper interfaces, many upper methods are provided to make the use of access methods simple. You will misuse the access methods if you read lower interfaces or read implementation modules before reading upper interfaces.

1.2.3. Back-end characteristics

As it is mentioned, the back-end of COINS is language independfent and the same back-end can generate target machine code for various languages. In order to make a new code generator for some machine, it is necessary only to add a target machine description (TMD) and a subclass of coins.MachineParam for that machine. In the COINS Compiler Infrastructure, the new code generator is automatically generated based on the given TMD at the time of building COINS.

The COINS generates assembly language program for the machine refering LIR that is translated from HIR. The LIR can represent machine level operations in abstract form so that all back-end processing such as optimization, register allocation, and instruction scheduling can be written machine independently.

All modules in the back-end do LIR-to-LIR transformation except for the last one that transforms LIR to the assembly language of the target machine. The syntax of LIR is the same all through the processings so that the processings can be written machine independently. Some information attached to LIR may contain items specific to the target machine. Operands of LIR may be an LIR expression as well as a register or a memory location.

In the process of instruction selection for the target machine, compound LIR expressions are decomposed into simple LIR expressions each of which may correspond to some machine instruction of the target machine. The decomposed LIR and LIR expression allocated with physical register of the target machine can be treated machine independently because the syntax of LIR is the same in all cases.

The target machine description TMD represents the correspondence of LIR patterns and target machine instruction sequence in the form of LIR-to-LIR transformation. TMD also describes the structure of register set and calling sequence, etc.

1.2.4. Classes and Interfaces

There are several classes that contains global information and methods available to all over the compiler infrastructure. They are placed in "coins" package. Some of them are

Machine dependent parameters and methods are concentrated in

Source language dependent parameters and methods are concentrated in

Information requiring mutual exclusion between individual compilers in COINS such as compiler options are concentrated in

The package structure of the infrastructure is the following, where "#" means a language dependent package, and "*" means a machine dependent package.

coins        -- Top level package of the COINS Compiler Infrastructure
  #cfront    -- C parser converting C to AST (abstract syntax tree for C)
  #ast       -- Classes to generate AST for C
    #expr    -- AST expression generation
    #stmnt   -- AST statement generation
  #ffront    -- Fortran77 parser translating Fortran77 to HIR 
  driver     -- Compiler driver
  drivergen  -- Driver generator
  #casttohir -- C AST to HIR translator
  ir         -- Intermediate representation IR
    hir      -- High level intermediate representation HIR
  sym        -- Symbol table for HIR
  hir2lir    -- HIR to LIR converter 
  flow       -- Control flow and data flow analyzer for HIR
  aflow      -- Obsolete version of flow (used only in mdf)
  alias      -- Alias analyzer
  opt        -- HIR optimizer
  ssa        -- SSA form optimizer for LIR 
  lparallel  -- loop parallelizer
  mdf        -- Coarse grain parallelizing module (SMP parallelizer)
  simd       -- SIMD parallelizer
  snapshot   -- Make snap shot file (XML file) for the visualizer
  backend    -- Back-end
    ana      -- Control flow analyzer for LIR
    cfg      -- Control flow graph builder for LIR
    *gen     -- Code generator and target machine descriptions
    lir      -- LIR manager
    opt      -- Backend optimizer
    regalo   -- Register allocator
    sched    -- Instruction scheduler
    sym      -- LIR symbol manager
    *targets -- Machine parameter table interface
    sim      -- LIR simulator
    contrib  -- User contribution (modules invoked by attach option)
    asmpp    -- Assembler level preprocessor
    tools    -- Back-end tools
    util     -- Back-end utility
  hir2c      -- HIR to C translator
  lir2c      -- LIR to C translator

1.3. Construction of a Compiler based on the COINS Infrastructure

1.3.1. Compiler Control

The compiler driver described in coins.driver.Driver.java controls the execution of the COINS C compiler. It contains definitions of compiler options and invocation statements for compiler components. The procedure of component invocation is described in the method compile(....) in Driver.java. See 2. How to use the Compiler Driver for details.

To change the sequence of component invocations and to add, replace, delete some compiler components, Driver.java may be changed. It would be better to make a subclass of Driver.java and override some of Driver's methods appropriately. See, for example, Developing new compilers with COINS.

The infrastructure does not use static fields except for "static final" in order to make it possible to develop a compiler where its components can be executed in concurrent. All methods except for some ones in Root classes (classes such as IoRoot, SymRoot, HirRoot, LirRoot, etc.) are non-static methods and should be applied to instances.

The compiler driver instantiates IoRoot first and supplies source file, object file, print file, etc. (See getSourceFile(), printOut, objectFile, msgOut, etc. in IoRoot.java). All compiler components should convey the instance of IoRoot (ioRoot) directly or indirectly and make it protected or public so that it can be accessed directly or indirectly from methods in the component.

In more detail, objects of other Root classes include a reference to the instance of IoRoot in order to enable input/output operations. All Sym objects include a reference to SymRoot object, all HIR objects include a reference to HirRoot, and so on. In this way, almost all classes has a link to the IoRoot directly or indirectly in order to enable input/output operations in their methods. IoRoot has such methods as

  getSourceFile(), getSourceFilePath(), getCompileSpecification() 

to access files and compile specifications given by command line.

As the next step, the compiler driver instantiates SymRoot to make symbol information be shared between compiler components. All Sym objects such as symbol table and entries in the symbol table (variable, subprogram, constant, type, etc.) contain a reference to the SymRoot object so that methods of Sym class and IoRoot class can be invoked. The symbol tables are nested reflecting scope of symbols and organized into tree structure.

The root of the symbol tables is named as symTableRoot. The symbol table currently effective is called current symbol table and named as symTableCurrent. They are accessed from SymRoot object. Built-in symbols representing basic types, etc. are registered in symTableRoot and can be accessed from SymRoot object, hence they can be accessed from all methods under Sym and its subclasses.

The compiler driver instantiates HirRoot and then invokes some parser such as C parser that translates source program into HIR. The parser should convey the instance of HirRoot to its components so that they can access I/O files, symbol tables, and HIR information. The super class of HirRoot is IrRoot where the root of intermediate representation (IR) of input source program is recorded as programRoot. The IrRoot is also the super class of HirRoot. HIR representation of input program can be traced starting from programRoot.

The compiler driver may either parse all subprograms in a source file before code generation, or repeat parsing and code generation for each subprogram in the source file. In the former case, inter-procedural optimization and parallelization may be possible but consumes large memory space. In the later case, required memory space is relatively small but the possibility of inter-procedural optimization is limited.

Error messages and warning messages are issued by invoking put method of Message class in "coins" package. The number of messages issued is counted for each group of messages. Compiler implementers may prepare their own error handlers that invokes the put method in order to provide some information peculiar to each component such as source program line number. (See Message.java.)

It is often required to see the status of compiler for debugging. The method

   void print(int pLevel, String pAt, String pMessage) 

in the class Debug in coins package prints pAt and pMessage if pLevel is less or equal to the debug level specified by command line. Its usage is illustrated by

    hirRoot.ioRoot.dbgHir.print(4, "subpNode", pSubp.getName());

(See Debug.java.)

1.3.2. Getting Triggers of Intermediate Representation

In the process of parsing, the list of subprogram definitions will be constructed (by addSubpDefinition() of coins.ir.hir.Program called in the parser). Each subprogram definition can be get by using iterator in such way as

    coins.ir.IrList lSubpDefList
      = ((Program)hirRoot.programRoot).getSubpDefinitionList();
    Iterator lSubpDefIterator = lSubpDefList.iterator();
    while (lSubpDefIterator.hasNext()) {
      SubpDefinition lSubpDef = (SubpDefinition)(lSubpDefIterator.next());
      ....
    }

where, hirRoot refers to the HirRoot object.

The subprogram defined by the subprogram definition is get by

    Subp lSubp = lSubpDef.getSubpSym();

The symbol table local to the subprogram is get by

    SymTable lSymTable = lSubp.getSymTable();

or

    SymTable lSymTable = lSubpDef.getSymTable();

The procedural body of the subprogram is get by

    HIR lHirSubpBody = lSubp.getHirBody();

or

    HIR lHirSubpBody = lSubpDef.getHirBody();

(See IrLisr of coins.ir, SubpDefinition, HirIterator of coins.ir.hir, Subp of coins.sym)

Every HIR nodes of the subprogram lSubp can be traversed by using HirIterator in such a way as

    for (HirIterator lHirIterator 
           = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNext(); ) {
      HIR lNode = lHirIterator.next();
      ....
    }

All statements in the subprogram can be traversed by a coding sequence such as

    for (HirIterator lHirIterator 
          = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNextStmt(); ) {
      Stmt lStmt = lHirIterator.getNextStmt();
      ....
    }

Note that some node or statement may be null and it is better to do null-check before applying methods to them.

To catch node or statement of some particular class during the traversing procedure, such coding as

    if (lNode instanceof VarNode) { .... }
    if (lNode instanceof SubpNode) { .... }
    if (lStmt instanceof AssignStmt) { .... }

will be convenient. They may be also caught by such coding as

    if (lStmt.getOperator() == HIR.OP_VAR) { .... }

Another way of coding is to use HirVisitor in such a way as

    public class 
    ProcessHirNode extends coins.ir.hir.HirVisitorModel1
    {
      public final HirRoot
      hirRoot;

      public 
      ProcessSymNode( HirRoot pHirRoot )
      {
        super(pHirRoot);
        hirRoot = pHirRoot;
      }

      public void
      processSymNode( Subp pSubp )
      {
        hirRoot.symRoot.subpCurrent = pSubp;
        visit(pSubp.getHirBody());
      }

      protected void
      atVarNode( VarNode pVarNode )
      {
        ....
      }

      protected void
      atSubpNode( SubpNode pSubpNode )
      {
        ....
      }
      ....
    }

(See HIR, HirVisitor, HirVisitorModel1, HirVisitorModel2 in coins.ir.hir.)

To scan all symbols recorded in symbol tables, iterators are provided in SymTable interface. A coding sequence

    for (SymIterator lIterator = lSymTable.getSymIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }

traverses all symbols recorded in the symbol table lSymTable. If SymIterator is applied to symTableRoot, all global symbols in the given program unit are traversed.

Another coding sequence

    for (SymNestIterator lIterator = lSymTable.getSymIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }

traverses all symbols recorded in the symbol table lSymTable and its descendent symbol tables. If SymNestIterator is applied to symTableCurrentSubp, all symbols local to the current subprogram are traversed. If SymNestIterator is applied to symTableRoot, all symbols recorded in the given program unit except constants in symTableConst are traversed.

The next coding sequence

    for (SymTableIterator lTableIterator = lSymTable.getSymTableIterator();
         lTableIterator.hasNext(); ) {
      SymTable lSymTableCurr = lTableIterator.next();
      for (SymIterator lSymIterator = lSymTableCurr.getSymIterator();
         lSymIterator.hasNext(); ) {
        Sym lSym = lSymIterator.next();
        ......
      }
    }

will traverse all symbol tables under lSymTable and all symbols in the traversed symbol tables examining attributes of the traversed symbol tables.

1.3.3. Symbol Handling

1.3.3.1. Factory Methods

Construction of Sym and HIR are done by factory methods, that is, objects of Sym and HIR are not usually constructed by invoking constructors directly but by invoking factory methods. The factory methods of Sym are described in the Sym interface (Sym.java). Examples of the usage of factory methods are shown in SimpleMain.java.

1.3.3.2. Subprograms, Variables, Constants

A subprogram symbol can be constructed in such coding as

Subp lSubp = symRoot.sym.defineSubp("sub1".intern(), symRoot.sym.typeInt);

where, the first parameter specifies subprogram name and the second parameter specifies return value type. symRoot refers to SymRoot object. If symRoot is not accessible directly but hirRoot is accessible, replace symRoot in the above coding by hirRoot.symRoot. String parameters for Sym, HIR methods should have .intern() in order to make unique String object that can be compared by "==" operator instead of the "equals" method. All String objects returned by Sym, HIR methods are unique String object and need not to have .intern(). (See Sym0.java, Sym.java, HIR0.java, HIR.java.)

Similarly, a variable symbol can be constructed by

Var lVar = symRoot.defineVar("var1".intern(), symRoot.typeFloat);

Integer constant, long int constant can be made by

IntConst lIntConst1 = symRoot.sym.intConst(123, symRoot.typeInt);
IntConst lLongConst1= symRoot.sym.intConst(123, symRoot.typeLong);
IntConst lIntConst2 = symRoot.sym.intConst("123".intern(), symRoot.typeInt);

and floating constant can be made by

FloatConst lPai = symRoot.sym.floatConst(3.14, symRoot.typeFloat);
FloatConst lDoubleConst1 = symRoot.sym.floatConst(1.2, symRoot.typeDouble);

For mode detail, see the Sym interface (Sym0.java and Sym.java).

Care should be taken in making a string constant because the representation of character string differs by language. For example, a string constant in C has trailing "\0" and may contain preceding escape character for some special characters. A string constant is recorded as a pure string (processing escape characters by makeStringBody of coins.SourceLanguage) that is language independent. To make a string for C language from the pure string, makeCstring method is provided, for Java language, makeJavaString method is provided, and so on.

1.3.3.3. Symbol Table

All symbols are recorded in some symbol table. The interface of the symbol table is SymTable (SymTable.java). An instance of SymTable is created for each scope of symbols corresponding such language constructs as program, subprogram, struct, etc.

Several symbol tables are constructed according to the structure of given source program. At first, a global symbol table is created by initiate() of SymRoot and symbols inherent to the COINS infrastructure are recorded in it. The symbols inherent to the COINS compiler infrastructure are such ones as basic types and bool constants. Types of each source language are mapped to the corresponding types of the COINS compiler infrastructure in such way as

    C int            COINS int
    C array          COINS vector
    Fortran INTEGER  COINS int
    Fortran REAL     COINS float

When a new scope of symbols is opened, a new symbol table is to be created and linked to ancestor symbol table that contains symbols to be inherited by the new scope (pushSymTable()). When the current scope is closed, the current symbol table is to be closed by which the ancestor symbol table becomes the current symbol table again (by using popSymTable()).

Symbols are searched in the current symbol table (symTableCurrent of SymRoot) and its ancestors in the reverse order of scope addition. The methods pushSymTable and popSymTable changes symTableCurrent when they are called. Popped symbol table is not discarded unless it is empty but made invisible for search procedures so as to make inter-procedure optimization and parallelization can be done. A symbol table usually has corresponding program construct such as subprogram and it is called as the owner of the symbol table. There are links between such constructs (owner) and corresponding symbol table to show their correspondence (getOwner). Anonymous construct (anonymous Struct, BlockStmt, etc.) may have name generated by the compiler.

1.3.3.4. Scope of Symbols

Source program symbols (symbols appearing in source program) have their scope as defined by the grammar of the language. Each Struct and Union opens a new scope. Scope of constants is the entire compile unit. Scope of temporal variables generated by the compiler is the subprogram within which the temporal variables are defined.

Symbols may have indication of scope (extern, public, private, compile_unit, etc.) and variables may have indication of storage class (static, automatic, etc.). In storage allocation and symbol treatment in code generation, these indications and nesting of symbol tables should be properly treated. Care should be taken that one subprogram may have nested symbol tables. Nesting of subprograms is treated as the nesting of corresponding symbol tables.

1.3.3.5. SymRoot

SymRoot class is used to access Sym (symbol) information and information prepared by other classes such as IoRoot, HIR, etc. All Sym objects contain a reference to the SymRoot object from which symbol information and methods can be quickly accessed. The SymRoot object contains a reference to IoRoot. Thus, every Sym objects can access input/output methods, too. SymRoot contains SymTable references:

    symTableRoot    // Root of SymTable.
    symTableConst   // Constant table.
    symTableUnique  // SymTable that contains generated unique name.
    symTableCurrent // Referes to the symbol table for subprogram, 
                    // etc. under construction or under processing.
    symTableCurrentSubp // Symbol table of current subprogram. Some kinds of
                        // symbols (Type, Label, tmporal variable, etc.) are
                        // registered not in symTableCurrent
                        // but in symTableCurrentSubp.

The subprogram under construction or processing is recorded in subpCurrent of SymRoot.

In parsing, flow analysis, optimization, code generation, etc., it is strongly recommended to set SymTableCurrent, subpCurrent, symTableCurrentSubp as it is exemplified in SimpleMain.java.

They are used in searching/generating symbols. If new symbols are to be created in such processing, SymTableCurrent and subpCurrent should be set properly. Several methods such as sym/pushSymTable, sym/popSymTable, aflow/subpFlow keeps such consistency automatically as it is described in explanations of these methods.

pushSymTalbe/popSymTable methods should be used in parsers but should not be used in optimization, code generation, etc. because pushSymTable creates a new SymTable corresponding to a new scope in input source program.

SymRoot contains type symbols of base types such as typeBool, typeChar, typeInt, etc. as predefined symbols.

1.3.3.6. Type

Symbols such as variables, subprograms, constants have type. The type is represented by a type symbol. Types used in HIR are classified into base type and introduced type.

    Base type (type intrinsic to HIR):
      int        represented by typeInt   of SymRoot
      float      represented by typeFloat of SymRoot
      ....       (see SymRoot)
    Introduced type (type introduced by the input program):
      pointer type     represented by the class PointerType
      vector type      represented by the class VectorType
      structure type   represented by the class StructType
      union type       represented by the class UnionType 
      enumeration type represented by the class EnumType
      subprogram type  represented by the class SubpType
      defined type     represented by the class DefinedType 

A pointer type is defined by pointer indication (* in C) and the type of the target of the pointer.
A vector type is derived from element type by specifying the type of vector element and the number of elements in the vector.
A structure type is defined by specifying its elements that may represent object different with each other.
A union type is defined by specifying overlaid elements.
A subprogram type is defined by specifying type of parameters and the type of return value.
An enumeration type is defined by specifying enumeration literals representing some integer value.
Defined types may be a renaming of base type or a compound type that is derived from base type or defined type.

Type symbols are created by factory methods in Sym interface. The factory methods for type creation are baseType, pointerType, vectorType, structType, unionType, enumType, subpType and definedType.

The structure of SubpType, StructType, UnionType, and EnumType are a little complicated. It is not recommended to use subpType directly but it is recommended to use defineSubp of Sym interface that defines both subprogram symbol and subprogram type. For making type instance of StructType, UnionType, or EnumType, read carefully the explanation of the corresponding method structType, unionType, or enumType of Sym interface.

In order to define a subprogram symbol,

    make the subprogram symbol by defineSubp(...),
    add formal parameters by addParam(....),
    close the subprogram declaration by closeSubpHeader(....)

in such a way as

    Subp lSubp = symRoot.sym.defineSubp("name".intern(), returnType);
    SymTable lSubpSymTable = symRoot.symTableCurrent.pushSymTable(lSubp);
    lSubp.addParam(param1);
    lSubp.addParam(param2);
    ....
    lSubp.setOptionalParam(); // Not required if it has no optional parameter.
    lSubp.closeSubpHeader();
    Var lVar1 = lSubpSymTable.defineVar("a".intern(), symRoot.typeInt);
    ....
    symRoot.symTableCurrent.popSymTable();

Above procedure will make a subprogram object with inevitable fields such as parameter list, return value type, and subprogram type.
closeSubpHeader() will make subprogram type of the form

  <SUBP < paramType_1 paramType_2 ... > returnValueType
        optionalParam >

where, paramType_1, paramType_2, ... are parameter type, returnValueType is return value type, optionalParam is true or false depending on whether optional parameter ("..." in C) is specified or not.
pushSymTable(lSubp) makes new symbol table owned by the subprogram lSubp and makes it symTableCurrent. lSubpSymTable.defineVar( .... ) defines a variable as an element of lSubpSymTable. popSymTable() makes lSubpSymTable invisible form symbol serach procedure and makes the previous symbol table as symTableCurrent.

To make a structure type, structType method is provided in Sym interface. Users may understand how to use it by following example:
As for

       struct listNode {
         int nodeValue;
         struct listNode *next;
        } listAnchor, listNode1;

following coding will make corresponding StructType.

      Sym lTag = symRoot.symTableCurrent.generateTag("listNode".intern());
      StructType lListStruct = sym.structType(null, lTag); // Incomplete type.
      PointerType lListPtrType = sym.pointerType(lListStruct);
      PointerType lIntPtrType = sym.pointerType(symRoot.typeInt);
      symRoot.symTableCurrent.pushSymTable(lListStruct);
      Elem lValue = sym.defineElem("nodeValue".intern(), symRoot.typeInt);
      Elem lNext  = sym.defineElem("next".intern(), lListPtrType);
      lListStruct.addElem(lValue);
      lListStruct.addElem(lNext);
      lListStruct.finishStructType(true);
      symRoot.symTableCurrent.popSymTable();

Methods are provided to get information of introduced types:

    getSizeValue     of Sym interface
    getPointedValue  of Sym interface
    getElemCount     of VectorType interface
    getElemList      of StructType interface
    getParameterTypeList of SubpType interface
    ....
1.3.3.7. Generation of Temporal Variables and Labels

In compilers, temporal variables are often required to be generated for optimization, code generation, etc. A method

    public Var generateVar( Type pType );

is provided in the SymTable interface to generate a temporal variable in the symbol table local to the current subprogram (symTableCurrentSubp).

In order to generate labels, a method

    public Label generateLabel();

is provided in the SymTable interface. It generates a label in the symbol table local to the current subprogram (symTableCurrentSubp).

1.3.3.8. Representation of Symbol in Text Form

The method toString() gives the representation of a symbol in text form. It may be used for debug purpose, etc. toStringShort() shows short description and toStrindDetail() shows full description of the symbol.

1.3.4. HIR Handling

1.3.4.1. Getting Information of HIR

HIR can be instantiated and handled mostly by using methods in HIR0 (plain HIR interface). Simple compilers can be constructed by using methods in HIR0 and Sym0 interfaces. In constructing more complicated compiler, use methods in HIR and Sym. HIR inherits HIR0 and Sym inherits Sym0. In the following explanations, HIR may be read as HIR0 in most cases.

Most of HIR constructs have correspondence to some source language constructs, e.g.

    SubpDefinition - subprogram definition
    Stmt           - statement
    AssignStmt     - assign statement
    LoopStmt       - loop statement
    BlockStmt      - block statement
    Exp            - expression
    VarNode        - variable
    ConstNode      - constant

Subcomponent of HIR constructs can be get by methods provided in each HIR subclass (interface that extends HIR). For example,

    getLeftSide(), getRightSide() of AssignStmt
    getIfCondition(), getThenPart(), getElsePart() of IfStmt
    getLoopStartCondition(), getLoopBodyPart() of 
      ForStmt, WhileStmt, UntilStmt that extend LoopStmt
    getExp1(), getExp2() of Exp
    getSymNodeSym() of VarNode, ElemNode, ConstNode that extend SymNode

As for detail, see corresponding interfaces that extend HIR.

The subcomponents can be get also by specifying child number by getChild1(), getChild2(), and getChild(int pChildNumber). In such coding, exact knowledge of HIR data structure is required. getChildCount of IR interface gives the number of children of HIR nodes. (getChild1() and getChild2() have less overhead than getChild(1) and getChild(2).)

All HIR nodes have type attribute. It can be get by the method getType. Some HIR nodes may have flags set during parsing, analysis, etc. The method getFlag(int pFlagNumber) returns the status of the flag indicated by pFlagNumber.

1.3.4.2. Representation of HIR in Text Form

The method toString() gives the representation of an HIR node in text form. It may be used for debug purpose, etc. toStringShort() shows short description and toStrindDetail() shows full description of the node.

The method print(....) of HIR prints the subtree stemming from the specified node, that is, all subtrees of the specified node are printed recursively.

1.3.4.3. Normal Construction of HIR

HIR nodes can be constructed by methods defined in HIR interface. Leafs of HIR tree are symbol node, list, etc. In HIR, symbols are represented by symbol nodes having reference to some symbol table entry such as variable and subprogram.

A Symbol node can be generated by factory methods of HIR.

    VarNode lVarNode1     = hirRoot.hir.varNode(lVar1);
    SubpNode lSubpNode1   = hirRoot.hir.subpNode(lSubp);
    ConstNode lConstNode1 = hirRoot.hir.constNode(lIntConst1);

will instantiate VarNode, SubpNode, ConstNode, each respectively. hirRoot.hir, hirRoot.symRoot.sym, etc. may be shortened by local declarations

    HIR hir = hirRoot.hir;
    Sym sym = hirRoot.symRoot.sym;

Arithmetic expressions can be built by such coding as

    Exp lExp1 = hir.exp(HIR.OP_ADD, lVarNode1, lConstNode1);
    Exp lExp2 = hir.exp(HIR.OP_MULT, lExp1, hir.varNode(lVar2));

Assign-statement, if-statement, etc. are built by

    Stmt lAssign1 = hir.assignStmt(lVarNode1, lExp1);
    Stmt lAssign2 = hir.assignStmt(lVarNode1, lExp2);
    Stmt lIf1 = hir.ifStmt(hir.exp(HIR.OP_CMP_EQ, lExp1,
               hir.constNode(0, symRoot.typeInt)), lAssign1, lAssign2);

etc.

1.3.4.4. Top down Construction of HIR

HIR tree is usually constructed in bottom up manner starting from leafs and combining them as above. Top down construction is also possible by attaching a subtree (leaf node or nonleaf node) to parent tree as its child.

    setChild1( IR pChild1 ), setChild2( IR pChild2 ),
    setChild( int pChildNumber, IR pChild )

of IR interface are available for such construction. The top down construction requires knowledge of detailed structure of HIR tree. Recommended way is bottom up construction by using the prepared factory methods.

In some cases, strict bottom up construction is difficult. For example, in the construction of block statement and subprogram definition, most of their children are not known at first. Several methods are provided to construct such subtrees. They are explained in the next section.

1.3.4.5. Construction by Sequence of Statements

Block statement can be constructed by a statement sequence such as

    BlockStmt lBlockStmt = hir.blockStmt(null);
    lBlockStmt.addLastStmt(lAssign1);
    lBlockStmt.addLastStmt(lIf1);
    ....

(See HIR, BlockStmt in coins.ir.hir.)

Subprogram can be constructed by such statement sequence as

    Subp lMain = symRoot.sym.defineSubp("main".intern(), symRoot.typeInt);
    SymTable lSymTable = symRoot.symTableRoot.pushSymTable(lMain);
    lMain.closeSubpHeader();
    SubpDefinition lMainDef = hir.subpDefinition(lMain, lSymTable);
    BlockStmt lBlockStmt = hir.blockStmt(null);
    lMainDef.setHirBody(lBlockStmt);
    ....
    lBlockStmt.addLastStmt(lAssign1);
    ....
    symRoot.symTableCurrent.popSymTable();

(In case of prototype declaration, use closeSubpPrototype instead of closeSubpHeader. See HIR, SubpDefinition, Subp, SimpleMain.java, etc.)

IrList, HirList can be constructed by such statement sequence as

    HirList lList = hir.irList();
    lList.add(....);
    ....

(See HIR in coins.ir.hir, IrList in coins.ir.)

1.3.4.6. Note on HIR Construction and Transformation

An example of HIR generation is shown by SimpleMain.java. Readers can see how to construct symbol table and HIR tree of a program. It may be useful in coding new parser.

It is possible to build Sym objects and HIR objects by invoking constructors of VarImpl, SubpImpl, VarNodeImpl, ConstNodeImpl, AssignStmtImpl, IfStmtImpl, etc. but such coding is not recommended. Such coding may cause many errors because there are some hidden parameters supplied by factory methods and there are some preparatory methods to be applied to parameters.

It should be noted that the structure of HIR is tree. Every nodes in the HIR tree should be created newly and should not be shared because sharing of nodes violates the data structure rule of tree. If a subtree same as some subtree X is required, X should be copied by the method

    X.copyWithOperands()

if X is an expression or

    X.copyWithOperandsChangingLabels()

if X is a statement that may include label definitions. The HIR method "isTree" checks whether the rule of tree structure is violated or not. The method "isTree" is invoked automatically if debug-print level of HIR is greater than 1 in Driver.java.

When entire HIR subtree of a subprogram is constructed, finishHir() should be called in such way as

    lSubpDefinition.finishHir();

where lSubpDefinition represents SubpDefinition node of the subprogram. The method finishHir() does such operations as giving index number to HIR nodes under the subtree, checking tree structure conformance, certificating getHirPosition() for labels, and so on. When the HIR subtree of a subprogram was changed in optimization and parallelization, then finishHir() should also be called for the subtree. This method is not required to call for each modification of statements ans expressions of SubpDefinition but at the end of creation or transformation of the entire SubpDefinition subtree.

1.3.4.7. Indispensable Items and Optional Items

Syntactic structure of HIR is shown in HIR0.java and in HIR.java as comment lines. Nonterminals that do not derive to null are indispensable items. Nonterminals that derive to null are optional items. There may be some exceptional nonterminal that derives to null but represents indispensable item. Such case may happen in avoiding BNF productions to become too much verbose but such case can be clearly discriminated by considering semantics.

1.4. Test Program Suite

1.4.1. Overview

Under the directory "test" there are many small programs written to test the COINS compiler infrastructure. The directory "test/c" contains C test program suite composed of more than 1000 small C programs which are available to test C front-end and basic features of the COINS compiler infrastructure. The C test programs are grouped under many sub-directories such as Assign, Exp, If, Loop, etc. (See "README.c.txt", "README.c-result-may-differ.txt", "README.c-not-for-auto-test.txt" under the "test" directory.)

The C test program suite also contains expected execution results corresponding to each test program so that automatic test is possible by compiling and executing the test programs and comparing the result of execution for them. There are test programs that are not suitable for execution test, for example, programs to test syntax error detection, programs to test implementation dependent features, and so on. Such test programs are placed under the sub-directories named "test/c/unsupported" and "test/c-result-may-differ".

When some component of the COINS compiler infrastructure is modified or added, it is recommended to test whether it is correctly executed or not by using the test suite.

1.4.2. Automatic test

1.4.2.1. On Sparc machine

The test script

    test/c/testdriver.sh

is available to do automatic test for the C test program suite. It can automatically compile and execute the programs in the test program suite and compare the results with the corresponding expected results. Following is an example of command sequence to do such automatic test.

    cd ./classes   -- Type at the root of the COINS compiler infrastructure.
    ../test/c/testdriver.sh -v ../test/c/*/*.c
    ../test/c/testdriver.sh -v ../test/c/*/*/*.c

The first and the second invocations of "testdriver.sh" will do test by using test programs under the first level and the second level sub-directories, respectively. The result of the test will be shown by files which will be located under

    testdriver-yymmdd-hhmmss

that is a directory created during the test.

If "Arg list too long" message is issued in your environment by the command

    ../test/c/testdriver.sh -v ../test/c/*/*.c

then make a temporal directory such as "test/c2" and move several sub-directories such as TestFnami2, WhiteBox1, WhiteBox2 (that contain many files) to the temporal directory and execute

    ../test/c/testdriver.sh -v ../test/c2/*/*.c

as additional command.

There is a command line

    if ! java coins.driver.Driver -coins:${TARGET}preprocessor=\
      'cpp -I../lang/c/include' -S -o $CCCOUT $i &>$CCCLOG

in "testdriver.sh". You can change compile option by changing this line in such way as

    if ! java coins.driver.Driver -coins:${TARGET},hirOpt=cse,preprocessor=\
       'cpp -I../lang/c/include' -S -o $CCCOUT $i &>$CCCLOG

You can do automatic test by using test programs under some sub-directory only. For example, if you want to do test by using test programs under Loop sub-directory only, give following command:

    ../test/c/testdriver.sh -v ../test/c/Loop/*.c

Some test programs under sub-directories "test/c/unsupported" and "test/c-result-may-differ" will cause error but it may be normal because each of them may be a test program to cause error. Result of these test programs should be examined individually.

Some test programs may require further investigation even if "OK" sign is issued for them. For example, "test/c/Opt", "test/c/OptAdd" and "test/c/TestSsa" contain test programs for optimization and it is required to examine object codes generated.

1.4.2.2. On Intel x86 machine and others

If test is to be done on x86-cygwin, use "test/c/testdriverw.sh" instead of "testdriver.sh". Its usage is the same as the "testdriver.sh" but it is required to prepare expected results on x86 machine as explained in the next section.

Note.
Target machine can be changed by command option in such a way as

    ../test/c/testdriver.sh -v -t x86-cygwin ../test/c/Loop/*.c

but this command option is not yet reliable and in some environment, this may not correctly work. It is better to use the test script

   test/c/testdriverw.sh

for x86-cygwin.

As for other machine, please adjust the "testdriver.sh" to the machine.

1.4.3. Preparation of expected results

1.4.3.1. On Sparc machine

Currently, the expected results of the test programs are prepared by using gcc compiler on Sparc machine. You can renew the expected results by the following command sequence:

    cd ./classes
    ../test/c/testprepare.sh ../test/c/*/*.c
    ../test/c/testprepare.sh ../test/c/*/*/*.c

When you added or modified test programs, you should also renew the expected results of the test programs added or modified.

1.4.3.2. On Intel x86 machine and others

If you want to do test on a machine other than Sparc, you should renew the expected results by using gcc compiler or some other existing compiler on the machine.

If test is to be done on x86-cygwin, use "test/c/testpreparew.sh" instead of "testprepare.sh". Its usage is the same as the "testdprepare.sh".

As for other machine, please adjust the "testprepare.sh" to the machine.

1.5. Visualization Tools

1.5.1. CoVis: Visualiztion tools for HIR/LIR

CoVis is the visualization tools for source programs, HIRs, LIRs, and control flow graphs. An example is shown in Fig.1-2.


Fig.1-2 A view of CoVis

In the figure, the control flow graph (upper left), the HIR representation (upper right), and the source program (lower right) of the selected function are shown. The selected basic block is colored red, and the corresponding parts of the HIR and the source are colored blue. The detailed information of the selected basic block is shown in the lower left.

The intermediate representation can be selected by clicking one of the four buttons: Generated HIR (generated by the front end), Optimized HIR (by HIR optimization), Generated LIR (generated by HIR-to-LIR translation), and Optimized LIR (by SSA optimization).

(In using Java 1.7 and Java 1.8, there may be a case where graphs are irregularly overlaid. If you find such cases, please use Java 1.5 or Java 1.6 until the CoVis is revised.)
1.5.1.1. How to install CoVis
  1. Get the following packages:
  2. Expand covis.gz (or CoVis.tgz) and grappa-built.jar (or grappa.tgz) and get the following: (The file build.xml is prepared to rebuild CoVis when some source files are changed. Initially, CoVis is already built when it is downloaded.)
  3. Make a directory for visualization (say CoVis) and set class files as follows: The class files att and java_cup are prepared in grappa-built/classes when grappa-built.jar is expanded by such command as
       jar -xvf grappa-built.jar
    In case of expanding grappa.tgz, make an empty directory named classes on the same level as src, then compile the source files on the directory grappa/src by such command as
       javac -d ../classes -Xlint:unchecked jdk1.2/*/*/*.java
    then following class files will be generated. There is another way using grappa1_2.jar instead of grappa class files as it is explained later.
  4. Install the graphviz and see the existence of the executable file
       graphvis/bin/dot
    Set a path to the directory graphviz/bin/
    so that a command named dot can be executed.
1.5.1.2. How to use CoVis
  1. Genrate the XML file

    By the coins compiler option

       -coins:debuginfo,snapshot
    

    the coins internal representation is translated to the XML file. For example, the command

      java -cp ./classes coins.driver.Driver -coins:debuginfo,snapshot -S source.c
    

    will generate the XML file "source.xml" from "source.c". (The option debuginfo is optional.)

  2. View the XML file

    If you get the Exception "java.lang.UnsupportedClassVersionError" when you open the XML file, recompile the CoVis and grappa source files.

1.5.2. Trace of the code generation process

The trace of the code generation process can be viewed by a browser. An example is shown in Fig.1-3.


Fig.1-3 A view of code generation process

The left column of the figure shows the phase when the traced data obtained. By clicking "[<<]" or "[>>]", the previous or the next information of the same basic block can be viewed.

1.5.2.1. How to use this tool

The traced data can be generated by the compiler option:

    -coins:debuginfo,trace=xxx

and the data can be formatted as a html file by the perl script trace2html.pl.

    java coins.driver.Driver -coins:debuginfo,... > trace.java
    perl trace2html.pl -o trace -c  trace.java

The trace options are: