1. How to use COINS compiler infrastructure

Contents

1.1. Getting Overview of the Compiler Infrastructure

The COINS compiler infrastructure is composed of At present, the compiler infrastructure contains two parsers, for C and Fortran77, and 6 code generators, for SPARC, x86, x86_64, ARM, MIPS, SH-4, PowerPC, Alpha, MicroBlaze, and Thumb. Thus, it has 2*10 = 20 compilers.

Compilation with debug-print option will show how the compiling process proceeds and how a source program is transformed into intermediate representation (HIR and LIR). It is the shortest way of understanding how the compiler infrastructure works and understanding HIR, LIR, and symbol table that are used as interfaces between compiler components.

Please try to compile several short programs specifying debug-print option in such way as

java -classpath ./classes coins.driver.Driver -S -coins:trace=HIR.1/LIR.1/Sym.1 sample.c
(See 2. How to use the Compiler Driver and 3. How to use C Compiler.)

1.2. Structure of the COINS Compiler Infrastructure

1.2.1. Overall Structure

The COINS compiler infrastructure is composed of 3 major parts.
The front-end part translates the source program into HIR (High-level Intermediate Representation).
The middle part converts HIR into LIR (Low-level Intermediate Representation) and also does optimization and parallelization transformations on HIR or LIR.
The back-end part generates the assembler code of the target machine from LIR.
Additional management part contains miscellaneous functions such as compiler control, HIR management, and symbol management.
The overall structure of COINS is shown in Fig.1-1.

Fig.1-1 Overall Structure of COINS

The concrete representation of HIR is a tree. An HIR tree is an abstract representation of a source program. It reflects logical structure of the source program such as subprogram, block, statement, expression, compound variable, simple variable, constant, and so on. Every language constructs in HIR has a type such as int, float, vector, struct, union, and so on. HIR is designed so as to be able to represent programs of various languages such as C, Fortran, Pascal.

LIR is an abstract representation of machine language program. In LIR, operations are decomposed into elementary operations such as SET, JUMP, CALL, etc. with simple or compound operands. The operands may represent memory, register, or expression. Data type of LIR corresponds to data type handled in machines such as integer, float of some bit length.

LIR is a language that can represents not only operations but also the entire program providing features to describe a module (a collection of functions), memory area, and data. Semantics of LIR is defined rigorously according to the denotational semantics to avoid misunderstanding.

1.2.2. Classes and Interfaces

The compiler infrastructure is entirely written in Java.
The intermediate language managers (HIR), symbol table manager (Sym), and flow analyzer (Flow) are described in interface modules such as HIR0.java (plain HIR interface), HIR.java (full HIR interface), Sym0.java (plain Symbol table interface), Sym.java (full Symbol table interface), Flow.java.
Their implementation modules are HIR_Impl.java, SymImpl.java, and FlowImpl.java.
Detailed specifications of access methods are described in the interface modules so that it is not necessary to see implementation modules except for some special cases.

In order to make a compiler based on the COINS infrastructure or to modify a compiler based on the infrastructure, it is necessary to read

and then their sub-interfaces, if you cannot get sufficient information. Some important sub-interfaces to be read are Machine dependent parameters and methods are concentrated in Source language dependent parameters and methods are concentrated in Information requiring mutual exclusion between individual compilers are concentrated in

Usage of methods is usually described in upper interfaces so that it is not necessary to read lower interfaces. Methods are interrelated and there may be restrictions in invoking them. In upper interfaces, many upper methods are provided to make the use of access methods simple. You will misuse the access methods if you read lower interfaces or read implementation modules before reading upper interfaces.

There are several classes that contains global information and methods available to all over the compiler infrastructure. They are placed in "coins" package. Some of them are

The package structure of the infrastructure is the following, where "*" means language dependent packages, and "**" means machine dependent packages.
    *cfront    -- C parser converting C to AST (abstract syntax tree for C)
    *ast       -- Classes to generate AST for C
      *expr    -- AST expression generation
      *stmnt   -- AST statement generation
    *ffront    -- Fortran77 parser translating Fortran77 to HIR 
    coins      -- Infrastructure of the COINS compiler
      driver     -- Compiler driver
      *casttohir -- C AST to HIR translator
      ir         -- Intermediate representation IR
        hir      -- High level intermediate representation HIR
        lir      -- Obsolete 
      sym        -- Symbol table
      hir2lir    -- HIR to old LIR converter 
      aflow      -- Control flow and data flow analyzer for HIR
      flow       -- Obsolete
      alias      -- Alias analyzer
      opt        -- Basic optimizer
      ssa        -- SSA form optimizer for LIR 
      lparallel  -- loop parallelizer
      mdf        -- Coarse grain parallelizing module (SMP parallelizer)
      simd       -- SIMD parallelizer
      snapshot   -- Make snap shot file (XML file) for the visualizer
      back-end   -- Back-end
        ana      -- Control flow analyzer for LIR
        cfg      -- Control flow graph builder for LIR
        **gen    -- Code generator and target machine descriptions
                    other than SPARC and x86
        lir      -- LIR manager
        opt      -- Backend optimizer
        regalo   -- Register allocator
        sched    -- Instruction scheduler
        sym      -- LIR symbol manager
        **targets -- Machine parameter table interface
        **tmd    -- Target machine description (SPARC and x86)
        util     -- Backend utility
      hir2c      -- HIR to C translator
      lir2c      -- LIR to C translator

1.3. Construction of a Compiler based on the COINS Infrastructure

1.3.1. Compiler Control

The compiler driver described in coins.driver.Driver.java controls the execution of the COINS C compiler. It contains definitions of compiler options and invocation statements for compiler components. The procedure of component invocation is described in the method compile(....) in Driver.java. See 2. How to use the Compiler Driver for details.

To change the sequence of component invocations and to add, replace, delete some compiler components, Driver.java may be changed. It would be better to make a subclass of Driver.java and override some of Driver's methods appropriately. See, for example, http://www.coins-project.org/international/advanceduse/compilerDevelopE.html.

The infrastructure does not use static fields except for "static final" in order to make it possible to develop a compiler where its components can be executed in concurrent. All methods except for some ones in Root classes (classes such as IoRoot, SymRoot, HirRoot, LirRoot, etc.) are non-static methods and should be applied to instances.

The compiler driver instantiates IoRoot first and supplies source file, object file, print file, etc. (See getSourceFile(), printOut, objectFile, msgOut, etc. in IoRoot.java). All compiler components should convey the instance of IoRoot (ioRoot) directly or indirectly and make it protected or public so that it can be accessed directly or indirectly from methods in the component.

In more detail, objects of other Root classes include a reference to the instance of IoRoot in order to enable input/output operations. All Sym objects include a reference to SymRoot object, all HIR objects include a reference to HirRoot, and so on. In this way, almost all classes has a link to the IoRoot directly or indirectly in order to enable input/output operations in their methods. IoRoot has such methods as

  getSourceFile(), getSourceFilePath(), getCompileSpecification() 
to access files and compile specifications given by command line.

As the next step, the compiler driver instantiates SymRoot to make symbol information be shared between compiler components. All Sym objects such as symbol table and entries in the symbol table (variable, subprogram, constant, type, etc.) contain a reference to the SymRoot object so that methods of Sym class and IoRoot class can be invoked. The symbol tables are nested reflecting scope of symbols and organized into tree structure.

The root of the symbol tables is named as symTableRoot. The symbol table currently effective is called current symbol table and named as symTableCurrent. They are accessed from SymRoot object. Built-in symbols representing basic types, etc. are registered in symTableRoot and can be accessed from SymRoot object, hence they can be accessed from all methods under Sym and its subclasses.

The compiler driver instantiates HirRoot and then invokes some parser such as C parser that translates source program into HIR. The parser should convey the instance of HirRoot to its components so that they can access I/O files, symbol tables, and HIR information. The super class of HirRoot is IrRoot where the root of intermediate representation (IR) of input source program is recorded as programRoot. The IrRoot is also the super class of HirRoot. HIR representation of input program can be traced starting from programRoot.

The compiler driver may either parse all subprograms in a source file before code generation, or repeat parsing and code generation for each subprogram in the source file. In the former case, inter-procedural optimization and parallelization may be possible but consumes large memory space. In the later case, required memory space is relatively small but the possibility of inter-procedural optimization is limited.

Error messages and warning messages are issued by invoking put method of Message class in "coins" package. The number of messages issued is counted for each group of messages. Compiler implementers may prepare their own error handlers that invokes the put method in order to provide some information peculiar to each component such as source program line number. (See Message.java.)

It is often required to see the status of compiler for debugging. The method

   void print(int pLevel, String pAt, String pMessage) 
in the class Debug in coins package prints pAt and pMessage if pLevel is less or equal to the debug level specified by command line. Its usage is illustrated by
    hirRoot.ioRoot.dbgHir.print(4, "subpNode", pSubp.getName());
(See Debug.java.)

1.3.2. Getting Triggers of Intermediate Representation

In the process of parsing, the list of subprogram definitions will be constructed (by addSubpDefinition() of coins.ir.hir.Program called in the parser). Each subprogram definition can be get by using iterator in such way as
    coins.ir.IrList lSubpDefList
      = ((Program)hirRoot.programRoot).getSubpDefinitionList();
    Iterator lSubpDefIterator = lSubpDefList.iterator();
    while (lSubpDefIterator.hasNext()) {
      SubpDefinition lSubpDef = (SubpDefinition)(lSubpDefIterator.next());
      ....
    }
where, hirRoot refers to the HirRoot object.
The subprogram defined by the subprogram definition is get by
    Subp lSubp = lSubpDef.getSubpSym();
The symbol table local to the subprogram is get by
    SymTable lSymTable = lSubp.getSymTable();
or
    SymTable lSymTable = lSubpDef.getSymTable();
The procedural body of the subprogram is get by
    HIR lHirSubpBody = lSubp.getHirBody();
or
    HIR lHirSubpBody = lSubpDef.getHirBody();
(See IrLisr of coins.ir, SubpDefinition, HirIterator of coins.ir.hir, Subp of coins.sym)

Every HIR nodes of the subprogram lSubp can be traversed by using HirIterator in such a way as

    for (HirIterator lHirIterator 
           = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNext(); ) {
      HIR lNode = lHirIterator.next();
      ....
    }
All statements in the subprogram can be traversed by a coding sequence such as
    for (HirIterator lHirIterator 
          = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNextStmt(); ) {
      Stmt lStmt = lHirIterator.getNextStmt();
      ....
    }
Note that some node or statement may be null and it is better to do null-check before applying methods to them.

To catch node or statement of some particular class during the traversing procedure, such coding as

    if (lNode instanceof VarNode) { .... }
    if (lNode instanceof SubpNode) { .... }
    if (lStmt instanceof AssignStmt) { .... }
will be convenient. They may be also caught by such coding as
    if (lStmt.getOperator() == HIR.OP_VAR) { .... }
Another way of coding is to use HirVisitor in such a way as
    public class 
    ProcessHirNode extends coins.ir.hir.HirVisitorModel1
    {
      public final HirRoot
      hirRoot;

      public 
      ProcessSymNode( HirRoot pHirRoot )
      {
        super(pHirRoot);
        hirRoot = pHirRoot;
      }

      public void
      processSymNode( Subp pSubp )
      {
        hirRoot.symRoot.subpCurrent = pSubp;
        visit(pSubp.getHirBody());
      }

      protected void
      atVarNode( VarNode pVarNode )
      {
        ....
      }

      protected void
      atSubpNode( SubpNode pSubpNode )
      {
        ....
      }
      ....
    }
(See HIR, HirVisitor, HirVisitorModel1, HirVisitorModel2 in coins.ir.hir.)

To scan all symbols recorded in symbol tables, iterators are provided in SymTable interface. A coding sequence

    for (SymIterator lIterator = lSymTable.getSymIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }
traverses all symbols recorded in the symbol table lSymTable. If SymIterator is applied to symTableRoot, all global symbols in the given program unit are traversed.

Another coding sequence

    for (SymNestIterator lIterator = lSymTable.getSymIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }
traverses all symbols recorded in the symbol table lSymTable and its descendent symbol tables. If SymNestIterator is applied to symTableCurrentSubp, all symbols local to the current subprogram are traversed. If SymNestIterator is applied to symTableRoot, all symbols recorded in the given program unit except constants in symTableConst are traversed.

The next coding sequence

    for (SymTableIterator lTableIterator = lSymTable.getSymTableIterator();
         lTableIterator.hasNext(); ) {
      SymTable lSymTableCurr = lTableIterator.next();
      for (SymIterator lSymIterator = lSymTableCurr.getSymIterator();
         lSymIterator.hasNext(); ) {
        Sym lSym = lSymIterator.next();
        ......
      }
    }
will traverse all symbol tables under lSymTable and all symbols in the traversed symbol tables examining attributes of the traversed symbol tables.

1.3.3. Symbol Handling

1.3.3.1. Factory Methods
Construction of Sym and HIR are done by factory methods, that is, objects of Sym and HIR are not usually constructed by invoking constructors directly but by invoking factory methods. The factory methods of Sym are described in the Sym interface (Sym.java). Examples of the usage of factory methods are shown in SimpleMain.java.
1.3.3.2. Subprograms, Variables, Constants
A subprogram symbol can be constructed in such coding as
Subp lSubp = symRoot.sym.defineSubp("sub1".intern(), symRoot.sym.typeInt);
where, the first parameter specifies subprogram name and the second parameter specifies return value type. symRoot refers to SymRoot object. If symRoot is not accessible directly but hirRoot is accessible, replace symRoot in the above coding by hirRoot.symRoot. String parameters for Sym, HIR methods should have .intern() in order to make unique String object that can be compared by "==" operator instead of the "equals" method. All String objects returned by Sym, HIR methods are unique String object and need not to have .intern(). (See Sym0.java, Sym.java, HIR0.java, HIR.java.)

Similarly, a variable symbol can be constructed by

Var lVar = symRoot.defineVar("var1".intern(), symRoot.typeFloat);
Integer constant, long int constant can be made by
IntConst lIntConst1 = symRoot.sym.intConst(123, symRoot.typeInt);
IntConst lLongConst1= symRoot.sym.intConst(123, symRoot.typeLong);
IntConst lIntConst2 = symRoot.sym.intConst("123".intern(), symRoot.typeInt);
and floating constant can be made by
FloatConst lPai = symRoot.sym.floatConst(3.14, symRoot.typeFloat);
FloatConst lDoubleConst1 = symRoot.sym.floatConst(1.2, symRoot.typeDouble);
For mode detail, see the Sym interface (Sym0.java and Sym.java).

Care should be taken in making a string constant because the representation of character string differs by language. For example, a string constant in C has trailing "\0" and may contain preceding escape character for some special characters. A string constant is recorded as a pure string (processing escape characters by makeStringBody of coins.SourceLanguage) that is language independent. To make a string for C language from the pure string, makeCstring method is provided, for Java language, makeJavaString method is provided, and so on.

1.3.3.3. Symbol Table
All symbols are recorded in some symbol table. The interface of the symbol table is SymTable (SymTable.java). An instance of SymTable is created for each scope of symbols corresponding such language constructs as program, subprogram, struct, etc.

Several symbol tables are constructed according to the structure of given source program. At first, a global symbol table is created by initiate() of SymRoot and symbols inherent to the COINS infrastructure are recorded in it. The symbols inherent to the COINS compiler infrastructure are such ones as basic types and bool constants. Types of each source language are mapped to the corresponding types of the COINS compiler infrastructure in such way as

    C int            COINS int
    C array          COINS vector
    Fortran INTEGER  COINS int
    Fortran REAL     COINS float
When a new scope of symbols is opened, a new symbol table is to be created and linked to ancestor symbol table that contains symbols to be inherited by the new scope (pushSymTable()). When the current scope is closed, the current symbol table is to be closed by which the ancestor symbol table becomes the current symbol table again (by using popSymTable()).

Symbols are searched in the current symbol table (symTableCurrent of SymRoot) and its ancestors in the reverse order of scope addition. The methods pushSymTable and popSymTable changes symTableCurrent when they are called. Popped symbol table is not discarded unless it is empty but made invisible for search procedures so as to make inter-procedure optimization and parallelization can be done. A symbol table usually has corresponding program construct such as subprogram and it is called as the owner of the symbol table. There are links between such constructs (owner) and corresponding symbol table to show their correspondence (getOwner). Anonymous construct (anonymous Struct, BlockStmt, etc.) may have name generated by the compiler.

1.3.3.4. Scope of Symbols
Source program symbols (symbols appearing in source program) have their scope as defined by the grammar of the language. Each Struct and Union opens a new scope. Scope of constants is the entire compile unit. Scope of temporal variables generated by the compiler is the subprogram within which the temporal variables are defined.

Symbols may have indication of scope (extern, public, private, compile_unit, etc.) and variables may have indication of storage class (static, automatic, etc.). In storage allocation and symbol treatment in code generation, these indications and nesting of symbol tables should be properly treated. Care should be taken that one subprogram may have nested symbol tables. Nesting of subprograms is treated as the nesting of corresponding symbol tables.

1.3.3.5. SymRoot
SymRoot class is used to access Sym (symbol) information and information prepared by other classes such as IoRoot, HIR, etc. All Sym objects contain a reference to the SymRoot object from which symbol information and methods can be quickly accessed. The SymRoot object contains a reference to IoRoot. Thus, every Sym objects can access input/output methods, too. SymRoot contains SymTable references:
    symTableRoot    // Root of SymTable.
    symTableConst   // Constant table.
    symTableUnique  // SymTable that contains generated unique name.
    symTableCurrent // Referes to the symbol table for subprogram, 
                    // etc. under construction or under processing.
    symTableCurrentSubp // Symbol table of current subprogram. Some kinds of
                        // symbols (Type, Label, tmporal variable, etc.) are
                        // registered not in symTableCurrent
                        // but in symTableCurrentSubp.
The subprogram under construction or processing is recorded in subpCurrent of SymRoot.

In parsing, flow analysis, optimization, code generation, etc., it is strongly recommended to set SymTableCurrent, subpCurrent, symTableCurrentSubp as it is exemplified in SimpleMain.java.

They are used in searching/generating symbols. If new symbols are to be created in such processing, SymTableCurrent and subpCurrent should be set properly. Several methods such as sym/pushSymTable, sym/popSymTable, aflow/subpFlow keeps such consistency automatically as it is described in explanations of these methods.

pushSymTalbe/popSymTable methods should be used in parsers but should not be used in optimization, code generation, etc. because pushSymTable creates a new SymTable corresponding to a new scope in input source program.

SymRoot contains type symbols of base types such as typeBool, typeChar, typeInt, etc. as predefined symbols.

1.3.3.6. Type
Symbols such as variables, subprograms, constants have type. The type is represented by a type symbol. Types used in HIR are classified into base type and introduced type.
    Base type (type intrinsic to HIR):
      int        represented by typeInt   of SymRoot
      float      represented by typeFloat of SymRoot
      ....       (see SymRoot)
    Introduced type (type introduced by the input program):
      pointer type     represented by the class PointerType
      vector type      represented by the class VectorType
      structure type   represented by the class StructType
      union type       represented by the class UnionType 
      enumeration type represented by the class EnumType
      subprogram type  represented by the class SubpType
      defined type     represented by the class DefinedType 
A pointer type is defined by pointer indication (* in C) and the type of the target of the pointer.
A vector type is derived from element type by specifying the type of vector element and the number of elements in the vector.
A structure type is defined by specifying its elements that may represent object different with each other.
A union type is defined by specifying overlaid elements.
A subprogram type is defined by specifying type of parameters and the type of return value.
An enumeration type is defined by specifying enumeration literals representing some integer value.
Defined types may be a renaming of base type or a compound type that is derived from base type or defined type.

Type symbols are created by factory methods in Sym interface. The factory methods for type creation are baseType, pointerType, vectorType, structType, unionType, enumType, subpType and definedType.

The structure of SubpType, StructType, UnionType, and EnumType are a little complicated. It is not recommended to use subpType directly but it is recommended to use defineSubp of Sym interface that defines both subprogram symbol and subprogram type. For making type instance of StructType, UnionType, or EnumType, read carefully the explanation of the corresponding method structType, unionType, or enumType of Sym interface.

In order to define a subprogram symbol,

    make the subprogram symbol by defineSubp(...),
    add formal parameters by addParam(....),
    close the subprogram declaration by closeSubpHeader(....)
in such a way as
    Subp lSubp = symRoot.sym.defineSubp("name".intern(), returnType);
    SymTable lSubpSymTable = symRoot.symTableCurrent.pushSymTable(lSubp);
    lSubp.addParam(param1);
    lSubp.addParam(param2);
    ....
    lSubp.setOptionalParam(); // Not required if it has no optional parameter.
    lSubp.closeSubpHeader();
    Var lVar1 = lSubpSymTable.defineVar("a".intern(), symRoot.typeInt);
    ....
    symRoot.symTableCurrent.popSymTable();
Above procedure will make a subprogram object with inevitable fields such as parameter list, return value type, and subprogram type.
closeSubpHeader() will make subprogram type of the form
  <SUBP < paramType_1 paramType_2 ... > returnValueType
        optionalParam >
where, paramType_1, paramType_2, ... are parameter type, returnValueType is return value type, optionalParam is true or false depending on whether optional parameter ("..." in C) is specified or not.
pushSymTable(lSubp) makes new symbol table owned by the subprogram lSubp and makes it symTableCurrent. lSubpSymTable.defineVar( .... ) defines a variable as an element of lSubpSymTable. popSymTable() makes lSubpSymTable invisible form symbol serach procedure and makes the previous symbol table as symTableCurrent.

To make a structure type, structType method is provided in Sym interface. Users may understand how to use it by following example: As for

       struct listNode {
         int nodeValue;
         struct listNode *next;
        } listAnchor, listNode1;
following coding will make corresponding StructType.
      Sym lTag = symRoot.symTableCurrent.generateTag("listNode".intern());
      StructType lListStruct = sym.structType(null, lTag); // Incomplete type.
      PointerType lListPtrType = sym.pointerType(lListStruct);
      PointerType lIntPtrType = sym.pointerType(symRoot.typeInt);
      symRoot.symTableCurrent.pushSymTable(lListStruct);
      Elem lValue = sym.defineElem("nodeValue".intern(), symRoot.typeInt);
      Elem lNext  = sym.defineElem("next".intern(), lListPtrType);
      lListStruct.addElem(lValue);
      lListStruct.addElem(lNext);
      lListStruct.finishStructType(true);
      symRoot.symTableCurrent.popSymTable();
Methods are provided to get information of introduced types:
    getSizeValue     of Sym interface
    getPointedValue  of Sym interface
    getElemCount     of VectorType interface
    getElemList      of StructType interface
    getParameterTypeList of SubpType interface
    ....
1.3.3.7. Generation of Temporal Variables and Labels
In compilers, temporal variables are often required to be generated for optimization, code generation, etc. A method
    public Var generateVar( Type pType );
is provided in the SymTable interface to generate a temporal variable in the symbol table local to the current subprogram (symTableCurrentSubp).

In order to generate labels, a method

    public Label generateLabel();
is provided in the SymTable interface. It generates a label in the symbol table local to the current subprogram (symTableCurrentSubp).

1.3.3.8. Representation of Symbol in Text Form
The method toString() gives the representation of a symbol in text form. It may be used for debug purpose, etc. toStringShort() shows short description and toStrindDetail() shows full description of the symbol.

1.3.4. HIR Handling

1.3.4.1. Getting Information of HIR
HIR can be instantiated and handled mostly by using methods in HIR0 (plain HIR interface). Simple compilers can be constructed by using methods in HIR0 and Sym0 interfaces. In constructing more complicated compiler, use methods in HIR and Sym. HIR inherits HIR0 and Sym inherits Sym0. In the following explanations, HIR may be read as HIR0 in most cases.

Most of HIR constructs have correspondence to some source language constructs, e.g.

    SubpDefinition - subprogram definition
    Stmt           - statement
    AssignStmt     - assign statement
    LoopStmt       - loop statement
    BlockStmt      - block statement
    Exp            - expression
    VarNode        - variable
    ConstNode      - constant
Subcomponent of HIR constructs can be get by methods provided in each HIR subclass (interface that extends HIR). For example,
    getLeftSide(), getRightSide() of AssignStmt
    getIfCondition(), getThenPart(), getElsePart() of IfStmt
    getLoopStartCondition(), getLoopBodyPart() of 
      ForStmt, WhileStmt, UntilStmt that extend LoopStmt
    getExp1(), getExp2() of Exp
    getSymNodeSym() of VarNode, ElemNode, ConstNode that extend SymNode
As for detail, see corresponding interfaces that extend HIR.

The subcomponents can be get also by specifying child number by getChild1(), getChild2(), and getChild(int pChildNumber). In such coding, exact knowledge of HIR data structure is required. getChildCount of IR interface gives the number of children of HIR nodes. (getChild1() and getChild2() have less overhead than getChild(1) and getChild(2).)

All HIR nodes have type attribute. It can be get by the method getType. Some HIR nodes may have flags set during parsing, analysis, etc. The method getFlag(int pFlagNumber) returns the status of the flag indicated by pFlagNumber.

1.3.4.2. Representation of HIR in Text Form
The method toString() gives the representation of an HIR node in text form. It may be used for debug purpose, etc. toStringShort() shows short description and toStrindDetail() shows full description of the node.

The method print(....) of HIR prints the subtree stemming from the specified node, that is, all subtrees of the specified node are printed recursively.

1.3.4.3. Normal Construction of HIR
HIR nodes can be constructed by methods defined in HIR interface. Leafs of HIR tree are symbol node, list, etc. In HIR, symbols are represented by symbol nodes having reference to some symbol table entry such as variable and subprogram.

A Symbol node can be generated by factory methods of HIR.

    VarNode lVarNode1     = hirRoot.hir.varNode(lVar1);
    SubpNode lSubpNode1   = hirRoot.hir.subpNode(lSubp);
    ConstNode lConstNode1 = hirRoot.hir.constNode(lIntConst1);
will instantiate VarNode, SubpNode, ConstNode, each respectively. hirRoot.hir, hirRoot.symRoot.sym, etc. may be shortened by local declarations
    HIR hir = hirRoot.hir;
    Sym sym = hirRoot.symRoot.sym;
Arithmetic expressions can be built by such coding as
    Exp lExp1 = hir.exp(HIR.OP_ADD, lVarNode1, lConstNode1);
    Exp lExp2 = hir.exp(HIR.OP_MULT, lExp1, hir.varNode(lVar2));
Assign-statement, if-statement, etc. are built by
    Stmt lAssign1 = hir.assignStmt(lVarNode1, lExp1);
    Stmt lAssign2 = hir.assignStmt(lVarNode1, lExp2);
    Stmt lIf1 = hir.ifStmt(hir.exp(HIR.OP_CMP_EQ, lExp1,
               hir.constNode(0, symRoot.typeInt)), lAssign1, lAssign2);
etc.

1.3.4.4. Top down Construction of HIR
HIR tree is usually constructed in bottom up manner starting from leafs and combining them as above. Top down construction is also possible by attaching a subtree (leaf node or nonleaf node) to parent tree as its child.
    setChild1( IR pChild1 ), setChild2( IR pChild2 ),
    setChild( int pChildNumber, IR pChild )
of IR interface are available for such construction. The top down construction requires knowledge of detailed structure of HIR tree. Recommended way is bottom up construction by using the prepared factory methods.

In some cases, strict bottom up construction is difficult. For example, in the construction of block statement and subprogram definition, most of their children are not known at first. Several methods are provided to construct such subtrees. They are explained in the next section.

1.3.4.5. Construction by Sequence of Statements
Block statement can be constructed by a statement sequence such as
    BlockStmt lBlockStmt = hir.blockStmt(null);
    lBlockStmt.addLastStmt(lAssign1);
    lBlockStmt.addLastStmt(lIf1);
    ....
(See HIR, BlockStmt in coins.ir.hir.)

Subprogram can be constructed by such statement sequence as

    Subp lMain = symRoot.sym.defineSubp("main".intern(), symRoot.typeInt);
    SymTable lSymTable = symRoot.symTableRoot.pushSymTable(lMain);
    lMain.closeSubpHeader();
    SubpDefinition lMainDef = hir.subpDefinition(lMain, lSymTable);
    BlockStmt lBlockStmt = hir.blockStmt(null);
    lMainDef.setHirBody(lBlockStmt);
    ....
    lBlockStmt.addLastStmt(lAssign1);
    ....
    symRoot.symTableCurrent.popSymTable();
(In case of prototype declaration, use closeSubpPrototype instead of closeSubpHeader. See HIR, SubpDefinition, Subp, SimpleMain, etc.)

IrList, HirList can be constructed by such statement sequence as

    HirList lList = hir.irList();
    lList.add(....);
    ....
(See HIR in coins.ir.hir, IrList in coins.ir.)

1.3.4.6. Note on HIR Construction and Transformation
An example of HIR generation is shown by SimpleMain.java. Readers can see how to construct symbol table and HIR tree of a program. It may be useful in coding new parser.

It is possible to build Sym objects and HIR objects by invoking constructors of VarImpl, SubpImpl, VarNodeImpl, ConstNodeImpl, AssignStmtImpl, IfStmtImpl, etc. but such coding is not recommended. Such coding may cause many errors because there are some hidden parameters supplied by factory methods and there are some preparatory methods to be applied to parameters.

It should be noted that the structure of HIR is tree. Every nodes in the HIR tree should be created newly and should not be shared because sharing of nodes violates the data structure rule of tree. If a subtree same as some subtree X is required, X should be copied by the method

    X.copyWithOperands()
if X is an expression or
    X.copyWithOperandsChangingLabels()
if X is a statement that may include label definitions. The HIR method "isTree" checks whether the rule of tree structure is violated or not. The method "isTree" is invoked automatically if debug-print level of HIR is greater than 1 in Driver.java.

When entire HIR subtree of a subprogram is constructed, finishHir() should be called in such way as

    lSubpDefinition.finishHir();
where lSubpDefinition represents SubpDefinition node of the subprogram. The method finishHir() does such operations as giving index number to HIR nodes under the subtree, checking tree structure conformance, certificating getHirPosition() for labels, and so on. When the HIR subtree of a subprogram was changed in optimization and parallelization, then finishHir() should also be called for the subtree. This method is not required to call for each modification of statements ans expressions of SubpDefinition but at the end of creation or transformation of the entire SubpDefinition subtree.

1.3.4.7. Indispensable Items and Optional Items
Syntactic structure of HIR is shown in HIR0.java and in HIR.java as comment lines. Nonterminals that do not derive to null are indispensable items. Nonterminals that derive to null are optional items. There may be some exceptional nonterminal that derives to null but represents indispensable item. Such case may happen in avoiding BNF productions to become too much verbose but such case can be clearly discriminated by considering semantics.

1.4. Test Program Suite

1.4.1. Overview

Under the directory "test" there are many small programs written to test the COINS compiler infrastructure. The directory "test/c" contains C test program suite composed of more than 1000 small C programs which are available to test C front-end and basic features of the COINS compiler infrastructure. The C test programs are grouped under many sub-directories such as Assign, Exp, If, Loop, etc. (See "README.c.txt", "README.c-result-may-differ.txt", "README.c-not-for-auto-test.txt" under the "test" directory.)
The C test program suite also contains expected execution results corresponding to each test program so that automatic test is possible by compiling and executing the test programs and comparing the result of execution for them. There are test programs that are not suitable for execution test, for example, programs to test syntax error detection, programs to test implementation dependent features, and so on. Such test programs are placed under the sub-directories named "test/c/unsupported" and "test/c-result-may-differ".
When some component of the COINS compiler infrastructure is modified or added, it is recommended to test whether it is correctly executed or not by using the test suite.

1.4.2. Automatic test

1.4.2.1. On Sparc machine
The test script
    test/c/testdriver.sh
is available to do automatic test for the C test program suite. It can automatically compile and execute the programs in the test program suite and compare the results with the corresponding expected results. Following is an example of command sequence to do such automatic test.
    cd ./classes   -- Type at the root of the COINS compiler infrastructure.
    ../test/c/testdriver.sh -v ../test/c/*/*.c
    ../test/c/testdriver.sh -v ../test/c/*/*/*.c
The first and the second invocations of "testdriver.sh" will do test by using test programs under the first level and the second level sub-directories, respectively. The result of the test will be shown by files which will be located under
    testdriver-yymmdd-hhmmss
that is a directory created during the test.
If "Arg list too long" message is issued in your environment by the command
    ../test/c/testdriver.sh -v ../test/c/*/*.c
then make a temporal directory such as "test/c2" and move several sub-directories such as TestFnami2, WhiteBox1, WhiteBox2 (that contain many files) to the temporal directory and execute
    ../test/c/testdriver.sh -v ../test/c2/*/*.c
as additional command.
There is a command line
    if ! java coins.driver.Driver -coins:${TARGET}preprocessor=\
      'cpp -I../lang/c/include' -S -o $CCCOUT $i &>$CCCLOG
in "testdriver.sh". You can change compile option by changing this line in such way as
    if ! java coins.driver.Driver -coins:${TARGET},hirOpt=cse,preprocessor=\
       'cpp -I../lang/c/include' -S -o $CCCOUT $i &>$CCCLOG
You can do automatic test by using test programs under some sub-directory only. For example, if you want to do test by using test programs under Loop sub-directory only, give following command:
    ../test/c/testdriver.sh -v ../test/c/Loop/*.c
Some test programs under sub-directories "test/c/unsupported" and "test/c-result-may-differ" will cause error but it may be normal because each of them may be a test program to cause error. Result of these test programs should be examined individually.
Some test programs may require further investigation even if "OK" sign is issued for them. For example, "test/c/Opt", "test/c/OptAdd" and "test/c/TestSsa" contain test programs for optimization and it is required to examine object codes generated.
1.4.2.2. On Intel x86 machine and others
If test is to be done on x86-cygwin, use "test/c/testdriverw.sh" instead of "testdriver.sh". Its usage is the same as the "testdriver.sh" but it is required to prepare expected results on x86 machine as explained in the next section.

Note.
Target machine can be changed by command option in such way as
    ../test/c/testdriver.sh -v -t x86-cygwin ../test/c/Loop/*.c
but this command option is not yet reliable and in some environment, this may not correctly work. It is better to use the test script
   test/c/testdriverw.sh
for x86-cygwin.
As for other machine, please adjust the "testdriver.sh" to the machine.

1.4.3. Preparation of expected results

1.4.3.1. On Sparc machine
Currently, the expected results of the test programs are prepared by using gcc compiler on Sparc machine. You can renew the expected results by the following command sequence:
    cd ./classes
    ../test/c/testprepare.sh ../test/c/*/*.c
    ../test/c/testprepare.sh ../test/c/*/*/*.c
When you added or modified test programs, you should also renew the expected results of the test programs added or modified.
1.4.3.2. On Intel x86 machine and others
If you want to do test on a machine other than Sparc, you should renew the expected results by using gcc compiler or some other existing compiler on the machine.
If test is to be done on x86-cygwin, use "test/c/testpreparew.sh" instead of "testprepare.sh". Its usage is the same as the "testdprepare.sh".
As for other machine, please adjust the "testprepare.sh" to the machine.