There are 6 TMD files in the package coins.backend.gen as shown in the following Table10-1.
Architecture | Number of lines | Man months | File name | |
---|---|---|---|---|
Total | in Java | |||
SPARC | 1,949 | 802 | unavailable | sparc.tmd |
X86 | 2,391 | 985 | unavailable | x86.tmd |
ARM | 3,052 | 2,176 | 6 | arm.tmd |
MIPS | 2,081 | 1,379 | 3 | mips.tmd |
SH4 | 3,568 | 2,467 | 6 | sh4.tmd |
PowerPC | 5,016 | 2,913 | 6 | ppc.tmd |
(def *type-address* I32) (def *type-bool* I32)the type of address is defined as I32, and also the type of bool (the result type of TSTxx instruction) is defined as I32.
(def *real-reg-symtab* (SYMTAB ("%g0" REG I32 4 0) : ("%g7" REG I32 4 0) ("%l0" REG I32 4 0) : ("%l7" REG I32 4 0) ("%o0" REG I32 4 0) : ("%o6" REG I32 4 0) ("%i0" REG I32 4 0) : ("%i6" REG I32 4 0) ("%f0" REG F64 8 0) : ("%sp" REG I32 4 0) ("%fp" REG I32 4 0)))The defined information is inserted in the global symbol table of the L-module.
In sparc.tmd, this is defined by using the "foreach" macro as follows:
(def *real-reg-symtab*
(SYMTAB
(foreach @n (01 23 45 67)
("%l@n" REG I64 4 0))
(foreach @p (i o)
(foreach @n (01 23 45)
("%@p@n" REG I64 4 0)))
(foreach @gl (g l)
(foreach @n (0 1 2 3 4 5 6 7)
("%@gl@n" REG I32 4 0)))
(foreach @oi (o i)
(foreach @n (0 1 2 3 4 5)
("%@oi@n" REG I32 4 0)))
(foreach @n (0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30)
("%f@n" REG F64 8 0))
("%sp" REG I32 4 0)
("%fp" REG I32 4 0)))
The name of each register must begin with the character "%".
(def *reg-I32* ( (REG I32 "%i0") (REG I32 "%i1") (REG I32 "%i2") : (REG I32 "%i6") (REG I32 "%o0") : (REG I32 "%o6") (REG I32 "%l0") : (REG I32 "%l7")))
(defregset regl *reg-I32*) (defregset regh *reg-I16*) (defregset regb *reg-I8*)The first argument of the defregset-construct is a nonterminal symbol and the second argument is a set of registers for the nonterminal. For example, when the reduction to the nonterminal regl occurs, a register in *reg-I32* is assigned to it.
(defregsetvar (type registers-set))defines that a register of registers-set is assigned to a register variable of type type by default. The following is an example.
(defregsetvar (I32 *reg-I32*) (I16 *reg-I16*) (I8 *reg-I8*) (F64 *reg-F64*) (F32 *reg-F32*))
(defstart void)In this example, void is the start symbol.
For example, sparc.tmd includes the following set of rules:
1: (defregset regl *reg-I32*) ;; Default register set for each nonterminals. 2: (defrule xregl (REG I32)) ;; nonterminal for registers of the form (REG I32 ..) 3: (defrule regl xregl) ;; nonterminal for registers 4: (defrule addr regl) ;; nonterminal for addressing mode 5: (defrule rc regl) ;; register or small integer constant 6: (defrule sta (STATIC I32)) ;; nonterminal for address of static variable 7: (defrule asmcon sta) ;; nonterminal for integer constant 8: (defrule regl asmcon (code (_set $1 $0)) (cost 2)) 9: (defrule regl (ADD I32 regl rc) (code (add $1 $2 $0)) (cost 1))) 10: (defrule regl (MEM I32 addr) (code (ld (mem $1) $0)) (cost 1)))where the leftmost digits are the number inserted for the following explanation. The first argument of a defrule-construct is the left-hand nonterminal of a production rule. The second argument is a L-expression, and is the right-hand side of the rule.
When the input L-expression matches to the right-hand side of a rule, the target code in the associated code-attribute is generated.
The L-expression generated from the expression "x+y" of the following
source program:
int x;
int func(int y){
return x+y;
}
is
(SET I32 (REG I32 "returnvalue.2%0")
(ADD I32 (MEM I32 (STATIC I32 "x")) (REG I32 "y.1%0")))
The result of the matching to this input L-expression is:
*195: regl -> (ADD I32 regl rc) [dest=(REG I32 "returnvalue.2%0")] SU=1 ;; 9 *60: regl -> (MEM I32 addr) [dest=(REG I32 ".T1%")] SU=1 ;; 10 19: addr -> regl SU=0 ;; 4 *44: regl -> asmcon [dest=(REG I32 ".T2%")] SU=1 ;; 8 29: asmcon -> sta SU=0 ;; 7 27: sta -> (STATIC I32) SU=0 ;; 6 35: rc -> regl SU=0 ;; 5 *15: regl -> xregl [dest=(REG I32 "y.1%0")] SU=0 ;; 3 5: xregl -> (REG I32) SU=0 ;; 2The leftmost digits are the rule number in the CodeGenerator_sparc.java generated from sparc.tmd. The rules with "*" in the figure are used for generating target code. The rightmost digits are the rule number in the previous figure. "SU" is the Sethi-Ullman number.
The result of this matching is expressed still in L-expression as follows:
(SET I32 (REG I32 ".T2%") (STATIC I32 "x"))
(SET I32 (REG I32 ".T1%") (MEM I32 (REG I32 ".T2%")))
(SET I32 (REG I32 "returnvalue.2%0") (ADD I32 (REG I32 ".T1%") (REG I32 "y.1%0")))
After the register allocation phase these are transformed to:
(SET I32 (REG I32 "%i1") (STATIC I32 "x"))
(SET I32 (REG I32 "%i1") (MEM I32 (REG I32 "%i1")))
(SET I32 (REG I32 "%i0") (ADD I32 (REG I32 "%i1") (REG I32 "%i0")))
Final code generation is done again by tree matching. The result is:
(sethi (%hi x) %i1)
(or %i1 (%lo x) %i1)
(ld (mem %i1) %i1)
(add %i1 %i0 %i0)
where the first two instructions are generated by
(code (_set $1 $0))
_set is the name of the macro-instruction defined as follows:
%defbuild(_set x y) {
return ImList.list
(ImList.list("sethi", ImList.list("%hi", x), y),
ImList.list("or", y, ImList.list("%lo", x), y));
}
The final assembly code is:
sethi %hi(x),%i1
or %i1,%lo(x),%i1
ld [%i1],%i1
add %i1,%i0,%i0
(foreach @x (a b) (foo @x)) => (foo a) (foo b) (foreach (@x @y) ((a 1) (b 2)) (foo @x @y)) => (foo a 1) (foo b 2)The previous rule 9:
9: (defrule regl (ADD I32 regl rc) (code (add $1 $2 $0)) (cost 1)))is a part of the expanded rules from the following macro expression:
(foreach (@op @code) ((ADD add) (SUB sub) (BAND and) (BOR or) (BXOR xor)) (defrule regl (@op I32 regl rc) (code (@code $1 $2 $0)) (cost 1)))And the previous rule 10
10: (defrule regl (MEM I32 addr) (code (ld (mem $1) $0)) (cost 1)))is a part of the expanded rules from the following macro expression:
(foreach (@t @code @s) ((I32 ld l) (I16 ldsh h) (I8 ldsb b) (F32 ld f)) (defrule reg@s (MEM @t addr) (code (@code (mem $1) $0)) (cost 1)))
(defrule production-rule cond-attr code-attr value-attr regset-attr eqreg-attr cost-attr clobber-attr )Each item is explained in the following:
The form of production rule is:
production-rule ::= lhs pattern lhs ::= nonterm pattern ::= S-expression
where lhs is a nonterminal symbol to which the right-hand pattern is reduced.
pattern is a nonterminal symbol or an L-expression, where some parts of the L-expression are possibly replaced by nonterminals.
cond-attr specifies the condition whether the production rule is applicable or not.
cond-attr ::= (cond condition) condition ::= quoted-string
condition is a boolean expression in Java. $0 symbol in the expression is replaced by the target L-expression.
For example, the rule
(defrule smallnumber (INTCONST I32)) (cond "((LirIconst)$0).value >= 0 && ((LirIconst)$0).value < 256"))
matches to the integer constant between 0 and 255.
The class LirIconst is in the package coins.backend.lir and is used to represent a (INTCONST I32 ..) object.code-attr is the sequence of the instructions of the target CPU.
code-attr ::= (code code-sequence...) code-sequence ::= S-expression
codesequence is a sequence of assembly instructions in the form of S-expression.
value-attr specifies the value assigned to the left-hand nonterminal.
value-attr ::= (value value-sequence...) value-sequence ::= S-expression
For example, if the target CPU has the addressing mode of the form [%r0+%r1], the rules:
(defrule address (ADD I32 reg reg) (value (plus $1 $2))) (defrule mem (MEM I32 address) (value (ind $1))) (defrule reg mem (code (load $1 $0)))
will generate from the input L-expression:
(MEM I32 (ADD I32 (REG I32 "x") (REG I32 "y")))
the following instruction:
(load (ind (plus %r0 %r1)) %r2)
This instruction will be transformed to the following final assembly code, if the ind macro and the plus macro are defined appropriately.
load [%r0+%r1],%r2
If value-attr is omitted, the value of left-hand nonterminal is:
cost-attr specifies the cost of the target instruction.
cost-attr ::= (cost number)
number is the cost of the target instruction. If the cost is the size (number of bytes) of the instruction, space-efficient code will be generated. If the cost is the latency of the instruction, time-efficient code will be generated.
regset-attr specifies the set of registers to which the nonterminal belongs.
regset-attr ::= (regset ($n regset-name) ...)
The following example indicates that the register assigned to $1 (dividend) is %eax and the register assigned to $2 (divisor) is one of %eax, %ebx, %ecx, %esi, %edi.
(def *reg-mod$2-I32* ( (REG I32 "%eax") (REG I32 "%ecx") (REG I32 "%ebx") (REG I32 "%esi") (REG I32 "%edi") )) (defrule regl (DIVS I32 regl regl) (eqreg $1 $0) (regset ($0 *reg-eax-I32*) ($2 *reg-mod$2-I32*) ...)
eqreg-attr indicates that the two nontermnals must be assigned the same register.
eqreg-attr ::= (eqreg ($n $0) ...)
$n means the n-th nonterminal in the right-hand side. The n-th nonterminal must be defined by defregset-construct.
clobber-attr indicates the set of registers destroyed by this instruction.
clobber-attr ::= (clobber realregister...) realregister ::= (REG type quoted-string)
The registers indicated by clobber-attr are inserted into the resultant L-expression in the form of CLOBBER-expression.
For example, the first L-expressions generated from the previous example source program:
int x;
int func(int y){
return x+y;
}
are
(PROLOGUE (0 0) (MEM I32 (FRAME I32 "y.1")))
(DEFLABEL "_lab1")
(SET I32 (MEM I32 (FRAME I32 "returnvalue.2"))
(ADD I32 (MEM I32 (STATIC I32 "x")) (MEM I32 (FRAME I32 "y.1"))))
(JUMP (LABEL I32 "_epilogue"))
(DEFLABEL "_epilogue")
(EPILOGUE (0 0) (MEM I32 (FRAME I32 "returnvalue.2")))
If the target CPU is SPARC, the PROLOGUE-expression must be
transformed to the following L-expressions according to the calling convention of SPARC.
(PROLOGUE (0 0) (REG I32 "%i0"))
(SET I32 (REG I32 "y.1%0") (REG I32 "%i0"))
This transformation, or rewriting, is specified in sparc.tmd as follows:
(defrewrite (PROLOGUE)
(to (norescan (eval "rewritePrologue($0, post)")))
(phase late))
The form:
(defrewrite pattern (to new-pattern))
specifies that pattern is replaced by new-pattern.
The above example specifies that the PROLOGUE-expression is replaced by the
result of the execution of the Java code rewritePrologue($0, post)
(see next section for Java code).
The resultant L-expression again becomes the target of the matching by default. "norescan" stops the default rescanning.
"(phase late)" means that this rewriting is applied in the phase "LateRewriting" of the backend process and "(phase early)" means the phase "EarlyRewriting". See 11. Structure and Extensions of the Backend Process for these phase names.
pre/post means the instruction sequence immediately before/after this expression.
The above example shows that
(PROLOGUE (0 0) (MEM I32 (FRAME I32 "y.1")))
is replaced by
(PROLOGUE (0 0) (REG I32 "%i0"))
and
(SET I32 (REG I32 "y.1%0") (REG I32 "%i0"))
is added to the post sequence.
Also,
(EPILOGUE (0 0) (MEM I32 (FRAME I32 "returnvalue.2")))
is transformed by
(defrewrite (EPILOGUE)
(to (norescan (eval "rewriteEpilogue($0, pre)")))
(phase late))
to
(SET I32 (REG I32 "%i0") (REG I32 "returnvalue.2%0"))
(EPILOGUE (0 0) (REG I32 "%i0"))
grammar definitions %% imports %State methods Method of class State %CodeGenerator methods Method of class CodeGenerator
grammar definition consists of the list of defrewrite-, def-, defregset-, defregsetvar-, defstart-, defrule-construct.
Remaining parts are written in Java. imports are the list of import statements.
Methods of class State are the list of methods mainly referred in cond-attr.
Methods of class CodeGenerator are the list of methods in the class CodeGenerator_target. defbuild/defemit macro definitions are also placed here.
For example, this method is defined in sparc.tmd as follows:
LirNode rewriteFrame(LirNode node) {
Symbol fp = func.module.globalSymtab.get("%fp");
int off = ((SymAuto)((LirSymRef)node).symbol).offset();
return lir.node
(Op.ADD, node.type, lir.symRef(fp), lir.iconst(I32, (long)off));
}
The func field of the super class CodeGenerator indicates the currently
prosecced L-function (instance of the class coins.backend.Function).
func.module is the currently processed L-module and has the global symbol table.
"%fp" is the frame-pointer register.
The lir field of CodeGenerator has a instance of the class LirFactory. Various LIR nodes (LirNode) are generated by calling appropriate methods of lir.
%defbuild(_set x y) { return ImList.list (ImList.list("sethi", ImList.list("%hi", x), y), ImList.list("or", y, ImList.list("%lo", x), y)); }The class ImList is in the package coins.backend.util. ImList.list is a class method of ImList. The result of this macro expansion is the list of assembly instructions in the list form.
The followings are two example of the %defemit macro in sparc.tmd.
%defemit(+ x y) { if (y.charAt(0) == '-') return x + y; else return x + "+" + y; } %defemit(mem x) { return "[" + x + "]"; }The result of %defemit macro expansion is an instance of the class String.
INTCONST -> LirIconst FLOATCONST -> LirFconst STATIC, FRAME, REG, LABEL -> LirSymRef unary op. -> LirUnaOp binary op. -> LirBinOp other -> LirNaryOp
(mov (mem x) %r1) (mov (mem y) %r0) (add %r0 %r1 %r0) (mov %r0 (mem x))The list form assembly code is finally transtated to the target assembly code as follows:
mov [x],%r1 mov [y],%r0 add %r0,%r1,%r0 mov %r0,[x]