00  /* pass 7 -- Code generation */
01  Root root = new Root(spec, new PrintWriter(System.out, true));
02  String targetName
03      = coinsOptions.getArg(CommandLine.COINS_TARGET_NAME_OPTION);
04  String targetConvention
05      = coinsOptions.getArg(CommandLine.COINS_TARGET_CONVENTION_OPTION);
06  trace.trace(myName, 5000, "target name = " + targetName);
07  Module unit = Module.loadSLir(sexp, targetName, targetConvention, root);
08  makeCSourceFromLir("new", lirToCTimings, unit, io);
09
10  if (spec.getCoinsOptions().isSet("snapshot"))
11      snap.shot(unit, "Generated LIR");
12
13  /* SSA optimization */
14  if (spec.getCoinsOptions().isSet(coins.ssa.OptionName.SSA_OPT)) {
15      unit.apply(new coins.ssa.SsaDriver(unit, io, spec));
16      /* Simple/JumpOpt again */
17      unit.apply(coins.backend.opt.SimpleOpt.trig);
18      unit.apply(coins.backend.opt.JumpOpt.trig);
19  } else {
20      unit.basicOptimization();
21  }
22  if (spec.getCoinsOptions().isSet("simd")) {
23      unit.apply(new coins.simd.SimdDriver(unit, io, spec));
24  }
25  makeCSourceFromLir("opt", lirToCTimings, unit, io);
26
27  if (spec.getCoinsOptions().isSet("snapshot"))
28      snap.shot(unit, "Optimized LIR");
29
30  if (spec.getCoinsOptions().isSet("snapshot"))
31      snap.generateXml();
32
33  unit.generateCode(out);
34
35  if (trace.shouldTrace("Sym", 1)) {
36      trace.trace("Sym", 1, "\nSym after code generation ");
37      symRoot.symTable.printSymTableAllDetail(symRoot.symTableRoot);
38      symRoot.symTableConst.printSymTableDetail();
39  }

The instance root created at line 01 holds the root information of the backend process: the compiler options (spec) and the output assembly file (System.out). The backend processes are invoked at lines 07, 20, and 33 as:
Module unit = Module.loadSLir(sexp, targetName, targetConvention, root);
unit.basicOptimization();
unit.generateCode(out);

The instance unit holds the internal representation of the LIR built from the S-expression sexp.
The above three methods, loadSLir, basicOptimization, and generateCode, are composed of several steps as follows:
// loadSLir:
Module unit = new Module(sexp, targetName, convention, root);
unit.apply(new String[] {
    "IntroVirReg",
    "EarlyRewriting",
    "+AfterEarlyRewriting",
    "If2Jumpc" });
return unit;
unit.apply(new String[] { "+BeforeBasicOpt", "SimpleOpt", "JumpOpt", "PreHeaders", "LoopInversion", "+AfterBasicOpt" });
unit.apply(new String[] { "+BeforeCodeGeneration", "LateRewriting", "+AfterLateRewriting", "ToMachineCode", "+AfterToMachineCode", "ConvToAsm" });
unit.apply("IntroVirReg");the method doIt in the class coins.backend.opt.IntroVirReg is executed, because the Module constructor has registered the transformer IntroVirReg.trig with the name "IntroVirReg" to the instance root by the following statement:
root.registerTransformer(IntroVirReg.trig);

By the execution of doIt of coins.backend.opt.IntroVirReg, frame variables are renamed to virtual registers.
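This registration/dispatch mechanism can be pictured as a table from transformer names to transformer objects. The following is a minimal sketch of that idea; the names SimpleTransformer, SimpleRoot, and SimpleModule are hypothetical, and the real coins.backend Root, Module, and Transformer classes are considerably richer.

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of name-based transformer dispatch;
// not the actual COINS implementation.
interface SimpleTransformer {
    String name();           // e.g. "IntroVirReg"
    void doIt(Object unit);  // the real doIt works on Function/Module objects
}

class SimpleRoot {
    private final Map<String, SimpleTransformer> transformers = new HashMap<>();

    // Corresponds to root.registerTransformer(IntroVirReg.trig):
    // the transformer becomes reachable under its name.
    void registerTransformer(SimpleTransformer t) {
        transformers.put(t.name(), t);
    }

    SimpleTransformer lookup(String name) {
        return transformers.get(name);
    }
}

class SimpleModule {
    private final SimpleRoot root;
    SimpleModule(SimpleRoot root) { this.root = root; }

    // Corresponds to unit.apply("IntroVirReg"):
    // look the name up in the table and run the registered doIt.
    void apply(String name) {
        SimpleTransformer t = root.lookup(name);
        if (t != null)
            t.doIt(this);
    }
}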
Furthermore, the execution of
unit.apply("ToMachineCode");in generateCode causes the excecution of the instance method toMachineCode of the class Function, and the toMachineCode method executes the following steps:
apply(LiveRange.trig);                      // live range analysis
apply(JumpOpt.trig);                        // jump optimization
apply(JumpCanon.trig);                      // canonicalize JUMPC instructions
apply("+BeforeFirstInstSel");               // call Hook point
apply(module.targetMachine.instSelTrig);    // instruction selection
apply("+AfterFirstInstSel");                // call Hook point
int count = 0;
for (;;) {
    if (count++ >= REGALOLIMIT)
        throw new CantHappenException("Looks like an infinite loop during Register allocation");
    if (apply(RegisterAllocation.trig))     // register allocation
        break;                              // if success
    apply("+BeforeSecondInstSel");          // call Hook point
    apply(module.targetMachine.instSelTrig); // instruction selection again
    apply("+AfterSecondInstSel");           // call Hook point
}
postProcess();                              // eliminate redundant jumps

An argument string prefixed with "+" in the above calls of the apply method is a hook name. Usually such a call does nothing. However, if the method
Root#addHook(String, Transformer)

has been called with a hook name and a Transformer object, the doIt method of the Transformer object is executed when the apply method is called with the hook name.
For example, a new backend process can be attached at a hook point by writing a class of the following form:

class NewProcess {
    static class NewTrigger implements LocalTransformer {  // or GlobalTransformer
        void doIt(...) { ... }
        ...
    }
    static void attach(CompileSpecification spec, Root root) {
        // if (spec.getCoinsOptions().isSet("newprocess")) {
        root.addHook("+xxx", new NewTrigger());
        // }
    }
}

where "+xxx" is a hook name. A LocalTransformer is applied to each Function instance, and a GlobalTransformer is applied to a Module instance.
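The hook mechanism itself can be pictured as a table from hook names to lists of transformers: addHook appends to the list for a name, and applying a "+"-prefixed name runs whatever has been added there, or does nothing when nothing was registered. Below is a minimal sketch of that behavior, reusing the hypothetical SimpleTransformer interface from the earlier sketch; it is not the actual Root/Function code.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the hook table described in the text.
class HookTable {
    // One list of transformers per hook name such as "+AfterToMachineCode" or "+xxx".
    private final Map<String, List<SimpleTransformer>> hooks = new HashMap<>();

    // Corresponds to Root#addHook(String, Transformer) in the real backend.
    void addHook(String hookName, SimpleTransformer t) {
        hooks.computeIfAbsent(hookName, k -> new ArrayList<>()).add(t);
    }

    // apply("+xxx"): run every attached transformer, or do nothing when none was added.
    // In COINS a LocalTransformer would receive a Function here and a
    // GlobalTransformer a Module; the sketch passes a generic unit object.
    void apply(String hookName, Object unit) {
        for (SimpleTransformer t : hooks.getOrDefault(hookName, List.of()))
            t.doIt(unit);
    }
}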
When the compiler option
-coins:attach=NewProcess

is specified, the NewProcess.attach method is called at the entry of the backend process, and the doIt method of the NewTrigger object is executed at the hook point "+xxx".
If the commented-out lines in the attach method above are restored, the hook is added only when the compiler option

-coins:newprocess

is specified.
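One plausible way such an attach option can be realized (a sketch only, not a quotation of the COINS driver code; the package locations of CompileSpecification and Root are assumptions here) is to resolve the named class and invoke its static attach method by reflection:

import java.lang.reflect.Method;

import coins.backend.Root;                 // assumed package location
import coins.driver.CompileSpecification;  // assumed package location

class AttachOption {
    // Hypothetical helper illustrating the attach mechanism: resolve the class
    // named in a -coins:attach=... option and call its static
    // attach(CompileSpecification, Root) method, whose signature is shown above.
    static void runAttach(String className, CompileSpecification spec, Root root)
            throws Exception {
        Class<?> c = Class.forName(className);
        Method m = c.getMethod("attach", CompileSpecification.class, Root.class);
        m.invoke(null, spec, root);   // static method, so no receiver instance
    }
}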
The instruction scheduler is invoked by specifying the compiler option

-coins:schedule,attach=coins.backend.sched.Schedule

or
-coins:attach=coins.backend.sched.Schedule

The attach method of coins.backend.sched.Schedule is defined as follows:
public static void attach(CompileSpecification spec, Root root) {
    root.addHook("+AfterToMachineCode", after);
    if (spec.getCoinsOptions().isSet("schedule")) {
        root.addHook("+AfterFirstInstSel", before);
    }
}

Therefore, if the "schedule" option is specified, the instruction scheduler is called twice by two triggers: the before trigger at the "+AfterFirstInstSel" hook point, and the after trigger at the "+AfterToMachineCode" hook point. The former point is before register allocation, and the latter is after register allocation. If the "schedule" option is not specified, the instruction scheduler is called only once, after register allocation.
The two triggers are almost the same, except that the latter performs "nop"-instruction elimination if the target CPU has delayed branch instructions. They are written in a machine-independent way and can be applied to any target machine.
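As a rough illustration of how two such triggers can share one implementation (a hypothetical sketch, not the actual structure of coins.backend.sched.Schedule), the before and after objects could be two instances of the same class distinguished only by a flag that controls nop elimination:

// Hypothetical sketch: two scheduling triggers sharing one implementation.
class SchedulerTrigger /* would implement LocalTransformer in the real backend */ {
    private final boolean eliminateNops;  // only the pass after register allocation removes nops

    SchedulerTrigger(boolean eliminateNops) {
        this.eliminateNops = eliminateNops;
    }

    void doIt(Object function) {
        scheduleInstructions(function);       // machine-independent instruction scheduling
        if (eliminateNops)
            removeNopsInDelaySlots(function); // only meaningful on machine code with delayed branches
    }

    private void scheduleInstructions(Object function) { /* ... */ }
    private void removeNopsInDelaySlots(Object function) { /* ... */ }
}

// The two trigger objects registered by Schedule.attach could then be built as:
//   before = new SchedulerTrigger(false);  // "+AfterFirstInstSel": before register allocation
//   after  = new SchedulerTrigger(true);   // "+AfterToMachineCode": after register allocation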
See Fig.12-2 in 12. Performance of COINS compilers for some results of this optimization.
Software pipelining is an optimization that can improve loop-execution performance by exploiting instruction-level parallelism. However, the conventional software pipelining method has several problems: it can only be applied to counter-controlled loops; in some cases the loop control instruction must be modified; and it cannot be applied if the number of iterations is smaller than the number of pipeline stages.
Our method solves all these problems: it first schedules the loop control instruction and its dependent instructions, and then schedules the remaining instructions for pipelining.
The software pipelining module is called when the following compiler options are specified:

-coins:schedule,pipelining,attach=coins.backend.sched.Schedule
For example, the following source program
for (i = 0; i < N; i++)
    for (j = 0; j < M; j++) {
        s = 0;
        for (k = 0; k < L; k++)
            s += a[i][k] * b[k][j];
        c[i][j] = s;
    }
is software pipelined as follows.
By specifying the SSA-optimizations as
ssa-opt=prun/divex/cse/cstp/hli/osr/hli/cstp/cpyp/preqp/cstp/rpe/dce/srd3

and by specifying the target machine as SPARC, the generated LIR code for the above program looks like the following (in assembly form):
.L4:    ld    [%i5],%f1      ; %f1 = a[i][k]
        ld    [%i4],%f0      ; %f0 = b[k][j]
        fmuls %f1,%f0,%f0    ; %f0 = %f1*%f0 = a[i][k]*b[k][j]
        fadds %f2,%f0,%f2    ; %f2 = s = %f2+%f0 = s + a[i][k]*b[k][j]
        add   %i4,2000,%i4   ; %i4 = &b[k][j] + 2000 = &b[k+1][j]
        add   %i5,4,%i5      ; %i5 = &a[i][k] + 4 = &a[i][k+1]
        cmp   %i4,%o2        ; &b[k+1][j] : &b[L][j]
        bl    .L4            ; k+1 < L
        nop

The result of software pipelining for this LIR is as follows:
.L4:    ld    [%i4],%f0
        add   %i4,2000,%i4   ; P
        cmp   %i4,%o2        ; R
        bge   .L7            ; O
        ld    [%i5],%f1      ; L
        fmuls %f1,%f0,%f3    ; O
        add   %i5,4,%i5      ; G
        ld    [%i4],%f0      ; U
        add   %i4,2000,%i4   ; E
        cmp   %i4,%o2
        bge   .L6
        ld    [%i5],%f1
----------------------------------------------
.L5:    fadds %f2,%f3,%f2
        fmuls %f1,%f0,%f3    ; K
        add   %i5,4,%i5      ; E
        ld    [%i4],%f0      ; R
        add   %i4,2000,%i4   ; N
        cmp   %i4,%o2        ; E
        bl    .L5            ; L
        ld    [%i5],%f1
----------------------------------------------
.L6:    fadds %f2,%f3,%f2    ; EPIL-
.L7:    fmuls %f1,%f0,%f3    ; OGUE
        add   %i5,4,%i5
        fadds %f2,%f3,%f2

In this example, the loop control instruction and its dependent instructions are scheduled first:
ld    [%i4],%f0      ; stage 0
add   %i4,2000,%i4   ; stage 0
cmp   %i4,%o2        ; stage 0
bl    .L4            ; stage 0
nop                  ; stage 0
Then, by scheduling the remaining instructions for pipelining, the kernel loop becomes:
fadds %f2,%f0,%f2    ; stage 2
fmuls %f1,%f0,%f0    ; stage 1
add   %i5,4,%i5      ; stage 1
ld    [%i5],%f1      ; stage 0
ld    [%i4],%f0      ; stage 0
add   %i4,2000,%i4   ; stage 0
cmp   %i4,%o2        ; stage 0
bl    .L4            ; stage 0
nop                  ; stage 0
In our method, all "stage 0" instructions, including the branch instruction, are placed in the prologue part.
Finally, after the register allocation phase, the nop instruction is replaced with the "ld [%i5],%f1" instruction by instruction scheduling.

Register promotion is invoked by specifying the compiler option

-coins:regpromote,attach=RegPromote

The class RegPromote is in the package coins.backend.contrib. The name of this special package can be omitted.
Register promotion is the process of promoting static variables to registers within loops.
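Conceptually, the transformation keeps the promoted variables in temporaries inside the loop so that the register allocator can assign them to registers, and writes the values back to memory only when the loop exits. The following Java fragment is only a source-level illustration of this idea, using the variable names from the prime example below; the actual RegPromote pass works on LIR in the backend, not on source code.

class RegPromoteIdea {
    static int nth, candidat;   // static variables that live in memory

    // Before promotion (conceptually): every use goes through the static variable,
    // so each iteration loads from and stores to memory.
    static void beforePromotion(int count) {
        while (nth < count) {
            candidat = candidat + 2;
            nth = nth + 1;
        }
    }

    // After promotion (conceptually): the values are kept in locals inside the
    // loop, which the register allocator can keep in registers; memory is
    // updated once after the loop.
    static void afterPromotion(int count) {
        int nthReg = nth;
        int candidatReg = candidat;
        while (nthReg < count) {
            candidatReg = candidatReg + 2;
            nthReg = nthReg + 1;
        }
        nth = nthReg;
        candidat = candidatReg;
    }
}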
The compiler option
-coins:regpromote-ex

invokes the process of extended register promotion, which promotes more variables to registers based on pointer analysis.
Register promotion is especially effective for a program such as the following:
int candidat, quotient, remaindr, index, nth, primenum, loopend;
int primeNumbers[SIZE];

void getPrime(int primevec[SIZE], int count)
{
    primevec[0] = 2;
    nth = 1;
    candidat = 3;
    while (nth < count) {
        remaindr = 1;
        index = 0;
        loopend = 0;
        while (loopend == 0) {
            primenum = primevec[index];
            quotient = candidat / primenum;
            remaindr = candidat - quotient*primenum;
            if (remaindr == 0) loopend = 1;
            if (quotient*quotient < candidat) loopend = 1;
            index = index + 1;
            if (index >= nth) loopend = 1;
        }
        if (remaindr != 0) {
            primevec[nth] = candidat;
            nth = nth + 1;
        }
        candidat = candidat + 2;
    }
    nth = 0;
    while (nth < count) {
        /* print(primevec[nth]); */
        nth = nth + 1;
    }
    return;
}

This program is the prime program shown in Fig.12-2 of chapter 12, Performance of COINS compilers. Register promotion produces almost all of the performance gain for the prime program.