* [RFA] sh-sim: free up some room in jump_table
@ 2004-02-07 0:29 Michael Snyder
2004-02-07 18:25 ` Joern Rennecke
0 siblings, 1 reply; 5+ messages in thread
From: Michael Snyder @ 2004-02-07 0:29 UTC (permalink / raw)
To: Joern Rennecke; +Cc: joern.rennecke, gdb-patches
[-- Attachment #1: Type: text/plain, Size: 914 bytes --]
Joern,
This is an advance hint that I am working on the next generation LSI
from Renesas. Apparently at this stage I am allowed to mention its
name (sh2a), but nothing more. ;-/
The first thing I need to do is free up some room for new insns.
The jump table is just about full (it can only hold 255 entries).
This patch splits the dsp instructions (the contents of movsxy_tab)
into their own jump table. As a side benefit, it's no longer
necessary to swap them in and out of the main jump table at
runtime. This gives us room to add lots of new instructions.
It passes all my regression tests (and I have some that I haven't
submitted yet).
Earlier today, you mentioned your concern about runtime efficiency.
I haven't done any performance testing, but since this makes the
main switch statement smaller, I imagine it might have some benefit
(similar to ifdeffing out the contents of movsxy_tab).
Michael
[-- Attachment #2: tables --]
[-- Type: text/plain, Size: 5627 bytes --]
2004-02-06 Michael Snyder <msnyder@redhat.com>
* gencode.c: Split sh_dsp_table out of jump_table,
to allow room for more new instructions.
(filltable): Let index be initialized to one at each call.
(gensim_caselist): Finish the switch statement at each call.
(gensim): Let sim_resume use two separate jump tables:
one for dsp instructions, and one for everything else.
(main): Zero out the table between calls to filltable.
(op tab): Use SET_NIP consistently.
* interp.c (init_dsp): It is no longer necessary to swap
sh_dsp_table in and out of sh_jump_table.
Index: gencode.c
===================================================================
RCS file: /cvs/src/src/sim/sh/gencode.c,v
retrieving revision 1.26
diff -p -r1.26 gencode.c
*** gencode.c 27 Jan 2004 23:30:01 -0000 1.26
--- gencode.c 7 Feb 2004 00:06:41 -0000
*************** op tab[] =
*** 1102,1108 ****
},
{ "", "", "sleep", "0000000000011011",
! "nip += trap (0xc3, &R0, PC, memory, maskl, maskw, endianw);",
},
{ "n", "", "stc <CREG_M>,<REG_N>", "0000nnnnmmmm0010",
--- 1102,1108 ----
},
{ "", "", "sleep", "0000000000011011",
! "SET_NIP (nip + trap (0xc3, &R0, PC, memory, maskl, maskw, endianw));",
},
{ "n", "", "stc <CREG_M>,<REG_N>", "0000nnnnmmmm0010",
*************** op tab[] =
*** 1192,1199 ****
{ "0", "", "trapa #<imm>", "11000011i8*1....",
"long imm = 0xff & i;",
! "if (i < 20 || i == 33 || i == 34 || i == 0xc3)",
! " nip += trap (i, &R0, PC, memory, maskl, maskw, endianw);",
#if 0
"else {",
/* SH-[12] */
--- 1192,1200 ----
{ "0", "", "trapa #<imm>", "11000011i8*1....",
"long imm = 0xff & i;",
! "if (i < 20 || i == 33 || i == 34 || i == 0xc3) {",
! " SET_NIP (nip + trap (i, &R0, PC, memory, maskl, maskw, endianw));",
! "}",
#if 0
"else {",
/* SH-[12] */
*************** op movsxy_tab[] =
*** 1516,1522 ****
},
{ "", "", "ppi", "1111100000000000",
"ppi_insn (RIAT (nip));",
! "nip += 2;",
"iword &= 0xf7ff; goto top;",
},
#endif
--- 1517,1523 ----
},
{ "", "", "ppi", "1111100000000000",
"ppi_insn (RIAT (nip));",
! "SET_NIP (nip + 2);",
"iword &= 0xf7ff; goto top;",
},
#endif
*************** static void
*** 2438,2444 ****
filltable (p)
op *p;
{
! static int index = 1;
sorttab ();
for (; p->name; p++)
--- 2439,2445 ----
filltable (p)
op *p;
{
! int index = 1;
sorttab ();
for (; p->name; p++)
*************** gensim_caselist (p)
*** 2669,2683 ****
char *r;
for (r = p->defs; *r; r++)
{
! if (*r == '0') printf(" CDEF (0);\n");
! if (*r == 'n') printf(" CDEF (n);\n");
! if (*r == 'm') printf(" CDEF (m);\n");
}
}
printf (" break;\n");
printf (" }\n");
}
}
static void
--- 2670,2689 ----
char *r;
for (r = p->defs; *r; r++)
{
! if (*r == '0') printf (" CDEF (0);\n");
! if (*r == 'n') printf (" CDEF (n);\n");
! if (*r == 'm') printf (" CDEF (m);\n");
}
}
printf (" break;\n");
printf (" }\n");
}
+ printf (" default:\n");
+ printf (" {\n");
+ printf (" RAISE_EXCEPTION (SIGILL);\n");
+ printf (" }\n");
+ printf (" }\n");
}
static void
*************** gensim ()
*** 2696,2711 ****
printf ("#define DSP_xy(R) ((R)==0 ? 8 : (R)==2 ? 9 : (R)==1 ? 10 : 11)\n");
printf ("/* DSP_yx = [y0, y1, x0, x1]. */\n");
printf ("#define DSP_yx(R) ((R)==0 ? 10 : (R)==1 ? 11 : (R)==2 ? 8 : 9)\n");
! printf (" switch (jump_table[iword]) {\n");
!
! gensim_caselist (tab);
gensim_caselist (movsxy_tab);
!
! printf (" default:\n");
! printf (" {\n");
! printf (" RAISE_EXCEPTION (SIGILL);\n");
! printf (" }\n");
! printf (" }\n");
printf ("}\n");
}
--- 2702,2713 ----
printf ("#define DSP_xy(R) ((R)==0 ? 8 : (R)==2 ? 9 : (R)==1 ? 10 : 11)\n");
printf ("/* DSP_yx = [y0, y1, x0, x1]. */\n");
printf ("#define DSP_yx(R) ((R)==0 ? 10 : (R)==1 ? 11 : (R)==2 ? 8 : 9)\n");
! printf (" if (target_dsp && \n");
! printf (" (iword & 0xf000) == 0xf000)\n");
! printf (" switch (sh_dsp_table[iword & 0xfff]) {\n");
gensim_caselist (movsxy_tab);
! printf (" else switch (jump_table[iword]) {\n");
! gensim_caselist (tab);
printf ("}\n");
}
*************** main (ac, av)
*** 3017,3022 ****
--- 3019,3025 ----
else if (strcmp (av[1], "-x") == 0)
{
filltable (tab);
+ memset (table, 0, sizeof table);
filltable (movsxy_tab);
gensim ();
}
Index: interp.c
===================================================================
RCS file: /cvs/src/src/sim/sh/interp.c,v
retrieving revision 1.14
diff -p -r1.14 interp.c
*** interp.c 10 Jan 2004 00:43:28 -0000 1.14
--- interp.c 7 Feb 2004 00:06:41 -0000
*************** static void
*** 1571,1577 ****
init_dsp (abfd)
struct bfd *abfd;
{
- int was_dsp = target_dsp;
unsigned long mach = bfd_get_mach (abfd);
if (mach == bfd_mach_sh_dsp ||
--- 1571,1576 ----
*************** init_dsp (abfd)
*** 1640,1657 ****
{
saved_state.asregs.xram_start = 1;
saved_state.asregs.yram_start = 1;
- }
-
- if (target_dsp != was_dsp)
- {
- int i, tmp;
-
- for (i = sizeof sh_dsp_table - 1; i >= 0; i--)
- {
- tmp = sh_jump_table[0xf000 + i];
- sh_jump_table[0xf000 + i] = sh_dsp_table[i];
- sh_dsp_table[i] = tmp;
- }
}
}
--- 1639,1644 ----
* Re: [RFA] sh-sim: free up some room in jump_table
2004-02-07 0:29 [RFA] sh-sim: free up some room in jump_table Michael Snyder
@ 2004-02-07 18:25 ` Joern Rennecke
2004-02-09 20:34 ` Michael Snyder
0 siblings, 1 reply; 5+ messages in thread
From: Joern Rennecke @ 2004-02-07 18:25 UTC (permalink / raw)
To: Michael Snyder; +Cc: Joern Rennecke, joern.rennecke, gdb-patches
> ! printf (" if (target_dsp && \n");
> ! printf (" (iword & 0xf000) == 0xf000)\n");
> ! printf (" switch (sh_dsp_table[iword & 0xfff]) {\n");
> gensim_caselist (movsxy_tab);
> ! printf (" else switch (jump_table[iword]) {\n");
You have changed a straight dispatch into an if-then-else with
two dispatches, and the integer and fpu arithmetic path goes the long way
round the dsp dispatch; this seems to be a surefire way to make the
simulator slower.
We don't really care much about the total size of the simulator, but
we care about its working set size, so why don't you generate two
separate simulator main loops, to be compiled into separate *.o
files, one with the FPU instructions, and the other one with the
dsp instructions?
* Re: [RFA] sh-sim: free up some room in jump_table
2004-02-07 18:25 ` Joern Rennecke
@ 2004-02-09 20:34 ` Michael Snyder
2004-02-09 20:38 ` Joern Rennecke
2004-02-09 21:16 ` Joern Rennecke
0 siblings, 2 replies; 5+ messages in thread
From: Michael Snyder @ 2004-02-09 20:34 UTC (permalink / raw)
To: Joern Rennecke; +Cc: joern.rennecke, gdb-patches
Joern Rennecke wrote:
>>! printf (" if (target_dsp && \n");
>>! printf (" (iword & 0xf000) == 0xf000)\n");
>>! printf (" switch (sh_dsp_table[iword & 0xfff]) {\n");
>> gensim_caselist (movsxy_tab);
>>! printf (" else switch (jump_table[iword]) {\n");
>
>
> You have changed a straight dispatch into an if-then-else with
> two dispatches, and the integer and fpu arithmetic path goes the long way
> round the dsp dispatch; this seems to be a surefire way to make the
> simulator slower.
>
> We don't really care much about the total size of the simulator, but
> we care about its working set size, so why don't you generate two
> separate simulator main loops, to be compiled into separate *.o
> files, one with the FPU instructions, and the other one with the
> dsp instructions?
OK, I need to catch up with you here. So, your concern is not
with the time it takes to execute the if condition, but with the
size and/or distribution of the working set? I'm not very used
to programming around such considerations, so I'll look to you
for guidance.
I can see the sense of making two loops, but why would it
be necessary for them to be in two separate compilation units?
Would you be willing to specify a performance test that I can
use, and a test criterion for me to meet? It might save time,
given that we seem to have a 24 hour email cycle.
* Re: [RFA] sh-sim: free up some room in jump_table
2004-02-09 20:34 ` Michael Snyder
@ 2004-02-09 20:38 ` Joern Rennecke
2004-02-09 21:16 ` Joern Rennecke
1 sibling, 0 replies; 5+ messages in thread
From: Joern Rennecke @ 2004-02-09 20:38 UTC (permalink / raw)
To: Michael Snyder; +Cc: Joern Rennecke, joern.rennecke, gdb-patches
>
> Joern Rennecke wrote:
> >>! printf (" if (target_dsp && \n");
> >>! printf (" (iword & 0xf000) == 0xf000)\n");
> >>! printf (" switch (sh_dsp_table[iword & 0xfff]) {\n");
> >> gensim_caselist (movsxy_tab);
> >>! printf (" else switch (jump_table[iword]) {\n");
> >
> >
> > You have changed a straight dispatch into an if-then-else with
> > two dispatches, and the integer and fpu arithmetic path goes the long way
> > round the dsp dispatch; this seems to be a surefire way to make the
> > simulator slower.
> >
> > We don't really care much about the total size of the simulator, but
> > we care about its working set size, so why don't you generate two
> > separate simulator main loops, to be compiled into separate *.o
> > files, one with the FPU instructions, and the other one with the
> > dsp instructions?
>
> OK, I need to catch up with you here. So, your concern is not
> with the time it takes to execute the if condition, but with the
> size and/or distribution of the working set? I'm not very used
> to programming around such considerations, so I'll look to you
> for guidance.
Actually, I am concerned about both.
> I can see the sense of making two loops, but why would it
> be necessary for them to be in two separate compilation units?
>
> Would you be willing to specify a performance test that I can
> use, and a test criterion for me to meet? It might save time,
> given that we seem to have a 24 hour email cycle.
In the past I've used running arith-rand from the c-torture testsuite,
with an iteration count to give a meaningful execution time -
I think it was something like a minute or a few.
* Re: [RFA] sh-sim: free up some room in jump_table
2004-02-09 20:34 ` Michael Snyder
2004-02-09 20:38 ` Joern Rennecke
@ 2004-02-09 21:16 ` Joern Rennecke
1 sibling, 0 replies; 5+ messages in thread
From: Joern Rennecke @ 2004-02-09 21:16 UTC (permalink / raw)
To: Michael Snyder; +Cc: Joern Rennecke, joern.rennecke, gdb-patches
> Would you be willing to specify a performance test that I can
> use, and a test criterion for me to meet? It might save time,
> given that we seem to have a 24 hour email cycle.
P.S.: I think arith-rand.c should be compiled with optimization (-O2)
to avoid too much of a skew towards memory operations.
The simulator has an ACE_FAST compile-time setting to remove cycle
counting overhead. It is really only this stripped-down functionality
that we need for compiler correctness regression tests, while the
cycle counts could conceivably be used for future optimizer quality
regression tests...
I expect the effect of code rearrangement to be different for SH2, SH3
and SH4 because of the way the availability of a barrel shifter / floating
point hardware affects the implementation of division.
I don't expect endianness matters for this benchmark on the level
of instruction mix, although it will matter for what the host has
to do to implement byte accesses. I don't expect the current implementation
to show appreciable differences depending on endianness, although it probably
makes sense to verify this assumption once.
And if you change the handling of the bi-endianness, that might also
change how sensitive the timings are to endianness.
The actual goal is to avoid a regression in testing time for
a fully-multilibbed toolchain, i.e. it tests all optimization levels,
for eleven combinations of cpu type and endianness, for C and C++,
and possibly also for objc and fortran. The trouble is that we have
a lot of context switching, so in addition to a long execution time
there will be a lot of noise, which means you'd have to run the test
several times on a quiet machine to get a reliable median.
This is why I've looked for something simpler to benchmark.
I've picked arith-rand because with the right iteration count setting
(which AFAIR was the default back then), it has an execution time long
enough that variations below one percent could still be accurately measured,
and it has a mix of loops, variable access and arithmetic, at a much more
reasonable scale than dhrystone. Besides, arith-rand actually accounted
for a few minutes of the c-torture execution time...
So, although the *real* benchmark is the gcc testsuite, I think
it's much more manageable to use a smaller model, like arith-rand.c -O2.
If you have an idea for a better model (or enough CPU time to throw at
testing to do an exact testing time regression test :-), that would
be welcome too, of course.
As to the weighting of the execution times, I think it makes sense to
base it on the number of multilibs in the test gantlet.
SH1 and SH2 are three of the targets (there is no little endian SH1),
SH3E is two, and SH4 is six (three ABIs times two endiannesses).