* [RFA (revised)] sh-sim, expand the opcode table
@ 2004-02-11 0:40 Michael Snyder
2004-02-11 12:33 ` Joern Rennecke
0 siblings, 1 reply; 5+ messages in thread
From: Michael Snyder @ 2004-02-11 0:40 UTC (permalink / raw)
To: Joern Rennecke; +Cc: joern.rennecke, gdb-patches
[-- Attachment #1: Type: text/plain, Size: 763 bytes --]
Hi Joern,
Here are some benchmark results. Following your advice, I took the
arith-rand test, increased its main loop count until it took around
10 seconds to run on my test machine, and tested it against the eight
optimization combinations that are exercised by the gcc torture test
(see the attached methodology notes).
I found that my change increased the runtime by 1.5 to 2 percent
(even when I added the new instructions that I'm working on).
That didn't seem too bad to me, but I took some advice from Alex
Oliva and tried simply changing the sh_jump_table from char to short.
This much simpler change increased the runtime by only 0.5 to 1
percent, at the cost of 64k more data space.
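The 64k figure follows from the table geometry: the table holds one entry for each of the 1 << 16 possible 16-bit opcode words, so widening every entry from one byte to two doubles its footprint. A minimal sketch of that arithmetic, assuming a 2-byte unsigned short (usual on common hosts, but not guaranteed by C); the helper names are illustrative and are not part of the simulator:

```c
#include <assert.h>
#include <stddef.h>

/* One entry per possible 16-bit SH opcode word, as in gencode.c. */
#define OPCODE_SPACE (1u << 16)

/* Hypothetical helper: bytes occupied by a table that has one entry
   of 'entry_size' bytes for every opcode word. */
static size_t
table_bytes (size_t entry_size)
{
  assert (entry_size > 0);
  return (size_t) OPCODE_SPACE * entry_size;
}

/* Extra data space from widening the entries from char to short. */
static size_t
widening_cost (void)
{
  return table_bytes (sizeof (unsigned short))
         - table_bytes (sizeof (unsigned char));
}
```

With a 2-byte short, widening_cost () comes to 65536 bytes, which is the 64k quoted above.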
So I'll withdraw my previous patch and submit the following instead:
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2984 bytes --]
2004-02-10 Michael Snyder <msnyder@redhat.com>
* gencode.c (table): Change from char to short.
(dumptable): Change generated table from char to short.
* interp.c (sh_jump_table, sh_dsp_table, ppi_table): char to short.
(init_dsp): Compute size of sh_dsp_table.
(sim_resume): Change jump_table from char to short.
Index: gencode.c
===================================================================
RCS file: /cvs/src/src/sim/sh/gencode.c,v
retrieving revision 1.26
diff -p -r1.26 gencode.c
*** gencode.c 27 Jan 2004 23:30:01 -0000 1.26
--- gencode.c 11 Feb 2004 00:34:36 -0000
*************** gengastab ()
*** 2265,2271 ****
}
}
! static unsigned char table[1 << 16];
/* Take an opcode, expand all varying fields in it out and fill all the
right entries in 'table' with the opcode index. */
--- 2265,2271 ----
}
}
! static unsigned short table[1 << 16];
/* Take an opcode, expand all varying fields in it out and fill all the
right entries in 'table' with the opcode index. */
*************** dumptable (name, size, start)
*** 2407,2413 ****
int i = start;
! printf ("unsigned char %s[%d]={\n", name, size);
while (i < start + size)
{
int j = 0;
--- 2407,2413 ----
int i = start;
! printf ("unsigned short %s[%d]={\n", name, size);
while (i < start + size)
{
int j = 0;
Index: interp.c
===================================================================
RCS file: /cvs/src/src/sim/sh/interp.c,v
retrieving revision 1.14
diff -p -r1.14 interp.c
*** interp.c 10 Jan 2004 00:43:28 -0000 1.14
--- interp.c 11 Feb 2004 00:34:37 -0000
***************
*** 53,59 ****
#define SIGTRAP 5
#endif
! extern unsigned char sh_jump_table[], sh_dsp_table[0x1000], ppi_table[];
int sim_write (SIM_DESC sd, SIM_ADDR addr, unsigned char *buffer, int size);
--- 53,59 ----
#define SIGTRAP 5
#endif
! extern unsigned short sh_jump_table[], sh_dsp_table[0x1000], ppi_table[];
int sim_write (SIM_DESC sd, SIM_ADDR addr, unsigned char *buffer, int size);
*************** init_dsp (abfd)
*** 1646,1652 ****
{
int i, tmp;
! for (i = sizeof sh_dsp_table - 1; i >= 0; i--)
{
tmp = sh_jump_table[0xf000 + i];
sh_jump_table[0xf000 + i] = sh_dsp_table[i];
--- 1646,1652 ----
{
int i, tmp;
! for (i = (sizeof sh_dsp_table / sizeof sh_dsp_table[0]) - 1; i >= 0; i--)
{
tmp = sh_jump_table[0xf000 + i];
sh_jump_table[0xf000 + i] = sh_dsp_table[i];
*************** sim_resume (sd, step, siggnal)
*** 1752,1758 ****
void (*prev) ();
void (*prev_fpe) ();
! register unsigned char *jump_table = sh_jump_table;
register int *R = &(saved_state.asregs.regs[0]);
/*register int T;*/
--- 1752,1758 ----
void (*prev) ();
void (*prev_fpe) ();
! register unsigned short *jump_table = sh_jump_table;
register int *R = &(saved_state.asregs.regs[0]);
/*register int T;*/
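The init_dsp hunk is the one change in the patch that is more than a type substitution. `sizeof sh_dsp_table` yields a size in bytes, which equals the element count only while each element is one byte wide; with short entries the old loop bound would start past the end of the array. A small sketch of the difference, using a stand-in array rather than the generated table (all names below are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same shape as sh_dsp_table after the patch. */
static unsigned short demo_table[0x1000];

/* Number of elements, independent of the element type. */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* Index the old loop would start from: a byte count minus one. */
static size_t
last_index_by_bytes (void)
{
  return sizeof demo_table - 1;
}

/* Index the patched loop starts from: an element count minus one. */
static size_t
last_index_by_count (void)
{
  return ARRAY_COUNT (demo_table) - 1;
}
```

Only the element-count form stays at 0xfff, the last valid index, when the element type changes.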
[-- Attachment #3: methodology --]
[-- Type: text/plain, Size: 898 bytes --]
Benchmark Methodology Notes:
Dell Precision 420 running RH 7.3
No other users, no other user processes except X and window manager.
Main loop increased from 1000 to 100000 iterations.
Each test run 3 times and averaged.
Time in seconds for control (ctrl), split table (2table), and 16-bit
table (16bit), each with percent increase over control.
"omit" == -fomit-frame-pointer, "unroll" == -funroll-all-loops.
                  ctrl    2table  %inc   16bit   %inc
O0                10.862  11.044  1.7    10.884   .20
                  10.855  11.040  1.7    10.875   .18
O1                 6.669   6.810  2.1     6.700   .46
                   6.660   6.805  2.2     6.700   .60
O2                 6.561   6.707  2.2     6.650  1.36
                   6.555   6.710  2.4     6.650  1.45
O3                 5.780   5.876  1.7     5.800   .34
                   5.780   5.875  1.6     5.800   .34
O3 omit            5.646   5.744  1.7     5.671   .44
                   5.640   5.740  1.8     5.670   .53
O3 omit unroll     5.636   5.744  1.9     5.664   .50
                   5.630   5.750  2.1     5.660   .53
O3 -g              5.772   5.875  1.8     5.799   .46
                   5.770   5.865  1.6     5.795   .43
Os                 6.600   6.735  2.0     6.649   .74
                   6.590   6.730  2.1     6.650   .91
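
The %inc columns are the relative slowdown against the control run. A sketch of that calculation, assuming the usual (measured - control) / control * 100 definition, which reproduces the O0 figures in the first row; the function name is illustrative:

```c
#include <assert.h>

/* Percent increase of 'measured' over 'control'; both in seconds. */
static double
percent_increase (double control, double measured)
{
  assert (control > 0.0);
  return (measured - control) / control * 100.0;
}
```

For the first O0 row, percent_increase (10.862, 11.044) is about 1.7 and percent_increase (10.862, 10.884) is about 0.20.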
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFA (revised)] sh-sim, expand the opcode table
2004-02-11 0:40 [RFA (revised)] sh-sim, expand the opcode table Michael Snyder
@ 2004-02-11 12:33 ` Joern Rennecke
2004-02-11 19:25 ` Michael Snyder
0 siblings, 1 reply; 5+ messages in thread
From: Joern Rennecke @ 2004-02-11 12:33 UTC (permalink / raw)
To: Michael Snyder; +Cc: Joern Rennecke, joern.rennecke, gdb-patches
> Hi Joern,
>
> Here are some benchmark results. Following your advice, I took the
> arith-rand test, increased its main loop count until it took around
> 10 seconds to run on my test machine, and tested it against the eight
> optimization combinations that are tested for in the gcc torture test
> [see methodology notes attached]
>
>
> I found that my change increased the runtime by 1.5 to 2 percent
> (even when I added the new instructions that I'm working on).
Was that with or without ACE_FAST ?
> That didn't seem too bad to me, but I took some advice from Alex
> Oliva and tried simply changing the sh_jump_table from char to short.
> This much simpler change increased the runtime by only 0.5 to 1
> percent, at the cost of 64k more data space.
That makes sense... but what is the cost of adding the new instructions?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFA (revised)] sh-sim, expand the opcode table
2004-02-11 12:33 ` Joern Rennecke
@ 2004-02-11 19:25 ` Michael Snyder
2004-02-12 12:48 ` Joern Rennecke
0 siblings, 1 reply; 5+ messages in thread
From: Michael Snyder @ 2004-02-11 19:25 UTC (permalink / raw)
To: Joern Rennecke; +Cc: Joern Rennecke, gdb-patches
[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]
Joern Rennecke wrote:
>>Hi Joern,
>>
>>Here are some benchmark results. Following your advice, I took the
>>arith-rand test, increased its main loop count until it took around
>>10 seconds to run on my test machine, and tested it against the eight
>>optimization combinations that are tested for in the gcc torture test
>>[see methodology notes attached]
>>
>>
>>I found that my change increased the runtime by 1.5 to 2 percent
>>(even when I added the new instructions that I'm working on).
>
>
> Was that with or without ACE_FAST ?
Forgot to define ACE_FAST -- new results attached (just for
the 16-bit case).
>>That didn't seem too bad to me, but I took some advice from Alex
>>Oliva and tried simply changing the sh_jump_table from char to short.
>>This much simpler change increased the runtime by only 0.5 to 1
>>percent, at the cost of 64k more data space.
>
>
> That makes sense... but what is the cost of adding the new instructions?
Imprecisely speaking, they did not seem to make any difference at all.
The number of new insns is actually not that great -- they only bring
the switch statement up to about 290 cases.
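The case count is what forces the wider table entries: an 8-bit entry can name at most 256 distinct switch cases, so roughly 290 no longer fit. A minimal sketch, assuming an 8-bit char; 290 is the approximate figure quoted above and the function name is illustrative:

```c
#include <assert.h>
#include <limits.h>

/* Returns nonzero if every case index in 0 .. ncases-1 fits in an
   unsigned integer type whose maximum value is 'type_max'. */
static int
cases_fit (unsigned long ncases, unsigned long type_max)
{
  assert (ncases > 0);
  return ncases - 1 <= type_max;
}
```

Here cases_fit (290, UCHAR_MAX) is false while cases_fit (290, USHRT_MAX) is true, leaving ample room for further opcodes.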
[-- Attachment #2: results.ace --]
[-- Type: text/plain, Size: 340 bytes --]
                  orig    16bit   %inc
O0                9.048   9.049    .01
                  9.050   9.035   0
O1                5.557   5.592    .63
                  5.550   5.595    .81
O2                5.465   5.538   1.3
                  5.460   5.535   1.4
O3                4.784   4.830    .96
                  4.775   4.820    .94
O3 omit           4.695   4.747   1.1
                  4.690   4.750   1.3
O3 omit unroll    4.694   4.738    .94
                  4.695   4.735    .85
O3 -g             4.786   4.841   1.1
                  4.780   4.840   1.3
Os                5.475   5.525    .91
                  5.475   5.515    .73
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFA (revised)] sh-sim, expand the opcode table
2004-02-11 19:25 ` Michael Snyder
@ 2004-02-12 12:48 ` Joern Rennecke
2004-02-12 19:32 ` Michael Snyder
0 siblings, 1 reply; 5+ messages in thread
From: Joern Rennecke @ 2004-02-12 12:48 UTC (permalink / raw)
To: Michael Snyder; +Cc: Joern Rennecke, Joern Rennecke, gdb-patches
> Imprecisely speaking, they did not seem to make any difference at all.
> The number of new insns is actually not that great -- they only bring
> the switch statement up to about 290 cases.
OK. I think we can live with the 1% slowdown then, considering that
we expand the expressiveness of the jump table in a way that should be
good for several decades.
When I initially added the sh-dsp stuff, the slowdown was much worse,
which is why I then spent some time optimizing the simulator till the
performance was better than it was before the sh-dsp support was added.
FWIW, we don't actually benefit much from technological progress here
to mitigate slowdowns, because we already have rising numbers of
multilibs, compiler options, and sheer volume of tests to go through,
which is why simulator performance is still more critical than gcc
build times for total regression test time.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2004-02-12 19:32 UTC | newest]
Thread overview: 5+ messages
2004-02-11 0:40 [RFA (revised)] sh-sim, expand the opcode table Michael Snyder
2004-02-11 12:33 ` Joern Rennecke
2004-02-11 19:25 ` Michael Snyder
2004-02-12 12:48 ` Joern Rennecke
2004-02-12 19:32 ` Michael Snyder