From: DJ Delorie
To: gdb-patches@sourceware.org
Subject: [rx-sim]: add cycle accuracy
Date: Wed, 28 Jul 2010 02:07:00 -0000
Message-Id: <201007280206.o6S26g5e013725@greed.delorie.com>

This is a rather large but single-directory patch which makes the RX
simulator cycle-accurate.
Well, mostly cycle accurate anyway - it's within a small fraction of a
percent compared to real hardware, on large benchmarks.

There are some speedups and documentation included too.

OK to commit?  I also built it with -Wall -Werror.

There doesn't seem to be an rx-specific sim maintainer.  Since I wrote
it, should I be the maintainer?

	* README.txt: New.
	* config.h (CYCLE_ACCURATE, CYCLE_STATS): New.
	* configure.in (--enable-cycle-accurate, --enable-cycle-stats):
	New.  Default to enabled.
	* configure: Regenerate.
	* cpu.h (regs_type): Add cycle tracking info.
	(reset_pipeline_stats): Declare.
	(halt_pipeline_stats): Declare.
	(pipeline_stats): Declare.
	* main.c (done): Call pipeline_stats().
	* mem.h (rx_mem_ptr): Moved to here ...
	* mem.c (mem_ptr): ... from here.  Rename throughout.
	(mem_put_byte): Move LEDs to Port A.  Add Port B to control cycle
	statistics.  Move UART to SCI4.
	(mem_put_hi): Add TPU 1-2.  TPU 1 and 2 count CPU cycles.
	* reg.c (init_regs): Set Rt reg to -1 (no reg).
	* rx.c: Add cycle counting and statistics throughout.
	(rx_get_byte): Optimize for speed.
	(decode_opcode): Likewise.
	(reset_pipeline_stats): New.
	(halt_pipeline_stats): New.
	(pipeline_stats): New.
	* trace.c (sim_disasm_one): Print cycle count.

Index: README.txt
===================================================================
RCS file: README.txt
diff -N README.txt
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ README.txt	28 Jul 2010 02:00:19 -0000
@@ -0,0 +1,121 @@
+The RX simulator offers two rx-specific configure options:
+
+--enable-cycle-accurate (default)
+--disable-cycle-accurate
+
+If enabled, the simulator will keep track of how many cycles each
+instruction takes.  While not 100% accurate, it is very close,
+including modelling fetch stalls and register latency.
+
+--enable-cycle-stats (default)
+--disable-cycle-stats
+
+If enabled, specifying "-v" twice on the simulator command line causes
+the simulator to print statistics on how much time was used by each
+type of opcode, and what pairs of opcodes tend to happen most
+frequently, as well as how many times various pipeline stalls
+happened.
+
+
+
+The RX simulator offers many command line options:
+
+-v - verbose output.  This prints some information about where the
+program is being loaded and its starting address, as well as
+information about how much memory was used and how many instructions
+were executed during the run.  If specified twice, pipeline and cycle
+information are added to the report.
+
+-d - disassemble output.  Each instruction executed is printed.
+
+-t - trace output.  Causes a *lot* of printed information about what
+  every instruction is doing, from math results down to register
+  changes.
+
+--ignore-*
+--warn-*
+--error-*
+
+  The RX simulator can detect certain types of memory corruption, and
+  either ignore them, warn the user about them, or error and exit.
+  Note that valid GCC code may trigger some of these, for example,
+  writing a bitfield involves reading the existing value, which may
+  not have been set yet.  The options for * are:
+
+    null-deref - memory access to address zero.  You must modify your
+      linker script to avoid putting anything at location zero, of
+      course.
+
+    unwritten-pages - attempts to read a page of memory (see below)
+      before it is written.  This is much faster than the next option.
+
+    unwritten-bytes - attempts to read individual bytes before they're
+      written.
+
+    corrupt-stack - On return from a subroutine, the memory location
+      where $pc was stored is checked to see if anything other than
+      $pc had been written to it most recently.
+
+-i -w -e - these three options change the settings for all of the
+  above.  For example, "-i" tells the simulator to ignore all memory
+  corruption.
+
+-E - end of options.  Any remaining options (after the program name)
+  are considered to be options for the simulated program, although
+  such functionality is not supported.
+
+
+
+The RX simulator simulates a small number of peripherals, mostly in
+order to provide I/O capabilities for testing and such.  The supported
+peripherals, and their limitations, are documented here.
+
+Memory
+
+Memory for the simulator is stored in a hierarchical tree, much like
+the i386's page directory and page tables.  The simulator can allocate
+memory to individual pages as needed, allowing the simulated program
+to act as if it had a full 4 Gb of RAM at its disposal, without
+actually allocating more memory from the host operating system than
+the simulated program actually uses.  Note that for each page of
+memory, there's a corresponding page of memory *types* (for tracking
+memory corruption).  Memory is initially filled with all zeros.
+
+GPIO Port A
+
+PA.DR is configured as an output-only port (regardless of PA.DDR).
+When written to, a row of colored @ and * symbols are printed,
+reflecting a row of eight LEDs being either on or off.
+
+GPIO Port B
+
+PB.DR controls the pipeline statistics.  Writing a 0 to PB.DR disables
+statistics gathering.  Writing a non-0 to PB.DR resets all counters
+and enables (even if already enabled) statistics gathering.  The
+simulator starts with statistics enabled, so writing to PB.DR is not
+needed if you want statistics on the entire program's run.
+
+SCI4
+
+SCI4.TDR is connected to the simulator's stdout.  Any byte written to
+SCI4.TDR is written to stdout.  If the simulated program writes the
+bytes 3, 3, and N in sequence, the simulator exits with an exit value
+of N.
+
+SCI4.SSR always returns "transmitter empty".
+
+
+TPU1.TCNT
+TPU2.TCNT
+
+TPU1 and TPU2 are configured as a chained 32-bit counter which counts
+machine cycles.  It always runs at "ICLK speed", regardless of the
+clock control settings.  Writing to either of these 16-bit registers
+zeros the counter, regardless of the value written.  Reading from
+these registers returns the elapsed cycle count, with TPU1 holding the
+most significant word and TPU2 holding the least significant word.
+
+Note that, much like the hardware, these values may (TPU2.TCNT *will*)
+change between reads, so you must read TPU1.TCNT, then TPU2.TCNT, and
+then TPU1.TCNT again, and only trust the values if both reads of
+TPU1.TCNT were the same.
Index: config.in
===================================================================
RCS file: /cvs/src/src/sim/rx/config.in,v
retrieving revision 1.2
diff -p -U3 -r1.2 config.in
--- config.in	14 Feb 2010 07:37:11 -0000	1.2
+++ config.in	28 Jul 2010 02:00:19 -0000
@@ -105,3 +105,9 @@
 /* Define to 1 if you have the ANSI C header files. */
 #undef STDC_HEADERS
+
+/* --enable-cycle-accurate */
+#undef CYCLE_ACCURATE
+
+/* --enable-cycle-stats */
+#undef CYCLE_STATS
Index: configure.in
===================================================================
RCS file: /cvs/src/src/sim/rx/configure.in,v
retrieving revision 1.3
diff -p -U3 -r1.3 configure.in
--- configure.in	14 Feb 2010 07:37:11 -0000	1.3
+++ configure.in	28 Jul 2010 02:00:19 -0000
@@ -25,6 +25,36 @@ AC_CHECK_HEADERS(getopt.h)
 sinclude(../common/aclocal.m4)
+AC_ARG_ENABLE(cycle-accurate,
+[ --disable-cycle-accurate ],
+[case "${enableval}" in
+yes | no) ;;
+*) AC_MSG_ERROR(bad value ${enableval} given for --enable-cycle-accurate option) ;;
+esac])
+
+AC_ARG_ENABLE(cycle-stats,
+[ --disable-cycle-stats ],
+[case "${enableval}" in
+yes | no) ;;
+*) AC_MSG_ERROR(bad value ${enableval} given for --enable-cycle-stats option) ;;
+esac])
+
+echo enable_cycle_accurate is $enable_cycle_accurate
+echo enable_cycle_stats is $enable_cycle_stats
+
+if test "x${enable_cycle_accurate}" != xno; then
+AC_DEFINE([CYCLE_ACCURATE])
+
+  if test "x${enable_cycle_stats}" != xno; then
+    AC_DEFINE([CYCLE_STATS])
+  fi
+else
+  if test "x${enable_cycle_stats}" !=
xno; then + AC_ERROR([cycle-stats not available without cycle-accurate]) + fi +fi + + # Bugs in autoconf 2.59 break the call to SIM_AC_COMMON, hack around # it by inlining the macro's contents. sinclude(../common/common.m4) Index: cpu.h =================================================================== RCS file: /cvs/src/src/sim/rx/cpu.h,v retrieving revision 1.2 diff -p -U3 -r1.2 cpu.h --- cpu.h 1 Jan 2010 10:03:33 -0000 1.2 +++ cpu.h 28 Jul 2010 02:00:19 -0000 @@ -76,8 +76,24 @@ typedef struct SI r_temp; DI r_acc; + +#ifdef CYCLE_ACCURATE + /* If set, RTS/RTSD take 2 fewer cycles. */ + char fast_return; + SI link_register; + + unsigned long long cycle_count; + /* Bits saying what kind of memory operands the previous insn had. */ + int m2m; + /* Target register for load. */ + int rt; +#endif } regs_type; +#define M2M_SRC 0x01 +#define M2M_DST 0x02 +#define M2M_BOTH 0x03 + #define sp 0 #define psw 16 #define pc 17 @@ -219,6 +235,9 @@ extern unsigned int heaptop; extern unsigned int heapbottom; extern int decode_opcode (void); +extern void reset_pipeline_stats (void); +extern void halt_pipeline_stats (void); +extern void pipeline_stats (void); extern void trace_register_changes (); extern void generate_access_exception (void); Index: main.c =================================================================== RCS file: /cvs/src/src/sim/rx/main.c,v retrieving revision 1.3 diff -p -U3 -r1.3 main.c --- main.c 14 Feb 2010 07:37:11 -0000 1.3 +++ main.c 28 Jul 2010 02:00:19 -0000 @@ -82,6 +82,8 @@ done (int exit_code) printf ("insns: %14s\n", comma (rx_cycles)); else printf ("insns: %u\n", rx_cycles); + + pipeline_stats (); } exit (exit_code); } Index: mem.c =================================================================== RCS file: /cvs/src/src/sim/rx/mem.c,v retrieving revision 1.2 diff -p -U3 -r1.2 mem.c --- mem.c 1 Jan 2010 10:03:33 -0000 1.2 +++ mem.c 28 Jul 2010 02:00:19 -0000 @@ -25,6 +25,7 @@ along with this program. 
If not, see #include #include @@ -37,7 +38,7 @@ along with this program. If not, see > (L2_BITS + OFF_BITS)) & ((1 << L1_BITS) - 1); int pt2 = (address >> OFF_BITS) & ((1 << L2_BITS) - 1); @@ -240,7 +234,7 @@ e () static char mtypec (int address) { - unsigned char *cp = mem_ptr (address, MPA_CONTENT_TYPE); + unsigned char *cp = rx_mem_ptr (address, MPA_CONTENT_TYPE); return "udp"[*cp]; } @@ -254,48 +248,75 @@ mem_put_byte (unsigned int address, unsi if (trace) tc = mtypec (address); - m = mem_ptr (address, MPA_WRITING); + m = rx_mem_ptr (address, MPA_WRITING); if (trace) printf (" %02x%c", value, tc); *m = value; switch (address) { - case 0x00e1: - { + case 0x0008c02a: /* PA.DR */ + { static int old_led = -1; - static char *led_on[] = - { "\033[31m O ", "\033[32m O ", "\033[34m O " }; - static char *led_off[] = { "\033[0m · ", "\033[0m · ", "\033[0m · " }; + int red_on = 0; int i; + if (old_led != value) { - fputs (" ", stdout); - for (i = 0; i < 3; i++) + fputs (" ", stdout); + for (i = 0; i < 8; i++) if (value & (1 << i)) - fputs (led_off[i], stdout); + { + if (! 
red_on) + { + fputs ("\033[31m", stdout); + red_on = 1; + } + fputs (" @", stdout); + } else - fputs (led_on[i], stdout); - fputs ("\033[0m\r", stdout); + { + if (red_on) + { + fputs ("\033[0m", stdout); + red_on = 0; + } + fputs (" *", stdout); + } + + if (red_on) + fputs ("\033[0m", stdout); + + fputs ("\r", stdout); fflush (stdout); old_led = value; } } break; - case 0x3aa: /* uart1tx */ +#ifdef CYCLE_STATS + case 0x0008c02b: /* PB.DR */ { - static int pending_exit = 0; if (value == 0) + halt_pipeline_stats (); + else + reset_pipeline_stats (); + } +#endif + + case 0x00088263: /* SCI4.TDR */ + { + static int pending_exit = 0; + if (pending_exit == 2) { - if (pending_exit) - { - step_result = RX_MAKE_EXITED(value); - return; - } - pending_exit = 1; + step_result = RX_MAKE_EXITED(value); + longjmp (decode_jmp_buf, 1); } + else if (value == 3) + pending_exit ++; else - putchar(value); + pending_exit = 0; + + putchar(value); } break; @@ -314,19 +335,33 @@ mem_put_qi (int address, unsigned char v COUNT (1, 1); } +static int tpu_base; + void mem_put_hi (int address, unsigned short value) { S ("<="); - if (rx_big_endian) - { - mem_put_byte (address, value >> 8); - mem_put_byte (address + 1, value & 0xff); - } - else + switch (address) { - mem_put_byte (address, value & 0xff); - mem_put_byte (address + 1, value >> 8); +#ifdef CYCLE_ACCURATE + case 0x00088126: /* TPU1.TCNT */ + tpu_base = regs.cycle_count; + break; + case 0x00088136: /* TPU2.TCNT */ + tpu_base = regs.cycle_count; + break; +#endif + default: + if (rx_big_endian) + { + mem_put_byte (address, value >> 8); + mem_put_byte (address + 1, value & 0xff); + } + else + { + mem_put_byte (address, value & 0xff); + mem_put_byte (address + 1, value >> 8); + } } E (); COUNT (1, 2); @@ -388,7 +423,7 @@ mem_put_blk (int address, void *bufptr, unsigned char mem_get_pc (int address) { - unsigned char *m = mem_ptr (address, MPA_READING); + unsigned char *m = rx_mem_ptr (address, MPA_READING); COUNT (0, 0); return *m; } @@ 
-399,12 +434,12 @@ mem_get_byte (unsigned int address) unsigned char *m; S ("=>"); - m = mem_ptr (address, MPA_READING); + m = rx_mem_ptr (address, MPA_READING); switch (address) { - case 0x3ad: /* uart1c1 */ + case 0x00088264: /* SCI4.SSR */ E(); - return 2; /* transmitter empty */ + return 0x04; /* transmitter empty */ break; default: if (trace) @@ -433,15 +468,28 @@ mem_get_hi (int address) { unsigned short rv; S ("=>"); - if (rx_big_endian) - { - rv = mem_get_byte (address) << 8; - rv |= mem_get_byte (address + 1); - } - else + switch (address) { - rv = mem_get_byte (address); - rv |= mem_get_byte (address + 1) << 8; +#ifdef CYCLE_ACCURATE + case 0x00088126: /* TPU1.TCNT */ + rv = (regs.cycle_count - tpu_base) >> 16; + break; + case 0x00088136: /* TPU2.TCNT */ + rv = (regs.cycle_count - tpu_base) >> 0; + break; +#endif + + default: + if (rx_big_endian) + { + rv = mem_get_byte (address) << 8; + rv |= mem_get_byte (address + 1); + } + else + { + rv = mem_get_byte (address); + rv |= mem_get_byte (address + 1) << 8; + } } COUNT (0, 2); E (); @@ -520,7 +568,7 @@ sign_ext (int v, int bits) void mem_set_content_type (int address, enum mem_content_type type) { - unsigned char *mt = mem_ptr (address, MPA_CONTENT_TYPE); + unsigned char *mt = rx_mem_ptr (address, MPA_CONTENT_TYPE); *mt = type; } @@ -537,7 +585,7 @@ mem_set_content_range (int start_address if (sz + ofs > L1_LEN) sz = L1_LEN - ofs; - mt = mem_ptr (start_address, MPA_CONTENT_TYPE); + mt = rx_mem_ptr (start_address, MPA_CONTENT_TYPE); memset (mt, type, sz); start_address += sz; @@ -547,6 +595,6 @@ mem_set_content_range (int start_address enum mem_content_type mem_get_content_type (int address) { - unsigned char *mt = mem_ptr (address, MPA_CONTENT_TYPE); + unsigned char *mt = rx_mem_ptr (address, MPA_CONTENT_TYPE); return *mt; } Index: mem.h =================================================================== RCS file: /cvs/src/src/sim/rx/mem.h,v retrieving revision 1.2 diff -p -U3 -r1.2 mem.h --- mem.h 1 Jan 
2010 10:03:33 -0000 1.2 +++ mem.h 28 Jul 2010 02:00:19 -0000 @@ -25,10 +25,25 @@ enum mem_content_type { MC_NUM_TYPES }; +enum mem_ptr_action +{ + MPA_WRITING, + MPA_READING, + MPA_CONTENT_TYPE +}; + void init_mem (void); void mem_usage_stats (void); unsigned long mem_usage_cycles (void); +/* rx_mem_ptr returns a pointer which is valid as long as the address + requested remains within the same page. */ +#define PAGE_BITS 12 +#define PAGE_SIZE (1 << PAGE_BITS) +#define NONPAGE_MASK (~(PAGE_SIZE-1)) + +unsigned char *rx_mem_ptr (unsigned long address, enum mem_ptr_action action); + void mem_put_qi (int address, unsigned char value); void mem_put_hi (int address, unsigned short value); void mem_put_psi (int address, unsigned long value); Index: reg.c =================================================================== RCS file: /cvs/src/src/sim/rx/reg.c,v retrieving revision 1.3 diff -p -U3 -r1.3 reg.c --- reg.c 8 Jun 2010 09:15:17 -0000 1.3 +++ reg.c 28 Jul 2010 02:00:19 -0000 @@ -19,6 +19,7 @@ along with this program. If not, see . */ +#include "config.h" #include #include #include @@ -67,6 +68,11 @@ init_regs (void) { memset (&regs, 0, sizeof (regs)); memset (&oldregs, 0, sizeof (oldregs)); + +#ifdef CYCLE_ACCURATE + regs.rt = -1; + oldregs.rt = -1; +#endif } static unsigned int Index: rx.c =================================================================== RCS file: /cvs/src/src/sim/rx/rx.c,v retrieving revision 1.4 diff -p -U3 -r1.4 rx.c --- rx.c 1 Jan 2010 10:03:33 -0000 1.4 +++ rx.c 28 Jul 2010 02:00:19 -0000 @@ -18,6 +18,7 @@ GNU General Public License for more deta You should have received a copy of the GNU General Public License along with this program. If not, see . */ +#include "config.h" #include #include #include @@ -29,12 +30,254 @@ along with this program. If not, see d */ + "RXO_stcc", /* d = s if cond(s2) */ + "RXO_rtsd", /* rtsd, 1=imm, 2-0 = reg if reg type */ + + /* These are all either d OP= s or, if s2 is set, d = s OP s2.
Note + that d may be "None". */ + "RXO_and", + "RXO_or", + "RXO_xor", + "RXO_add", + "RXO_sub", + "RXO_mul", + "RXO_div", + "RXO_divu", + "RXO_shll", + "RXO_shar", + "RXO_shlr", + + "RXO_adc", /* d = d + s + carry */ + "RXO_sbb", /* d = d - s - ~carry */ + "RXO_abs", /* d = |s| */ + "RXO_max", /* d = max(d,s) */ + "RXO_min", /* d = min(d,s) */ + "RXO_emul", /* d:64 = d:32 * s */ + "RXO_emulu", /* d:64 = d:32 * s (unsigned) */ + "RXO_ediv", /* d:64 / s; d = quot, d+1 = rem */ + "RXO_edivu", /* d:64 / s; d = quot, d+1 = rem */ + + "RXO_rolc", /* d <<= 1 through carry */ + "RXO_rorc", /* d >>= 1 through carry*/ + "RXO_rotl", /* d <<= #s without carry */ + "RXO_rotr", /* d >>= #s without carry*/ + "RXO_revw", /* d = revw(s) */ + "RXO_revl", /* d = revl(s) */ + "RXO_branch", /* pc = d if cond(s) */ + "RXO_branchrel",/* pc += d if cond(s) */ + "RXO_jsr", /* pc = d */ + "RXO_jsrrel", /* pc += d */ + "RXO_rts", + "RXO_nop", + "RXO_nop2", + "RXO_nop3", + + "RXO_scmpu", + "RXO_smovu", + "RXO_smovb", + "RXO_suntil", + "RXO_swhile", + "RXO_smovf", + "RXO_sstr", + + "RXO_rmpa", + "RXO_mulhi", + "RXO_mullo", + "RXO_machi", + "RXO_maclo", + "RXO_mvtachi", + "RXO_mvtaclo", + "RXO_mvfachi", + "RXO_mvfacmi", + "RXO_mvfaclo", + "RXO_racw", + + "RXO_sat", /* sat(d) */ + "RXO_satr", + + "RXO_fadd", /* d op= s */ + "RXO_fcmp", + "RXO_fsub", + "RXO_ftoi", + "RXO_fmul", + "RXO_fdiv", + "RXO_round", + "RXO_itof", + + "RXO_bset", /* d |= (1< = cond(s2) */ + + "RXO_clrpsw", /* flag index in d */ + "RXO_setpsw", /* flag index in d */ + "RXO_mvtipl", /* new IPL in s */ + + "RXO_rtfi", + "RXO_rte", + "RXO_rtd", /* undocumented */ + "RXO_brk", + "RXO_dbt", /* undocumented */ + "RXO_int", /* vector id in s */ + "RXO_stop", + "RXO_wait", + + "RXO_sccnd", /* d = cond(s) ? 
1 : 0 */ +}; + +static const char * optype_names[] = { + " ", + "#Imm", /* #addend */ + " Rn ", /* Rn */ + "[Rn]", /* [Rn + addend] */ + "Ps++", /* [Rn+] */ + "--Pr", /* [-Rn] */ + " cc ", /* eq, gtu, etc */ + "Flag" /* [UIOSZC] */ +}; + +#define N_RXO (sizeof(id_names)/sizeof(id_names[0])) +#define N_RXT (sizeof(optype_names)/sizeof(optype_names[0])) +#define N_MAP 30 + +static unsigned long long benchmark_start_cycle; +static unsigned long long benchmark_end_cycle; + +static int op_cache[N_RXT][N_RXT][N_RXT]; +static int op_cache_rev[N_MAP]; +static int op_cache_idx = 0; + +static int +op_lookup (int a, int b, int c) +{ + if (op_cache[a][b][c]) + return op_cache[a][b][c]; + op_cache_idx ++; + if (op_cache_idx >= N_MAP) + { + printf("op_cache_idx exceeds %d\n", N_MAP); + exit(1); + } + op_cache[a][b][c] = op_cache_idx; + op_cache_rev[op_cache_idx] = (a<<8) | (b<<4) | c; + return op_cache_idx; +} + +static char * +op_cache_string (int map) +{ + static int ci; + static char cb[5][20]; + int a, b, c; + + map = op_cache_rev[map]; + a = (map >> 8) & 15; + b = (map >> 4) & 15; + c = (map >> 0) & 15; + ci = (ci + 1) % 5; + sprintf(cb[ci], "%s %s %s", optype_names[a], optype_names[b], optype_names[c]); + return cb[ci]; +} + +static unsigned long long cycles_per_id[N_RXO][N_MAP]; +static unsigned long long times_per_id[N_RXO][N_MAP]; +static unsigned long long memory_stalls; +static unsigned long long register_stalls; +static unsigned long long branch_stalls; +static unsigned long long branch_alignment_stalls; +static unsigned long long fast_returns; + +static unsigned long times_per_pair[N_RXO][N_MAP][N_RXO][N_MAP]; +static int prev_opcode_id = RXO_unknown; +static int po0; + +#define STATS(x) x + +#else +#define STATS(x) +#endif /* CYCLE_STATS */ + + +#ifdef CYCLE_ACCURATE + +static int new_rt = -1; + +/* Number of cycles to add if an insn spans an 8-byte boundary. 
*/ +static int branch_alignment_penalty = 0; + +#endif + +static int running_benchmark = 1; + +#define tprintf if (trace && running_benchmark) printf jmp_buf decode_jmp_buf; unsigned int rx_cycles = 0; +#ifdef CYCLE_ACCURATE +/* If nonzero, memory was read at some point and cycle latency might + take effect. */ +static int memory_source = 0; +/* If nonzero, memory was written and extra cycles might be + needed. */ +static int memory_dest = 0; + +static void +cycles (int throughput) +{ + tprintf("%d cycles\n", throughput); + regs.cycle_count += throughput; +} + +/* Number of execution (E) cycles the op uses. For memory sources, we + include the load micro-op stall as two extra E cycles. */ +#define E(c) cycles (memory_source ? c + 2 : c) +#define E1 cycles (1) +#define E2 cycles (2) +#define EBIT cycles (memory_source ? 2 : 1) + +/* Check to see if a read latency must be applied for a given register. */ +#define RL(r) \ + if (regs.rt == r ) \ + { \ + tprintf("register %d load stall\n", r); \ + regs.cycle_count ++; \ + STATS(register_stalls ++); \ + regs.rt = -1; \ + } + +#define RLD(r) \ + if (memory_source) \ + { \ + tprintf ("Rt now %d\n", r); \ + new_rt = r; \ + } + +#else /* !CYCLE_ACCURATE */ + +#define cycles(t) +#define E(c) +#define E1 +#define E2 +#define EBIT +#define RL(r) +#define RLD(r) + +#endif /* else CYCLE_ACCURATE */ + static int size2bytes[] = { 4, 1, 1, 1, 2, 2, 2, 3, 4 }; @@ -53,24 +296,28 @@ _rx_abort (const char *file, int line) abort(); } +static unsigned char *get_byte_base; +static SI get_byte_page; + +/* This gets called a *lot* so optimize it. */ static int rx_get_byte (void *vdata) { - int saved_trace = trace; - unsigned char rv; - - if (trace == 1) - trace = 0; - RX_Data *rx_data = (RX_Data *)vdata; + SI tpc = rx_data->dpc; + + /* See load.c for an explanation of this. */ if (rx_big_endian) - /* See load.c for an explanation of this. 
*/ - rv = mem_get_pc (rx_data->dpc ^ 3); - else - rv = mem_get_pc (rx_data->dpc); + tpc ^= 3; + + if (((tpc ^ get_byte_page) & NONPAGE_MASK) || enable_counting) + { + get_byte_page = tpc & NONPAGE_MASK; + get_byte_base = rx_mem_ptr (get_byte_page, MPA_READING) - get_byte_page; + } + rx_data->dpc ++; - trace = saved_trace; - return rv; + return get_byte_base [tpc]; } static int @@ -88,6 +335,7 @@ get_op (RX_Opcode_Decoded *rd, int i) return o->addend; case RX_Operand_Register: /* Rn */ + RL (o->reg); rv = get_reg (o->reg); break; @@ -96,6 +344,21 @@ get_op (RX_Opcode_Decoded *rd, int i) /* fall through */ case RX_Operand_Postinc: /* [Rn+] */ case RX_Operand_Indirect: /* [Rn + addend] */ +#ifdef CYCLE_ACCURATE + RL (o->reg); + regs.rt = -1; + if (regs.m2m == M2M_BOTH) + { + tprintf("src memory stall\n"); +#ifdef CYCLE_STATS + memory_stalls ++; +#endif + regs.cycle_count ++; + regs.m2m = 0; + } + + memory_source = 1; +#endif addr = get_reg (o->reg) + o->addend; switch (o->size) @@ -234,6 +497,7 @@ put_op (RX_Opcode_Decoded *rd, int i, in case RX_Operand_Register: /* Rn */ put_reg (o->reg, v); + RLD (o->reg); break; case RX_Operand_Predec: /* [-Rn] */ @@ -242,6 +506,19 @@ put_op (RX_Opcode_Decoded *rd, int i, in case RX_Operand_Postinc: /* [Rn+] */ case RX_Operand_Indirect: /* [Rn + addend] */ +#ifdef CYCLE_ACCURATE + if (regs.m2m == M2M_BOTH) + { + tprintf("dst memory stall\n"); + regs.cycle_count ++; +#ifdef CYCLE_STATS + memory_stalls ++; +#endif + regs.m2m = 0; + } + memory_dest = 1; +#endif + addr = get_reg (o->reg) + o->addend; switch (o->size) { @@ -345,8 +622,8 @@ poppc() #define MATH_OP(vop,c) \ { \ - uma = US1(); \ umb = US2(); \ + uma = US1(); \ ll = (unsigned long long) uma vop (unsigned long long) umb vop c; \ tprintf ("0x%x " #vop " 0x%x " #vop " 0x%x = 0x%llx\n", uma, umb, c, ll); \ ma = sign_ext (uma, DSZ() * 8); \ @@ -355,23 +632,25 @@ poppc() tprintf ("%d " #vop " %d " #vop " %d = %lld\n", ma, mb, c, sll); \ set_oszc (sll, DSZ(), (long long) ll > ((1 
vop 1) ? (long long) b2mask[DSZ()] : (long long) -1)); \ PD (sll); \ + E (1); \ } #define LOGIC_OP(vop) \ { \ - ma = US1(); \ mb = US2(); \ + ma = US1(); \ v = ma vop mb; \ tprintf("0x%x " #vop " 0x%x = 0x%x\n", ma, mb, v); \ set_sz (v, DSZ()); \ PD(v); \ + E (1); \ } #define SHIFT_OP(val, type, count, OP, carry_mask) \ { \ int i, c=0; \ - val = (type)US1(); \ count = US2(); \ + val = (type)US1(); \ tprintf("%lld " #OP " %d\n", val, count); \ for (i = 0; i < count; i ++) \ { \ @@ -443,8 +722,8 @@ fop_fsub (fp_t s1, fp_t s2, fp_t *d) int do_store; \ fp_t fa, fb, fc; \ FPCLEAR(); \ - fa = GD (); \ fb = GS (); \ + fa = GD (); \ do_store = fop_##func (fa, fb, &fc); \ tprintf("%g " #func " %g = %g %08x\n", int2float(fa), int2float(fb), int2float(fc), fc); \ FPCHECK(); \ @@ -549,6 +828,21 @@ do_fp_exception (unsigned long opcode_pc return RX_MAKE_STEPPED (); } +static int +op_is_memory (RX_Opcode_Decoded *rd, int i) +{ + switch (rd->op[i].type) + { + case RX_Operand_Predec: + case RX_Operand_Postinc: + case RX_Operand_Indirect: + return 1; + default: + return 0; + } +} +#define OM(i) op_is_memory (&opcode, i) + int decode_opcode () { @@ -561,14 +855,46 @@ decode_opcode () RX_Data rx_data; RX_Opcode_Decoded opcode; int rv; +#ifdef CYCLE_STATS + unsigned long long prev_cycle_count; +#endif +#ifdef CYCLE_ACCURATE + int tx; +#endif if ((rv = setjmp (decode_jmp_buf))) return rv; +#ifdef CYCLE_STATS + prev_cycle_count = regs.cycle_count; +#endif + +#ifdef CYCLE_ACCURATE + memory_source = 0; + memory_dest = 0; +#endif + rx_cycles ++; rx_data.dpc = opcode_pc = regs.r_pc; + memset (&opcode, 0, sizeof(opcode)); opcode_size = rx_decode_opcode (opcode_pc, &opcode, rx_get_byte, &rx_data); + +#ifdef CYCLE_ACCURATE + if (branch_alignment_penalty) + { + if ((regs.r_pc ^ (regs.r_pc + opcode_size - 1)) & ~7) + { + tprintf("1 cycle branch alignment penalty\n"); + cycles (branch_alignment_penalty); +#ifdef CYCLE_STATS + branch_alignment_stalls ++; +#endif + } + branch_alignment_penalty = 0; 
+ } +#endif + regs.r_pc += opcode_size; rx_flagmask = opcode.flags_s; @@ -585,6 +911,7 @@ decode_opcode () tprintf("%lld\n", sll); PD (sll); set_osz (sll, 4); + E (1); break; case RXO_adc: @@ -608,6 +935,7 @@ decode_opcode () mb &= 0x07; ma &= ~(1 << mb); PD (ma); + EBIT; break; case RXO_bmcc: @@ -622,6 +950,7 @@ decode_opcode () else ma &= ~(1 << mb); PD (ma); + EBIT; break; case RXO_bnot: @@ -633,16 +962,71 @@ decode_opcode () mb &= 0x07; ma ^= (1 << mb); PD (ma); + EBIT; break; case RXO_branch: if (GS()) - regs.r_pc = GD(); + { +#ifdef CYCLE_ACCURATE + SI old_pc = regs.r_pc; + int delta; +#endif + regs.r_pc = GD(); +#ifdef CYCLE_ACCURATE + delta = regs.r_pc - old_pc; + if (delta >= 0 && delta < 16 + && opcode_size > 1) + { + tprintf("near forward branch bonus\n"); + cycles (2); + } + else + { + cycles (3); + branch_alignment_penalty = 1; + } +#ifdef CYCLE_STATS + branch_stalls ++; + /* This is just for statistics */ + if (opcode.op[1].reg == 14) + opcode.op[1].type = RX_Operand_None; +#endif +#endif + } +#ifdef CYCLE_ACCURATE + else + cycles (1); +#endif break; case RXO_branchrel: if (GS()) - regs.r_pc += GD(); + { + int delta = GD(); + regs.r_pc += delta; +#ifdef CYCLE_ACCURATE + /* Note: specs say 3, chip says 2. */ + if (delta >= 0 && delta < 16 + && opcode_size > 1) + { + tprintf("near forward branch bonus\n"); + cycles (2); + } + else + { + cycles (3); + branch_alignment_penalty = 1; + } +#ifdef CYCLE_STATS + branch_stalls ++; +#endif +#endif + } +#ifdef CYCLE_ACCURATE + else + cycles (1); +#endif break; case RXO_brk: @@ -659,6 +1043,7 @@ decode_opcode () pushpc (old_psw); pushpc (regs.r_pc); regs.r_pc = mem_get_si (regs.r_intb); + cycles(6); } break; @@ -671,6 +1056,7 @@ decode_opcode () mb &= 0x07; ma |= (1 << mb); PD (ma); + EBIT; break; case RXO_btst: @@ -682,6 +1068,7 @@ decode_opcode () mb &= 0x07; umb = ma & (1 << mb); set_zc (! 
umb, umb);
+      EBIT;
       break;
 
     case RXO_clrpsw:
@@ -691,6 +1078,7 @@ decode_opcode ()
               || v == FLAGBIT_U))
         break;
       regs.r_psw &= ~v;
+      cycles (1);
       break;
 
     case RXO_div: /* d = d / s */
@@ -709,6 +1097,8 @@ decode_opcode ()
           set_flags (FLAGBIT_O, 0);
           PD (v);
         }
+      /* Note: spec says 3 to 22 cycles, we are pessimistic. */
+      cycles (22);
       break;
 
     case RXO_divu: /* d = d / s */
@@ -727,6 +1117,8 @@ decode_opcode ()
           set_flags (FLAGBIT_O, 0);
           PD (v);
         }
+      /* Note: spec says 2 to 20 cycles, we are pessimistic. */
+      cycles (20);
       break;
 
     case RXO_ediv:
@@ -748,6 +1140,8 @@ decode_opcode ()
           opcode.op[0].reg ++;
           PD (mb);
         }
+      /* Note: spec says 3 to 22 cycles, we are pessimistic. */
+      cycles (22);
       break;
 
     case RXO_edivu:
@@ -769,6 +1163,8 @@ decode_opcode ()
           opcode.op[0].reg ++;
           PD (umb);
         }
+      /* Note: spec says 2 to 20 cycles, we are pessimistic. */
+      cycles (20);
       break;
 
     case RXO_emul:
@@ -779,6 +1175,7 @@ decode_opcode ()
       PD (sll);
       opcode.op[0].reg ++;
       PD (sll >> 32);
+      E2;
       break;
 
     case RXO_emulu:
@@ -789,10 +1186,12 @@ decode_opcode ()
       PD (ll);
       opcode.op[0].reg ++;
       PD (ll >> 32);
+      E2;
       break;
 
     case RXO_fadd:
       FLOAT_OP (fadd);
+      E (4);
       break;
 
     case RXO_fcmp:
@@ -801,24 +1200,32 @@ decode_opcode ()
       FPCLEAR ();
       rxfp_cmp (ma, mb);
       FPCHECK ();
+      E (1);
       break;
 
     case RXO_fdiv:
       FLOAT_OP (fdiv);
+      E (16);
       break;
 
     case RXO_fmul:
       FLOAT_OP (fmul);
+      E (3);
       break;
 
     case RXO_rtfi:
       PRIVILEDGED ();
       regs.r_psw = regs.r_bpsw;
       regs.r_pc = regs.r_bpc;
+#ifdef CYCLE_ACCURATE
+      regs.fast_return = 0;
+      cycles(3);
+#endif
       break;
 
     case RXO_fsub:
       FLOAT_OP (fsub);
+      E (4);
       break;
 
     case RXO_ftoi:
@@ -829,6 +1236,7 @@ decode_opcode ()
       PD (mb);
       tprintf("(int) %g = %d\n", int2float(ma), mb);
       set_sz (mb, 4);
+      E (2);
       break;
 
     case RXO_int:
@@ -845,6 +1253,7 @@ decode_opcode ()
           pushpc (regs.r_pc);
           regs.r_pc = mem_get_si (regs.r_intb + 4 * v);
         }
+      cycles (6);
       break;
 
     case RXO_itof:
@@ -855,49 +1264,87 @@ decode_opcode ()
       tprintf("(float) %d = %x\n", ma, mb);
       PD (mb);
       set_sz (ma, 4);
+      E (2);
       break;
 
     case RXO_jsr:
     case RXO_jsrrel:
-      v = GD ();
-      pushpc (get_reg (pc));
-      if (opcode.id == RXO_jsrrel)
-        v += regs.r_pc;
-      put_reg (pc, v);
+      {
+#ifdef CYCLE_ACCURATE
+        int delta;
+        regs.m2m = 0;
+#endif
+        v = GD ();
+#ifdef CYCLE_ACCURATE
+        regs.link_register = regs.r_pc;
+#endif
+        pushpc (get_reg (pc));
+        if (opcode.id == RXO_jsrrel)
+          v += regs.r_pc;
+#ifdef CYCLE_ACCURATE
+        delta = v - regs.r_pc;
+#endif
+        put_reg (pc, v);
+#ifdef CYCLE_ACCURATE
+        /* Note: docs say 3, chip says 2 */
+        if (delta >= 0 && delta < 16)
+          {
+            tprintf ("near forward jsr bonus\n");
+            cycles (2);
+          }
+        else
+          {
+            branch_alignment_penalty = 1;
+            cycles (3);
+          }
+        regs.fast_return = 1;
+#endif
+      }
       break;
 
     case RXO_machi:
       ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(GS2 () >> 16);
       ll <<= 16;
       put_reg64 (acc64, ll + regs.r_acc);
+      E1;
       break;
 
     case RXO_maclo:
       ll = (long long)(signed short)(GS()) * (long long)(signed short)(GS2 ());
       ll <<= 16;
       put_reg64 (acc64, ll + regs.r_acc);
+      E1;
       break;
 
     case RXO_max:
-      ma = GD();
       mb = GS();
+      ma = GD();
       if (ma > mb)
         PD (ma);
       else
         PD (mb);
+      E (1);
+#ifdef CYCLE_STATS
+      if (opcode.op[0].type == RX_Operand_Register
+          && opcode.op[1].type == RX_Operand_Register
+          && opcode.op[0].reg == opcode.op[1].reg)
+        opcode.id = RXO_nop3;
+#endif
       break;
 
     case RXO_min:
-      ma = GD();
       mb = GS();
+      ma = GD();
       if (ma < mb)
         PD (ma);
       else
         PD (mb);
+      E (1);
       break;
 
     case RXO_mov:
       v = GS ();
+
       if (opcode.op[0].type == RX_Operand_Register
           && opcode.op[0].reg == 16 /* PSW */)
         {
@@ -927,8 +1374,32 @@ decode_opcode ()
           /* These are ignored. */
           break;
         }
+      if (OM(0) && OM(1))
+        cycles (2);
+      else
+        cycles (1);
+
       PD (v);
+
+#ifdef CYCLE_ACCURATE
+      if ((opcode.op[0].type == RX_Operand_Predec
+           && opcode.op[1].type == RX_Operand_Register)
+          || (opcode.op[0].type == RX_Operand_Postinc
+              && opcode.op[1].type == RX_Operand_Register))
+        {
+          /* Special case: push reg doesn't cause a memory stall. */
+          memory_dest = 0;
+          tprintf("push special case\n");
+        }
+#endif
+
       set_sz (v, DSZ());
+#ifdef CYCLE_STATS
+      if (opcode.op[0].type == RX_Operand_Register
+          && opcode.op[1].type == RX_Operand_Register
+          && opcode.op[0].reg == opcode.op[1].reg)
+        opcode.id = RXO_nop2;
+#endif
       break;
 
     case RXO_movbi:
@@ -939,6 +1410,7 @@ decode_opcode ()
       opcode.op[1].type = RX_Operand_Indirect;
       opcode.op[1].addend = 0;
       PD (GS ());
+      cycles (1);
       break;
 
     case RXO_movbir:
@@ -949,51 +1421,65 @@ decode_opcode ()
       opcode.op[1].type = RX_Operand_Indirect;
       opcode.op[1].addend = 0;
       PS (GD ());
+      cycles (1);
       break;
 
     case RXO_mul:
-      ll = (unsigned long long) US1() * (unsigned long long) US2();
+      v = US2 ();
+      ll = (unsigned long long) US1() * (unsigned long long) v;
       PD(ll);
+      E (1);
       break;
 
     case RXO_mulhi:
-      ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(GS2 () >> 16);
+      v = GS2 ();
+      ll = (long long)(signed short)(GS() >> 16) * (long long)(signed short)(v >> 16);
       ll <<= 16;
       put_reg64 (acc64, ll);
+      E1;
       break;
 
     case RXO_mullo:
-      ll = (long long)(signed short)(GS()) * (long long)(signed short)(GS2 ());
+      v = GS2 ();
+      ll = (long long)(signed short)(GS()) * (long long)(signed short)(v);
       ll <<= 16;
       put_reg64 (acc64, ll);
+      E1;
       break;
 
     case RXO_mvfachi:
       PD (get_reg (acchi));
+      E1;
       break;
 
     case RXO_mvfaclo:
       PD (get_reg (acclo));
+      E1;
       break;
 
     case RXO_mvfacmi:
       PD (get_reg (accmi));
+      E1;
       break;
 
     case RXO_mvtachi:
       put_reg (acchi, GS ());
+      E1;
       break;
 
     case RXO_mvtaclo:
       put_reg (acclo, GS ());
+      E1;
       break;
 
     case RXO_mvtipl:
       regs.r_psw &= ~ FLAGBITS_IPL;
       regs.r_psw |= (GS () << FLAGSHIFT_IPL) & FLAGBITS_IPL;
+      E1;
       break;
 
     case RXO_nop:
+      E1;
       break;
 
     case RXO_or:
@@ -1010,11 +1496,11 @@ decode_opcode ()
           return RX_MAKE_STOPPED (SIGILL);
         }
       for (v = opcode.op[1].reg; v <= opcode.op[2].reg; v++)
-        put_reg (v, pop ());
-      break;
-
-    case RXO_pusha:
-      push (get_reg (opcode.op[1].reg) + opcode.op[1].addend);
+        {
+          cycles (1);
+          RLD (v);
+          put_reg (v, pop ());
+        }
       break;
 
     case RXO_pushm:
@@ -1027,7 +1513,11 @@ decode_opcode ()
           return RX_MAKE_STOPPED (SIGILL);
         }
       for (v = opcode.op[2].reg; v >= opcode.op[1].reg; v--)
-        push (get_reg (v));
+        {
+          RL (v);
+          push (get_reg (v));
+        }
+      cycles (opcode.op[2].reg - opcode.op[1].reg + 1);
       break;
 
     case RXO_racw:
@@ -1040,6 +1530,7 @@ decode_opcode ()
       else
         ll &= 0xffffffff00000000ULL;
       put_reg64 (acc64, ll);
+      E1;
       break;
 
     case RXO_rte:
@@ -1048,6 +1539,10 @@ decode_opcode ()
       regs.r_psw = poppc ();
       if (FLAG_PM)
         regs.r_psw |= FLAGBIT_U;
+#ifdef CYCLE_ACCURATE
+      regs.fast_return = 0;
+      cycles (6);
+#endif
       break;
 
     case RXO_revl:
@@ -1057,6 +1552,7 @@ decode_opcode ()
              | ((uma << 8) & 0xff0000)
              | ((uma << 24) & 0xff000000UL));
       PD (umb);
+      E1;
       break;
 
     case RXO_revw:
@@ -1064,9 +1560,16 @@ decode_opcode ()
       umb = (((uma >> 8) & 0x00ff00ff)
              | ((uma << 8) & 0xff00ff00UL));
       PD (umb);
+      E1;
       break;
 
     case RXO_rmpa:
+      RL(4);
+      RL(5);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
+
       while (regs.r[3] != 0)
         {
           long long tmp;
@@ -1124,6 +1627,22 @@ decode_opcode ()
         set_flags (FLAGBIT_O|FLAGBIT_S, ma | FLAGBIT_O);
       else
         set_flags (FLAGBIT_O|FLAGBIT_S, ma);
+#ifdef CYCLE_ACCURATE
+      switch (opcode.size)
+        {
+        case RX_Long:
+          cycles (6 + 4 * tx);
+          break;
+        case RX_Word:
+          cycles (6 + 5 * (tx / 2) + 4 * (tx % 2));
+          break;
+        case RX_Byte:
+          cycles (6 + 7 * (tx / 4) + 4 * (tx % 4));
+          break;
+        default:
+          abort ();
+        }
+#endif
       break;
 
     case RXO_rolc:
@@ -1133,6 +1652,7 @@ decode_opcode ()
       v |= carry;
       set_szc (v, 4, ma);
       PD (v);
+      E1;
       break;
 
     case RXO_rorc:
@@ -1142,6 +1662,7 @@ decode_opcode ()
       uma |= (carry ? 0x80000000UL : 0);
       set_szc (uma, 4, mb);
       PD (uma);
+      E1;
       break;
 
     case RXO_rotl:
@@ -1154,6 +1675,7 @@ decode_opcode ()
         }
       set_szc (uma, 4, mb);
       PD (uma);
+      E1;
       break;
 
     case RXO_rotr:
@@ -1166,6 +1688,7 @@ decode_opcode ()
         }
       set_szc (uma, 4, mb);
       PD (uma);
+      E1;
       break;
 
     case RXO_round:
@@ -1176,10 +1699,30 @@ decode_opcode ()
       PD (mb);
       tprintf("(int) %g = %d\n", int2float(ma), mb);
       set_sz (mb, 4);
+      E (2);
       break;
 
     case RXO_rts:
-      regs.r_pc = poppc ();
+      {
+#ifdef CYCLE_ACCURATE
+        int cyc = 5;
+#endif
+        regs.r_pc = poppc ();
+#ifdef CYCLE_ACCURATE
+        /* Note: specs say 5, chip says 3. */
+        if (regs.fast_return && regs.link_register == regs.r_pc)
+          {
+#ifdef CYCLE_STATS
+            fast_returns ++;
+#endif
+            tprintf("fast return bonus\n");
+            cyc -= 2;
+          }
+        cycles (cyc);
+        regs.fast_return = 0;
+        branch_alignment_penalty = 1;
+#endif
+      }
       break;
 
     case RXO_rtsd:
@@ -1190,12 +1733,39 @@ decode_opcode ()
           put_reg (0, get_reg (0) + GS() - (opcode.op[0].reg-opcode.op[2].reg+1)*4);
           if (opcode.op[2].reg == 0)
             EXCEPTION (EX_UNDEFINED);
+#ifdef CYCLE_ACCURATE
+          tx = opcode.op[0].reg - opcode.op[2].reg + 1;
+#endif
           for (i = opcode.op[2].reg; i <= opcode.op[0].reg; i ++)
-            put_reg (i, pop ());
+            {
+              RLD (i);
+              put_reg (i, pop ());
+            }
         }
       else
-        put_reg (0, get_reg (0) + GS());
-      put_reg (pc, poppc ());
+        {
+#ifdef CYCLE_ACCURATE
+          tx = 0;
+#endif
+          put_reg (0, get_reg (0) + GS());
+        }
+      put_reg (pc, poppc());
+#ifdef CYCLE_ACCURATE
+      if (regs.fast_return && regs.link_register == regs.r_pc)
+        {
+          tprintf("fast return bonus\n");
+#ifdef CYCLE_STATS
+          fast_returns ++;
+#endif
+          cycles (tx < 3 ? 3 : tx + 1);
+        }
+      else
+        {
+          cycles (tx < 5 ? 5 : tx + 1);
+        }
+      regs.fast_return = 0;
+      branch_alignment_penalty = 1;
+#endif
       break;
 
     case RXO_sat:
@@ -1203,6 +1773,7 @@ decode_opcode ()
         PD (0x7fffffffUL);
       else if (FLAG_O && ! FLAG_S)
         PD (0x80000000UL);
+      E1;
       break;
 
     case RXO_sbb:
@@ -1214,9 +1785,13 @@ decode_opcode ()
         PD (1);
       else
         PD (0);
+      E1;
       break;
 
     case RXO_scmpu:
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       while (regs.r[3] != 0)
         {
           uma = mem_get_qi (regs.r[1] ++);
@@ -1229,6 +1804,7 @@ decode_opcode ()
         set_zc (1, 1);
       else
         set_zc (0, ((int)uma - (int)umb) >= 0);
+      cycles (2 + 4 * (tx / 4) + 4 * (tx % 4));
       break;
 
     case RXO_setpsw:
@@ -1238,24 +1814,40 @@ decode_opcode ()
               || v == FLAGBIT_U))
         break;
       regs.r_psw |= v;
+      cycles (1);
       break;
 
     case RXO_smovb:
+      RL (3);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       while (regs.r[3])
         {
           uma = mem_get_qi (regs.r[2] --);
           mem_put_qi (regs.r[1]--, uma);
           regs.r[3] --;
         }
+#ifdef CYCLE_ACCURATE
+      if (tx > 3)
+        cycles (6 + 3 * (tx / 4) + 3 * (tx % 4));
+      else
+        cycles (2 + 3 * (tx % 4));
+#endif
       break;
 
     case RXO_smovf:
+      RL (3);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       while (regs.r[3])
         {
           uma = mem_get_qi (regs.r[2] ++);
           mem_put_qi (regs.r[1]++, uma);
           regs.r[3] --;
         }
+      cycles (2 + 3 * (int)(tx / 4) + 3 * (tx % 4));
       break;
 
     case RXO_smovu:
@@ -1271,17 +1863,24 @@ decode_opcode ()
 
     case RXO_shar: /* d = ma >> mb */
       SHIFT_OP (sll, int, mb, >>=, 1);
+      E (1);
       break;
 
     case RXO_shll: /* d = ma << mb */
       SHIFT_OP (ll, int, mb, <<=, 0x80000000UL);
+      E (1);
       break;
 
     case RXO_shlr: /* d = ma >> mb */
       SHIFT_OP (ll, unsigned int, mb, >>=, 1);
+      E (1);
       break;
 
     case RXO_sstr:
+      RL (3);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       switch (opcode.size)
         {
         case RX_Long:
@@ -1291,6 +1890,7 @@ decode_opcode ()
               regs.r[1] += 4;
               regs.r[3] --;
             }
+          cycles (2 + tx);
           break;
         case RX_Word:
           while (regs.r[3] != 0)
@@ -1299,6 +1899,7 @@ decode_opcode ()
               regs.r[1] += 2;
               regs.r[3] --;
             }
+          cycles (2 + (int)(tx / 2) + tx % 2);
           break;
         case RX_Byte:
           while (regs.r[3] != 0)
@@ -1307,6 +1908,7 @@ decode_opcode ()
               regs.r[1] ++;
               regs.r[3] --;
             }
+          cycles (2 + (int)(tx / 4) + tx % 4);
           break;
         default:
           abort ();
@@ -1316,6 +1918,7 @@ decode_opcode ()
     case RXO_stcc:
       if (GS2())
         PD (GS ());
+      E1;
       break;
 
     case RXO_stop:
@@ -1328,8 +1931,15 @@ decode_opcode ()
       break;
 
     case RXO_suntil:
+      RL(3);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       if (regs.r[3] == 0)
-        break;
+        {
+          cycles (3);
+          break;
+        }
       switch (opcode.size)
         {
         case RX_Long:
@@ -1342,6 +1952,7 @@ decode_opcode ()
               if (umb == uma)
                 break;
             }
+          cycles (3 + 3 * tx);
           break;
         case RX_Word:
           uma = get_reg (2) & 0xffff;
@@ -1353,6 +1964,7 @@ decode_opcode ()
               if (umb == uma)
                 break;
             }
+          cycles (3 + 3 * (tx / 2) + 3 * (tx % 2));
           break;
         case RX_Byte:
           uma = get_reg (2) & 0xff;
@@ -1364,6 +1976,7 @@ decode_opcode ()
               if (umb == uma)
                 break;
             }
+          cycles (3 + 3 * (tx / 4) + 3 * (tx % 4));
           break;
         default:
           abort();
@@ -1375,6 +1988,10 @@ decode_opcode ()
       break;
 
     case RXO_swhile:
+      RL(3);
+#ifdef CYCLE_ACCURATE
+      tx = regs.r[3];
+#endif
       if (regs.r[3] == 0)
         break;
       switch (opcode.size)
@@ -1389,6 +2006,7 @@ decode_opcode ()
               if (umb != uma)
                 break;
             }
+          cycles (3 + 3 * tx);
           break;
         case RX_Word:
           uma = get_reg (2) & 0xffff;
@@ -1400,6 +2018,7 @@ decode_opcode ()
               if (umb != uma)
                 break;
             }
+          cycles (3 + 3 * (tx / 2) + 3 * (tx % 2));
           break;
         case RX_Byte:
           uma = get_reg (2) & 0xff;
@@ -1411,6 +2030,7 @@ decode_opcode ()
               if (umb != uma)
                 break;
             }
+          cycles (3 + 3 * (tx / 4) + 3 * (tx % 4));
           break;
         default:
           abort();
@@ -1427,9 +2047,18 @@ decode_opcode ()
       return RX_MAKE_STOPPED(0);
 
     case RXO_xchg:
+#ifdef CYCLE_ACCURATE
+      regs.m2m = 0;
+#endif
       v = GS (); /* This is the memory operand, if any. */
       PS (GD ()); /* and this may change the address register. */
       PD (v);
+      E2;
+#ifdef CYCLE_ACCURATE
+      /* all M cycles happen during xchg's cycles. */
+      memory_dest = 0;
+      memory_source = 0;
+#endif
       break;
 
     case RXO_xor:
@@ -1440,5 +2069,122 @@ decode_opcode ()
       EXCEPTION (EX_UNDEFINED);
     }
 
+#ifdef CYCLE_ACCURATE
+  regs.m2m = 0;
+  if (memory_source)
+    regs.m2m |= M2M_SRC;
+  if (memory_dest)
+    regs.m2m |= M2M_DST;
+
+  regs.rt = new_rt;
+  new_rt = -1;
+#endif
+
+#ifdef CYCLE_STATS
+  if (prev_cycle_count == regs.cycle_count)
+    {
+      printf("Cycle count not updated! id %s\n", id_names[opcode.id]);
+      abort ();
+    }
+#endif
+
+#ifdef CYCLE_STATS
+  if (running_benchmark)
+    {
+      int omap = op_lookup (opcode.op[0].type, opcode.op[1].type, opcode.op[2].type);
+
+
+      cycles_per_id[opcode.id][omap] += regs.cycle_count - prev_cycle_count;
+      times_per_id[opcode.id][omap] ++;
+
+      times_per_pair[prev_opcode_id][po0][opcode.id][omap] ++;
+
+      prev_opcode_id = opcode.id;
+      po0 = omap;
+    }
+#endif
+
   return RX_MAKE_STEPPED ();
 }
+
+#ifdef CYCLE_STATS
+void
+reset_pipeline_stats (void)
+{
+  memset (cycles_per_id, 0, sizeof(cycles_per_id));
+  memset (times_per_id, 0, sizeof(times_per_id));
+  memory_stalls = 0;
+  register_stalls = 0;
+  branch_stalls = 0;
+  branch_alignment_stalls = 0;
+  fast_returns = 0;
+  memset (times_per_pair, 0, sizeof(times_per_pair));
+  running_benchmark = 1;
+
+  benchmark_start_cycle = regs.cycle_count;
+}
+
+void
+halt_pipeline_stats (void)
+{
+  running_benchmark = 0;
+  benchmark_end_cycle = regs.cycle_count;
+}
+#endif
+
+void
+pipeline_stats (void)
+{
+#ifdef CYCLE_STATS
+  int i, o1;
+  int p, p1;
+#endif
+
+#ifdef CYCLE_ACCURATE
+  if (verbose == 1)
+    {
+      printf ("cycles: %llu\n", regs.cycle_count);
+      return;
+    }
+
+  printf ("cycles: %13s\n", comma (regs.cycle_count));
+#endif
+
+#ifdef CYCLE_STATS
+  if (benchmark_start_cycle)
+    printf ("bmark: %13s\n", comma (benchmark_end_cycle - benchmark_start_cycle));
+
+  printf("\n");
+  for (i = 0; i < N_RXO; i++)
+    for (o1 = 0; o1 < N_MAP; o1 ++)
+      if (times_per_id[i][o1])
+        printf("%13s %13s %7.2f %s %s\n",
+               comma (cycles_per_id[i][o1]),
+               comma (times_per_id[i][o1]),
+               (double)cycles_per_id[i][o1] / times_per_id[i][o1],
+               op_cache_string(o1),
+               id_names[i]+4);
+
+  printf("\n");
+  for (p = 0; p < N_RXO; p ++)
+    for (p1 = 0; p1 < N_MAP; p1 ++)
+      for (i = 0; i < N_RXO; i ++)
+        for (o1 = 0; o1 < N_MAP; o1 ++)
+          if (times_per_pair[p][p1][i][o1])
+            {
+              printf("%13s %s %-9s -> %s %s\n",
+                     comma (times_per_pair[p][p1][i][o1]),
+                     op_cache_string(p1),
+                     id_names[p]+4,
+                     op_cache_string(o1),
+                     id_names[i]+4);
+            }
+
+  printf("\n");
+  printf("%13s memory stalls\n", comma (memory_stalls));
+  printf("%13s register stalls\n", comma (register_stalls));
+  printf("%13s branches taken (non-return)\n", comma (branch_stalls));
+  printf("%13s branch alignment stalls\n", comma (branch_alignment_stalls));
+  printf("%13s fast returns\n", comma (fast_returns));
+#endif
+}
Index: trace.c
===================================================================
RCS file: /cvs/src/src/sim/rx/trace.c,v
retrieving revision 1.2
diff -p -U3 -r1.2 trace.c
--- trace.c 1 Jan 2010 10:03:33 -0000 1.2
+++ trace.c 28 Jul 2010 02:00:19 -0000
@@ -19,6 +19,7 @@ You should have received a copy of the G
    along with this program. If not, see . */
 
 
+#include "config.h"
 #include
 #include
 #include
@@ -321,7 +322,13 @@ sim_disasm_one (void)
     }
 
   opbuf[0] = 0;
-  printf ("\033[33m%06x: ", mypc);
+#ifdef CYCLE_ACCURATE
+  printf ("\033[33m %04u %06x: ", (int)(regs.cycle_count % 10000), mypc);
+#else
+  printf ("\033[33m %06x: ", mypc);
+
+#endif
+
   max = print_insn_rx (mypc, & info);
 
   for (i = 0; i < max; i++)
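
For anyone reviewing the jsr/rts hunks in rx.c, the fast-return accounting can be read in isolation as the sketch below. It is not code from the patch: `struct sim_state`, `do_jsr` and `rts_cycles` are hypothetical names made up for illustration. Only the 5-cycle base cost, the 2-cycle bonus, and the `fast_return` / `link_register` test are taken from the RXO_jsr and RXO_rts cases; stalls and the branch alignment penalty are left out.

```c
/* Hypothetical stand-in for the two fields the patch adds to regs_type
   for fast-return tracking; not the simulator's real structure.  */
struct sim_state
{
  unsigned long link_register;  /* return address noted by the last jsr */
  int fast_return;              /* set by jsr, cleared by any return */
};

/* Model of the jsr bookkeeping: remember where the call will return.  */
void
do_jsr (struct sim_state *s, unsigned long return_pc)
{
  s->link_register = return_pc;
  s->fast_return = 1;
}

/* Model of the rts accounting: 5 cycles normally, 3 when the popped PC
   matches the address recorded by the most recent jsr.  */
int
rts_cycles (struct sim_state *s, unsigned long popped_pc)
{
  int cyc = 5;

  if (s->fast_return && s->link_register == popped_pc)
    cyc -= 2;                   /* the "fast return bonus" */
  s->fast_return = 0;           /* the bonus applies at most once per jsr */
  return cyc;
}
```

In this model a jsr followed by an rts to the recorded address costs 3 cycles, while a second rts without an intervening jsr, or an rts to a different address, costs the full 5.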