* Re: (patch) hpjyg09: bcache optimizations
  [not found] <Pine.LNX.4.10.9911041529510.15357-100000@hpcll168.cup.hp.com>
@ 1999-11-04 17:31 ` Andrew Cagney
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Cagney @ 1999-11-04 17:31 UTC (permalink / raw)
  To: Jimmy Guo; +Cc: gdb-patches

Jimmy Guo wrote:
>
> Andrew,
>
> 1) Not all compilers support the inline keyword (HP C compiler for
> example), so it will be a nop change in such cases. Therefore the
> macro def might be the only way to optimize this away.

BTW, how do you get an HP compiler to inline a function? Will it just do
it if -O<very-large-number> is given?

In the case of the simulators, the decision was to build a fast correct
program when built with GCC and a slower correct program when built with
other compilers.

> 2) The macro def is duplicating the hash() implementation. I agree that
> one implementation is much more maintainable. While hash() could use
> the BCACHE_HASH macro, what about the 1st point you made about debugging
> and use of macros?

Yes, I know the two comments appear contradictory. I'm assuming the
macro is approved. In that case I think a single (un-debuggable?)
implementation is better than a macro and a function duplicating code.

Even having a function with a macro as its body helps - at least you can
set breakpoints on that hash function and watch what is being returned.

> 3) In the actual use of BCACHE_HASH vs. the hash() call, yes a default
> BCACHE_HASH can be provided which calls hash(). However, this would
> introduce yet another macro wrapper (to address both 2 and 3 ... one
> for the hash implementation -- 2; another for the hash call wrapper
> using the macro implementation directly or through an actual hash()
> wrapper).

True. However, I believe it is more important to avoid littering C
files with further:

	#if ...
	one variant
	#else
	another variant
	#endif

If we did that, we would need to ensure that each alternative continued
to be compilable.
> > o I'm in two minds over the first argument
> >   (HASHVAL). I recently spent time eliminating
> >   several really bad macro definitions that
> >   abused the fact that they had direct access to
> >   arguments - changing them to multi-arch functions
> >   was, er, non-trivial.
> >
> >   However, in this case, the rationale and use is
> >   clean.
>
> I'm not sure I understand your last point about changing macros to
> multi-arch functions.

(Er, yes, sorry, my quandary isn't very clear.) Some of the old GDB
macros would do things like:

	#define X(Y) (memset (&(Y), sizeof (Y), 0); (Y).A = 1; (Y).B = 2)

I'd still prefer to see retval = BCACHE_HASH (bytes, temp_count).
Unfortunately, that can't be implemented as an ISO C macro so it won't
happen :-)

> Since:
> a) the macro def is pretty much necessary to make the change
>    meaningful (see 1),
> b) to have one implementation of hash and make the usage clean (see 2 &
>    3),
> I think we should just use the macro def and blow away hash() altogether.

I think there first needs to be a technical decision made about how much
of a price we're willing to pay for performance vs. maintainability. As
you can probably tell, I have a very strong preference for the latter :-)

If the decision is to use the macro then the corresponding hash()
function should be deleted.

	enjoy,
	Andrew

From guo@cup.hp.com Thu Nov 04 18:07:00 1999
From: Jimmy Guo <guo@cup.hp.com>
To: Andrew Cagney <ac131313@cygnus.com>
Cc: gdb-patches@sourceware.cygnus.com
Subject: Revision: (patch) hpjyg09: bcache optimizations
Date: Thu, 04 Nov 1999 18:07:00 -0000
Message-id: <Pine.LNX.4.10.9911041739340.15357-100000@hpcll168.cup.hp.com>
References: <382232AF.7E3ED811@cygnus.com>
X-SW-Source: 1999-q4/msg00182.html
Content-length: 10394

***
Following my reply to Andrew is the revision of this patch.
Patch dependency: hpjyg05 (config/pa/tm-hppa.h)
***

>BTW, how do you get an HP compiler to inline a function? Will it just do
>it if -O<very-large-number> is given?
The HP C compiler will inline functions at +O3 and +O4. And
+Oinlinebudget=<n> controls the aggressiveness of inlining:

	= 100	Default level of inlining.
	> 100	More aggressive inlining.
	2 - 99	Less aggressive inlining.
	= 1	Only inline if it reduces code size.

>> 2) The macro def is duplicating the hash() implementation. I agree that
>> one implementation is much more maintainable. While hash() could use
>> the BCACHE_HASH macro, what about the 1st point you made about debugging
>> and use of macros?
>
>Yes, I know the two comments appear contradictory. I'm assuming the
>macro is approved. In that case I think a single (un-debuggable?)
>implementation is better than a macro and a function duplicating code.
>
>Even having a function with a macro as its body helps - at least you can
>set breakpoints on that hash function and watch what is being returned.

I look forward to the same (that the use of the macro is approved) :)

>If the decision is to use the macro then the corresponding hash()
>function should be deleted.

I'm attaching a revision of this patch to use the macro only. Given the
two choices of using only the macro vs. providing a macro and also a
function wrapping around the macro, I also prefer the first choice,
assuming the decision is to use the macro. Thanks!

Here is the revision of the patch:

ChangeLog:

1999-11-04  Jimmy Guo  <guo@cup.hp.com>

	* config/pa/tm-hppa.h (BCACHE_ALLOW_DUPLICATES): Declare.

	* bcache.c (BCACHE_HASH): Define, the macro equivalent of the
	hash () function this macro is replacing.
	(lookup_cache): Bypass if ignore_duplicates (new static var) is
	set; use BCACHE_HASH macro instead of hash() function call;
	optimize (reduce) memcmp() call.
	(bcache_ignore_duplicates,bcache_eliminate_duplicates): New
	functions, to control turning on/off duplicate handling at
	runtime.

	* bcache.h (bcache_ignore_duplicates,bcache_eliminate_duplicates):
	Declare.
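Andrew's earlier remark that `retval = BCACHE_HASH (bytes, temp_count)` cannot be an ISO C macro comes down to statements vs. expressions: a brace-bodied macro such as BCACHE_HASH is a statement, not an expression, so its result has to come back through a parameter. A minimal stand-alone illustration (the names here are hypothetical, not from the patch):

```c
/* A statement-style macro must return its result through a parameter;
   a braced block is not an expression in ISO C, so it cannot appear on
   the right-hand side of an assignment.  */
#define SQUARE_STMT(retval, x)			\
  {						\
    int square_stmt_tmp = (x);			\
    (retval) = square_stmt_tmp * square_stmt_tmp; \
  }

/* The function form, by contrast, composes freely in expressions.  */
static int
square_fn (int x)
{
  return x * x;
}
```

This is exactly the trade-off in the patch: the macro form needs the extra `retval` argument, while the function form keeps call sites clean but cannot be inlined by every compiler.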
Index: gdb/config/pa/tm-hppa.h
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/config/pa/tm-hppa.h gdb/config/pa/tm-hppa.h
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/config/pa/tm-hppa.h Tue Nov 2 16:04:12 1999
--- gdb/config/pa/tm-hppa.h Thu Nov 4 13:49:12 1999
***************
*** 811,816 ****
--- 811,818 ----
|| ((symname)[0] == '$' && isdigit ((symname)[1])) \
))
+ #define BCACHE_ALLOW_DUPLICATES
+
/* Here's how to step off a permanent breakpoint. */
#define SKIP_PERMANENT_BREAKPOINT (hppa_skip_permanent_breakpoint)
extern void hppa_skip_permanent_breakpoint (void);
Index: gdb/bcache.c
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.c gdb/bcache.c
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.c Thu Nov 4 15:13:44 1999
--- gdb/bcache.c Thu Nov 4 17:56:03 1999
***************
*** 24,62 ****
#include "bcache.h"
#include "gdb_string.h" /* For memcpy declaration */
! /* Prototypes for local functions. */
!
! static unsigned int hash PARAMS ((void *, int));
!
! static void *lookup_cache PARAMS ((void *, int, int, struct bcache *));
!
/* FIXME: Incredibly simplistic hash generator. Probably way too expensive
(consider long strings) and unlikely to have good distribution across hash
values for typical input. */
! static unsigned int
! hash (bytes, count)
! void *bytes;
! int count;
! {
! unsigned int len;
! unsigned long hashval;
! unsigned int c;
! const unsigned char *data = bytes;
!
! hashval = 0;
! len = 0;
! while (count-- > 0)
! {
! c = *data++;
! hashval += c + (c << 17);
! hashval ^= hashval >> 2;
! ++len;
! }
! hashval += len + (len << 17);
! hashval ^= hashval >> 2;
! return (hashval % BCACHE_HASHSIZE);
! }
static void *
lookup_cache (bytes, count, hashval, bcachep)
--- 24,87 ----
#include "bcache.h"
#include "gdb_string.h" /* For memcpy declaration */
! /* CM: The hash function is called a LOT, so have an inlined version.
! This should help scaling to larger apps. */
!
! /* guo: Replaced hash() implementation with the equivalent BCACHE_HASH
! macro definition, since having two implementations (macro vs. function)
! of the same functionality is harder to maintain, and specific function
! inlining by compiler is not supported by all compilers (i.e. the use of
! the 'inline' keyword supported by GCC). */
/* FIXME: Incredibly simplistic hash generator. Probably way too expensive
(consider long strings) and unlikely to have good distribution across hash
values for typical input. */
! #define BCACHE_HASH(retval, bytes, temp_count) \
! { \
! int bcache_hash_count = (temp_count); \
! unsigned int bcache_hash_len; \
! unsigned long bcache_hash_hashval; \
! unsigned int bcache_hash_c; \
! const unsigned char *bcache_hash_data = (bytes); \
! \
! bcache_hash_hashval = 0; \
! bcache_hash_len = 0; \
! while (bcache_hash_count-- > 0) \
! { \
! bcache_hash_c = *bcache_hash_data++; \
! bcache_hash_hashval += bcache_hash_c + (bcache_hash_c << 17); \
! bcache_hash_hashval ^= bcache_hash_hashval >> 2; \
! ++bcache_hash_len; \
! } \
! bcache_hash_hashval += bcache_hash_len + (bcache_hash_len << 17); \
! bcache_hash_hashval ^= bcache_hash_hashval >> 2; \
! (retval) = (bcache_hash_hashval % BCACHE_HASHSIZE); \
! }
!
! /* Prototypes for local functions. */
!
! static void *lookup_cache PARAMS ((void *, int, int, struct bcache *));
!
! /* srikanth, 990314, JAGaa80452, the bcache seems to be an extremely high
! overhead data structure. See that the overhead (which is really every
! byte that is not used to store GDB's data i.e., cells used for
! housekeeping info like pointers, hash chain heads etc.,) is of the order
! of O(m * n * 64k) where m is the number of load modules compiled with -g,
! and n is the number of strings that have unique length. This spells doom
! for applications with a large number of shared libraries (m increases)
! and C++ (n increases since we also stick demangled names into the cache.)
! When debugging HP's C compiler more than 140 MB or about 48% of memory is
! due to the bcache overhead. Not the data, just the overhead!
!
! Further research, communication with Cygnus and Fred Fish (the original
! implementor) shows that this module is extremely effective, saving as much
! as 60% in the gcc + stabs + linux combination. So I am disabling it just
! for the Wildebeest and only for non-doom mode. */
!
! #ifdef BCACHE_ALLOW_DUPLICATES
! static int ignore_duplicates = 1;
! #else
! static int ignore_duplicates = 0;
! #endif
static void *
lookup_cache (bytes, count, hashval, bcachep)
***************
*** 69,81 ****
struct hashlink **hashtablep;
struct hashlink *linkp;
hashtablep = bcachep->indextable[count];
if (hashtablep != NULL)
{
linkp = hashtablep[hashval];
while (linkp != NULL)
{
! if (memcmp (BCACHE_DATA (linkp), bytes, count) == 0)
{
location = BCACHE_DATA (linkp);
break;
--- 94,118 ----
struct hashlink **hashtablep;
struct hashlink *linkp;
+ if (ignore_duplicates) /* don't care if it exists already */
+ return NULL;
+
hashtablep = bcachep->indextable[count];
if (hashtablep != NULL)
{
linkp = hashtablep[hashval];
while (linkp != NULL)
{
! /* CM: The following helps to avoid excessive (possibly shared
! library--i.e. indirect) function calls. This function is
! called a LOT, so this should help scaling to larger apps.
!
! The original line was:
! if (memcmp (BCACHE_DATA (linkp), bytes, count) == 0) */
!
! if ((*((char *) (BCACHE_DATA (linkp))) == *((char *) bytes))
! ? (memcmp (BCACHE_DATA (linkp), bytes, count) == 0)
! : 0)
{
location = BCACHE_DATA (linkp);
break;
***************
*** 108,114 ****
}
else
{
! hashval = hash (bytes, count);
location = lookup_cache (bytes, count, hashval, bcachep);
if (location != NULL)
{
--- 145,152 ----
}
else
{
! BCACHE_HASH (hashval, bytes, count);
!
location = lookup_cache (bytes, count, hashval, bcachep);
if (location != NULL)
{
***************
*** 139,144 ****
--- 177,195 ----
return (location);
}
+ /* srikanth, 990323, now we can toggle the bcache behavior on the fly */
+ void
+ bcache_ignore_duplicates ()
+ {
+ ignore_duplicates = 1;
+ }
+
+ void
+ bcache_eliminate_duplicates ()
+ {
+ ignore_duplicates = 0;
+ }
+
void
print_bcache_statistics (bcachep, id)
struct bcache *bcachep;
***************
*** 146,152 ****
{
struct hashlink **hashtablep;
struct hashlink *linkp;
! int tidx, tcount, hidx, hcount, lcount, lmax, temp, lmaxt, lmaxh;
for (lmax = lcount = tcount = hcount = tidx = 0; tidx < BCACHE_MAXLENGTH; tidx++)
{
--- 197,203 ----
{
struct hashlink **hashtablep;
struct hashlink *linkp;
! int tidx, tcount, hidx, hcount, lcount, lmax, temp, lmaxt = 0, lmaxh = 0;
for (lmax = lcount = tcount = hcount = tidx = 0; tidx < BCACHE_MAXLENGTH; tidx++)
{
Index: gdb/bcache.h
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.h gdb/bcache.h
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.h Thu Nov 4 11:11:13 1999
--- gdb/bcache.h Thu Nov 4 13:49:11 1999
***************
*** 68,73 ****
--- 68,79 ----
bcache PARAMS ((void *bytes, int count, struct bcache * bcachep));
extern void
+ bcache_ignore_duplicates PARAMS (());
+
+ extern void
+ bcache_eliminate_duplicates PARAMS (());
+
+ extern void
print_bcache_statistics PARAMS ((struct bcache *, char *));
#endif /* BCACHE_H */

^ permalink raw reply	[flat|nested] 7+ messages in thread
* (patch) hpjyg09: bcache optimizations
@ 1999-11-04 13:48 Jimmy Guo
1999-11-05 9:59 ` Jim Blandy
0 siblings, 1 reply; 7+ messages in thread
From: Jimmy Guo @ 1999-11-04 13:48 UTC (permalink / raw)
To: gdb-patches
***
Patch dependency: hpjyg05 (config/pa/tm-hppa.h)
***
A couple of bcache.c optimizations:
1) Introduction and use of an inlined hash function, and an attempt to
reduce the cost of the memcmp call.
2) Ability to bypass bcache duplicate handling (initial behavior
controlled by a target-dependent macro def), and to adjust such
behavior at runtime through two convenience functions.
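The second half of item (1), reducing the cost of the memcmp call, can be sketched stand-alone. This is a hypothetical helper illustrating the first-byte early-reject trick the patch applies inline in lookup_cache, not code from the patch itself:

```c
#include <string.h>

/* Hypothetical helper sketching the patch's memcmp reduction: test the
   first byte inline, and only pay for a full (possibly cross-library,
   hence indirect) memcmp call when that cheap check passes.  */
static int
bytes_equal (const void *a, const void *b, int count)
{
  if (count > 0
      && *(const unsigned char *) a != *(const unsigned char *) b)
    return 0;			/* cheap early reject on the first byte */
  return memcmp (a, b, count) == 0;
}
```

Whether this wins in practice depends on how often entries in the same hash chain share a first byte; when they rarely do, most full memcmp calls are skipped.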
ChangeLog:
1999-11-04 Jimmy Guo <guo@cup.hp.com>
* config/pa/tm-hppa.h (BCACHE_ALLOW_DUPLICATES): Declare.
* bcache.c (BCACHE_USE_INLINED_HASH): Define, enables use of
the macro equivalent of the hash () function instead.
(BCACHE_HASH): Define, the macro equivalent of the hash () function.
(lookup_cache): Bypass if ignore_duplicates (new static var) is
set; use BCACHE_HASH macro ifdef BCACHE_USE_INLINED_HASH;
optimize (reduce) memcmp() call.
(bcache_ignore_duplicates,bcache_eliminate_duplicates): New
functions, to control turning on/off duplicate handling at
runtime.
* bcache.h (bcache_ignore_duplicates,bcache_eliminate_duplicates):
Declare.
Index: gdb/config/pa/tm-hppa.h
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/config/pa/tm-hppa.h gdb/config/pa/tm-hppa.h
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/config/pa/tm-hppa.h Tue Nov 2 16:04:12 1999
--- gdb/config/pa/tm-hppa.h Thu Nov 4 13:28:28 1999
***************
*** 811,816 ****
--- 811,818 ----
|| ((symname)[0] == '$' && isdigit ((symname)[1])) \
))
+ #define BCACHE_ALLOW_DUPLICATES
+
/* Here's how to step off a permanent breakpoint. */
#define SKIP_PERMANENT_BREAKPOINT (hppa_skip_permanent_breakpoint)
extern void hppa_skip_permanent_breakpoint (void);
Index: gdb/bcache.c
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.c gdb/bcache.c
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.c Thu Nov 4 10:57:42 1999
--- gdb/bcache.c Thu Nov 4 13:28:20 1999
***************
*** 24,35 ****
--- 24,71 ----
#include "bcache.h"
#include "gdb_string.h" /* For memcpy declaration */
+ /* CM: The hash function is called a LOT, so have an inlined version.
+ This should help scaling to larger apps. Comment out the following
+ define to use the old method. */
+
+ #define BCACHE_USE_INLINED_HASH
+
/* Prototypes for local functions. */
+ #ifndef BCACHE_USE_INLINED_HASH
static unsigned int hash PARAMS ((void *, int));
+ #endif
static void *lookup_cache PARAMS ((void *, int, int, struct bcache *));
+ #ifdef BCACHE_USE_INLINED_HASH
+
+ /* CM: The hash method should exactly match the function defined below. */
+
+ #define BCACHE_HASH(retval, bytes, temp_count) \
+ { \
+ int bcache_hash_count = (temp_count); \
+ unsigned int bcache_hash_len; \
+ unsigned long bcache_hash_hashval; \
+ unsigned int bcache_hash_c; \
+ const unsigned char *bcache_hash_data = (bytes); \
+ \
+ bcache_hash_hashval = 0; \
+ bcache_hash_len = 0; \
+ while (bcache_hash_count-- > 0) \
+ { \
+ bcache_hash_c = *bcache_hash_data++; \
+ bcache_hash_hashval += bcache_hash_c + (bcache_hash_c << 17); \
+ bcache_hash_hashval ^= bcache_hash_hashval >> 2; \
+ ++bcache_hash_len; \
+ } \
+ bcache_hash_hashval += bcache_hash_len + (bcache_hash_len << 17); \
+ bcache_hash_hashval ^= bcache_hash_hashval >> 2; \
+ (retval) = (bcache_hash_hashval % BCACHE_HASHSIZE); \
+ }
+
+ #else
+
/* FIXME: Incredibly simplistic hash generator. Probably way too expensive
(consider long strings) and unlikely to have good distribution across hash
values for typical input. */
***************
*** 58,63 ****
--- 94,123 ----
return (hashval % BCACHE_HASHSIZE);
}
+ #endif /* BCACHE_USE_INLINED_HASH */
+
+ /* srikanth, 990314, JAGaa80452, the bcache seems to be an extremely high
+ overhead data structure. See that the overhead (which is really every
+ byte that is not used to store GDB's data i.e., cells used for
+ housekeeping info like pointers, hash chain heads etc.,) is of the order
+ of O(m * n * 64k) where m is the number of load modules compiled with -g,
+ and n is the number of strings that have unique length. This spells doom
+ for applications with a large number of shared libraries (m increases)
+ and C++ (n increases since we also stick demangled names into the cache.)
+ When debugging HP's C compiler more than 140 MB or about 48% of memory is
+ due to the bcache overhead. Not the data, just the overhead!
+
+ Further research, communication with Cygnus and Fred Fish (the original
+ implementor) shows that this module is extremely effective, saving as much
+ as 60% in gcc + stabs + linux combination. So I am disabling it just for
+ the Wildebeest and only for non-doom mode. */
+
+ #ifdef BCACHE_ALLOW_DUPLICATES
+ static int ignore_duplicates = 1;
+ #else
+ static int ignore_duplicates = 0;
+ #endif
+
static void *
lookup_cache (bytes, count, hashval, bcachep)
void *bytes;
***************
*** 69,81 ****
struct hashlink **hashtablep;
struct hashlink *linkp;
hashtablep = bcachep->indextable[count];
if (hashtablep != NULL)
{
linkp = hashtablep[hashval];
while (linkp != NULL)
{
! if (memcmp (BCACHE_DATA (linkp), bytes, count) == 0)
{
location = BCACHE_DATA (linkp);
break;
--- 129,153 ----
struct hashlink **hashtablep;
struct hashlink *linkp;
+ if (ignore_duplicates) /* don't care if it exists already */
+ return NULL;
+
hashtablep = bcachep->indextable[count];
if (hashtablep != NULL)
{
linkp = hashtablep[hashval];
while (linkp != NULL)
{
! /* CM: The following helps to avoid excessive (possibly shared
! library--i.e. indirect) function calls. This function is
! called a LOT, so this should help scaling to larger apps.
!
! The original line was:
! if (memcmp (BCACHE_DATA (linkp), bytes, count) == 0) */
!
! if ((*((char *) (BCACHE_DATA (linkp))) == *((char *) bytes))
! ? (memcmp (BCACHE_DATA (linkp), bytes, count) == 0)
! : 0)
{
location = BCACHE_DATA (linkp);
break;
***************
*** 108,114 ****
--- 180,192 ----
}
else
{
+ /* CM: Use the inlined hash function if requested. */
+ #ifdef BCACHE_USE_INLINED_HASH
+ BCACHE_HASH (hashval, bytes, count);
+ #else
hashval = hash (bytes, count);
+ #endif
+
location = lookup_cache (bytes, count, hashval, bcachep);
if (location != NULL)
{
***************
*** 139,144 ****
--- 217,235 ----
return (location);
}
+ /* srikanth, 990323, now we can toggle the bcache behavior on the fly */
+ void
+ bcache_ignore_duplicates ()
+ {
+ ignore_duplicates = 1;
+ }
+
+ void
+ bcache_eliminate_duplicates ()
+ {
+ ignore_duplicates = 0;
+ }
+
void
print_bcache_statistics (bcachep, id)
struct bcache *bcachep;
***************
*** 146,152 ****
{
struct hashlink **hashtablep;
struct hashlink *linkp;
! int tidx, tcount, hidx, hcount, lcount, lmax, temp, lmaxt, lmaxh;
for (lmax = lcount = tcount = hcount = tidx = 0; tidx < BCACHE_MAXLENGTH; tidx++)
{
--- 237,243 ----
{
struct hashlink **hashtablep;
struct hashlink *linkp;
! int tidx, tcount, hidx, hcount, lcount, lmax, temp, lmaxt = 0, lmaxh = 0;
for (lmax = lcount = tcount = hcount = tidx = 0; tidx < BCACHE_MAXLENGTH; tidx++)
{
Index: gdb/bcache.h
/opt/gnu/bin/diff -r -c -N /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.h gdb/bcache.h
*** /view/guo.wdb.c//CLO/Components/WDB/Src/gnu/gdb/bcache.h Thu Nov 4 11:11:13 1999
--- gdb/bcache.h Thu Nov 4 11:12:53 1999
***************
*** 68,73 ****
--- 68,79 ----
bcache PARAMS ((void *bytes, int count, struct bcache * bcachep));
extern void
+ bcache_ignore_duplicates PARAMS (());
+
+ extern void
+ bcache_eliminate_duplicates PARAMS (());
+
+ extern void
print_bcache_statistics PARAMS ((struct bcache *, char *));
#endif /* BCACHE_H */
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: (patch) hpjyg09: bcache optimizations
  1999-11-04 13:48 Jimmy Guo
@ 1999-11-05  9:59 ` Jim Blandy
  1999-11-05 10:50   ` Srikanth Adayapalam
  1999-12-16  0:26   ` Jeffrey A Law
  0 siblings, 2 replies; 7+ messages in thread
From: Jim Blandy @ 1999-11-05 9:59 UTC (permalink / raw)
  To: Jimmy Guo; +Cc: gdb-patches

The first red flag here is that this patch adds a new target macro,
BCACHE_ALLOW_DUPLICATES (in config/pa/tm-hppa.h), which is actually
dependent on the application being debugged, not the architecture.
Whether the bcache helps you depends on the contents of your debug
info, not whether it's running on a PA microprocessor. Hmm.

I'd like to look at this problem on a larger scale: I've read
Srikanth's comment in the patch, explaining why the bcache imposes too
much overhead for some code while it is still quite effective for
other code.

But this situation still sounds weird. The bcache is a very simple
thing --- bcache.h and bcache.c total 289 lines of code. You give it a
string of bytes, and it either adds a copy of your string to its hash
table, or finds an identical copy already present. Either way, it
hands you back a pointer to its stashed copy. So, a bcache is helpful
if you expect a lot of duplicates. In Linux's libc.so, the bcache
reduces the space required for the psymtabs by half.

What I can't understand is why such a simple data structure needs to
have such a terrible overhead. Srikanth writes:

    See that the overhead (which is really every byte that is not used
    to store GDB's data i.e., cells used for housekeeping info like
    pointers, hash chain heads etc.,) is of the order of O(m * n * 64k)
    where m is the number of load modules compiled with -g, and n is
    the number of strings that have unique length. This spells doom
    for applications with a large number of shared libraries (m
    increases) and C++ (n increases since we also stick demangled names
    into the cache.)
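The byte cache described in this thread, hand it a byte string and get back a pointer to the single stashed copy, can be sketched as a toy intern table. All names and sizes below are illustrative only, not GDB's actual bcache.c:

```c
#include <stdlib.h>
#include <string.h>

/* Toy byte cache: one chained hash table that interns byte strings.
   Illustrative sketch only; GDB's real implementation lives in
   bcache.c and additionally indexes by string length.  */
#define TOY_NBUCKETS 1024

struct toy_entry
{
  struct toy_entry *next;
  int count;
  unsigned char data[1];	/* 'count' bytes, allocated inline */
};

static struct toy_entry *toy_buckets[TOY_NBUCKETS];

static unsigned int
toy_hash (const unsigned char *data, int count)
{
  unsigned long h = 0;
  while (count-- > 0)
    {
      h += *data + (*data << 17);	/* same mixing step as bcache.c */
      data++;
      h ^= h >> 2;
    }
  return h % TOY_NBUCKETS;
}

void *
toy_bcache (const void *bytes, int count)
{
  unsigned int h = toy_hash (bytes, count);
  struct toy_entry *e;

  for (e = toy_buckets[h]; e != NULL; e = e->next)
    if (e->count == count && memcmp (e->data, bytes, count) == 0)
      return e->data;		/* duplicate: hand back the stashed copy */

  e = malloc (sizeof *e + count);
  e->count = count;
  memcpy (e->data, bytes, count);
  e->next = toy_buckets[h];
  toy_buckets[h] = e;
  return e->data;
}
```

Interning the same bytes twice returns the same pointer, which is why the structure pays off only when the input actually contains duplicates.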
    When debugging HP's C compiler more than 140 MB or about 48% of
    memory is due to the bcache overhead. Not the data, just the
    overhead!

I'm sure that's true, but it seems completely unnecessary. One should
be able to design a bcache with much less overhead.

Looking at the output of "maint print statistics" from running GDB on
itself, I see:

    Average hash table population: 8%
    Average hash table population: (not applicable)
    Average hash table population: (not applicable)
    Average hash table population: 0%
    Average hash table population: 1%
    Average hash table population: 3%
    Average hash table population: 1%

(The "not applicable" items are for bcaches with no entries.) I don't
think those lower-level hash tables are being used very effectively,
if the hash function is distributing the items across ~3% of the
buckets. A good hash function distributes items across as many
different buckets as possible. I would guess that the bcache has never
been tuned.

So, while I understand the impulse to just disable the bcache, because
it's causing you trouble in your driving uses of the debugger, I think
the better solution, for both Cygnus and HP, is to fix the data
structure so it actually works. I would recommend:

- using a single-level hash table, with an initial size of about 32k
  buckets (based on looking at GDB debugging GDB)

- growing the hash table when the average chain length grows beyond a
  certain limit, so the time overhead remains the same as the problem
  size grows

- choosing a hash function by running GDB on your C compiler, doing
  "maint print statistics" to see how it's distributing items across
  buckets, and then tweaking the function to minimize your maximum
  chain length (this is kind of fun, and takes less time than you'd
  think, since hash functions have almost no *required* semantics)

- adding code to print_bcache_statistics to print the number of bytes
  spent on bcache overhead

Yes, this is more work than commenting it out.
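The growth rule in the second recommendation might look roughly like this sketch; the struct layout, field names, and threshold are invented for illustration and are not a proposed GDB interface:

```c
#include <stdlib.h>

/* Sketch of a grow-on-demand chained hash table: when the average
   chain length exceeds a limit, double the bucket count and re-bucket
   every node, so lookup cost stays flat as the table fills.  Nodes
   remember their full hash so rehashing needs no recomputation.  */
struct node
{
  struct node *next;
  unsigned long hash;		/* full (un-modded) hash value */
};

struct table
{
  struct node **buckets;
  size_t nbuckets;
  size_t nentries;
};

#define MAX_AVG_CHAIN 3

static void
maybe_grow (struct table *t)
{
  if (t->nentries / t->nbuckets < MAX_AVG_CHAIN)
    return;			/* chains still short enough */

  size_t newsize = t->nbuckets * 2;
  struct node **nb = calloc (newsize, sizeof *nb);

  for (size_t i = 0; i < t->nbuckets; i++)
    for (struct node *n = t->buckets[i], *next; n != NULL; n = next)
      {
	next = n->next;
	size_t b = n->hash % newsize;	/* re-bucket under the new size */
	n->next = nb[b];
	nb[b] = n;
      }

  free (t->buckets);
  t->buckets = nb;
  t->nbuckets = newsize;
}
```

Calling maybe_grow after each insertion keeps the average chain length bounded by the threshold, which is the property being asked for above.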
But we can't perpetually choose the quick fix over the real cure.
Especially when the quick fix adds complexity, like more CPP
conditionals and target parameters that aren't target parameters.

In this light, I don't think we should apply this patch.

From guo@cup.hp.com Fri Nov 05 10:24:00 1999
From: Jimmy Guo <guo@cup.hp.com>
To: Jim Blandy <jimb@cygnus.com>
Cc: gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 10:24:00 -0000
Message-id: <Pine.LNX.4.10.9911051016470.11755-100000@hpcll168.cup.hp.com>
References: <npaeoskevw.fsf@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00189.html
Content-length: 462

Given the questions and concerns over this patch, I will 'hash' it a
bit more in the direction as suggested here ... just ignore this patch.

- Jimmy Guo

>Yes, this is more work than commenting it out. But we can't
>perpetually choose the quick fix over the real cure. Especially when
>the quick fix adds complexity, like more CPP conditionals and target
>parameters that aren't target parameters.
>
>In this light, I don't think we should apply this patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: (patch) hpjyg09: bcache optimizations
  1999-11-05  9:59 ` Jim Blandy
@ 1999-11-05 10:50 ` Srikanth Adayapalam
  1999-11-05 13:29   ` Jim Blandy
  1999-12-16  0:26 ` Jeffrey A Law
  1 sibling, 1 reply; 7+ messages in thread
From: Srikanth Adayapalam @ 1999-11-05 10:50 UTC (permalink / raw)
  To: jimb; +Cc: gdb-patches

> The first red flag here is that this patch adds a new target macro,
> BCACHE_ALLOW_DUPLICATES (in config/pa/tm-hppa.h), which is actually
> dependent on the application being debugged, not the architecture.
> Whether the bcache helps you depends on the contents of your debug
> info, not whether it's running on a PA microprocessor. Hmm.

While it is true that it is not tied to the microprocessor (and the
placement of the macro is not ideal, perhaps), it is tied somewhat to
the HP-UX platform in that:

o The bcache is beneficial when there is a high degree of duplication
  and redundancy in the debug info generated by a set of compilers for
  a platform (e.g. stabs + gcc + linux),

o and is a useless and high-overhead item if there is little or no
  duplication (e.g. som + HP cc + HP-UX).

On HP-UX systems, we have a tool called pxdb, which is a linker post
processor (or a debugger preprocessor, if you want to look at it that
way) which runs through the a.out and removes all duplicates. So when
gdb enters the picture there is no duplicate debug info at all.
Further, this overhead is not particular to the C compiler; it is seen
for all applications compiled with HP compilers. The C compiler's case
just happened to be mentioned. That we waste 140 MB trying to
eliminate the non-existent dups was a big concern for us.

Hope this clarifies the picture. As for how to proceed with the patch,
I'll leave it to Jimmy and this forum to evolve the best course.

Thanks
Srikanth

From jtc@redback.com Fri Nov 05 11:17:00 1999
From: jtc@redback.com (J.T. Conklin)
To: Jim Blandy <jimb@cygnus.com>
Cc: Jimmy Guo <guo@cup.hp.com>, gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 11:17:00 -0000
Message-id: <5mwvrw92gx.fsf@jtc.redbacknetworks.com>
References: <Pine.LNX.4.10.9911041332020.15719-100000@hpcll168.cup.hp.com> <npaeoskevw.fsf@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00191.html
Content-length: 2247

>>>>> "Jim" == Jim Blandy <jimb@cygnus.com> writes:

Jim> But this situation still sounds weird. The bcache is a very
Jim> simple thing --- bcache.h and bcache.c total 289 lines of code.
Jim> You give it a string of bytes, and it either adds a copy of your
Jim> string to its hash table, or finds an identical copy already
Jim> present. Either way, it hands you back a pointer to its stashed
Jim> copy. So, a bcache is helpful if you expect a lot of duplicates.

This brings back memories. Many years ago, I discovered CVS had
terrible hash behavior. Date and Revision strings were used as keys,
and each character (of which there were only digits, '.', and '/') was
weighted equally. I can't recall exactly, but I think the original
hash function resulted in ~10% bucket utilization --- the remaining
~90% would not be used by any valid keys. And even among the remaining
~10%, the distribution was non-uniform.

I changed it to one of the functions compared in the Red Dragon book's
discussion of hash functions and their suitability for hashing
identifiers (which tend to have a lot of duplication/redundancy). This
change resulted in a uniform distribution across all buckets and a
corresponding performance improvement.

For the longest time, I used to keep a histogram plot of the original
and new hash functions in a folder, and would show them to anyone who
would listen to the tale: switching from a naive to a complex data
structure will not automatically result in a performance improvement.
So I agree with Jim, fixing bcache to use a better hash table is a
much better approach than micro-optimizing the existing hash function
by making it an inline function or a macro. Many thanks for spending
the time and tracking down the root cause.

For reference, here is the hash function I used:

    unsigned int
    hashpjw (const char *x)
    {
      unsigned int h = 0;
      unsigned int g;

      while (*x != 0)
        {
          h = (h << 4) + *x++;
          if ((g = h & 0xf0000000) != 0)
            h = (h ^ (g >> 24)) ^ g;
        }
      return h;
    }

It will have to be modified slightly for use in bcache, but it may
result in acceptable behavior in this case as well.

	--jtc

--
J.T. Conklin
RedBack Networks

From tromey@cygnus.com Fri Nov 05 12:00:00 1999
From: Tom Tromey <tromey@cygnus.com>
To: gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 12:00:00 -0000
Message-id: <87u2n0lnbn.fsf@cygnus.com>
References: <Pine.LNX.4.10.9911041332020.15719-100000@hpcll168.cup.hp.com> <npaeoskevw.fsf@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00192.html
Content-length: 427

>>>>> "Jim" == Jim Blandy <jimb@cygnus.com> writes:

Jim> - growing the hash table when the average chain length grows beyond
Jim>   a certain limit, so the time overhead remains the same as the
Jim>   problem size grows

Greg McGary recently pointed out to me that libiberty now contains a
hash table implementation. You still have to write the hash function,
but the growing, etc., are handled automatically.

Tom

From guo@cup.hp.com Fri Nov 05 12:38:00 1999
From: Jimmy Guo <guo@cup.hp.com>
To: Srikanth Adayapalam <srikanth@cup.hp.com>
Cc: jimb@cygnus.com, gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 12:38:00 -0000
Message-id: <Pine.LNX.4.10.9911051142210.12517-100000@hpcll168.cup.hp.com>
References: <199911051850.KAA28555@flytrap.cup.hp.com>
X-SW-Source: 1999-q4/msg00193.html
Content-length: 1485

Given Jim Blandy's comments:
  http://sourceware.cygnus.com/ml/gdb-patches/1999-q4/msg00188.html
and Srikanth's clarification:
  http://sourceware.cygnus.com/ml/gdb-patches/1999-q4/msg00190.html
I think we should look at the issue of bypassing and the issue of the
bcache hash implementation separately.

Bcache is introduced to deal with duplicate symbols. Where it is not
necessary, it just adds time and space overheads, and that could be
significant for HP customers debugging large applications. The tuning
of the bcache hash is probably necessary, but that's a general and
separate issue.

I'd like to tackle the bypass scheme initially to introduce an
acceptable bypass mechanism which gives gdb users with platform and
compiler combinations like HP-UX + HP cc a faster / smaller footprint
gdb for large apps -- Controlling the bypass through a target-dependent
macro def is out. The bcache_ignore_duplicates() and
bcache_filter_duplicates() calls still need to be added, and be used to
bypass bcache when detecting that the HP compiler is used (and if gcc
is used, gdb will still use bcache). Comments?

As to the macro implementation of hash(), that is a smaller issue and
will be looked at with some more performance benchmarking to answer
whether or not "Inlining such a relatively large body of code is
definitely not a definite win". Again, the focus will be debugging
large applications, as typical for HP customers.

- Jimmy Guo, guo@cup.hp.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: (patch) hpjyg09: bcache optimizations
  1999-11-05 10:50   ` Srikanth Adayapalam
@ 1999-11-05 13:29     ` Jim Blandy
  1999-12-15  1:16       ` Jeffrey A Law
  0 siblings, 1 reply; 7+ messages in thread
From: Jim Blandy @ 1999-11-05 13:29 UTC (permalink / raw)
To: Srikanth Adayapalam; +Cc: gdb-patches

> While it is true that it is not tied to the microprocessor,
> (and the placement of the macro is not ideal perhaps,) it is tied
> somewhat to the HP-UX platform in that:

Ahh.  That, I didn't know.  So it would be correct to put the flag in
a file associated with a particular toolchain.

> o The bcache is beneficial when there is a high degree of
>   duplication and redundancy in the debug info generated by a set
>   of compilers for a platform (e.g. stabs + gcc + linux),
>
> o and is a useless and high-overhead item if there is little or no
>   duplication (e.g. som + HP cc + HP-UX).
>
> On HP-UX systems, we have a tool called pxdb, which is a linker
> post-processor (or a debugger preprocessor, if you want to look at
> it that way) which runs through the a.out and removes all
> duplicates.  So when gdb enters the picture there is no duplicate
> debug info at all.  Further, this overhead is not particular to the
> C compiler; it is seen for all applications compiled with HP
> compilers.  The C compiler's case just happened to be mentioned.
> That we waste 140 MB trying to eliminate the non-existent dups was a
> big concern for us.

It's a big concern for us, too --- I'll explain why.

The memory consumed by the bcache depends only on the number of unique
strings it contains --- not on the presence or absence of duplicates.
Using pxdb has no effect on the amount of bcache memory overhead.  So
if HP is having problems with bcache memory overhead, others will have
problems, too.  It's not a toolchain-specific problem.  But disabling
it is a toolchain-specific fix.
From guo@cup.hp.com Fri Nov 05 13:36:00 1999
From: Jimmy Guo <guo@cup.hp.com>
To: gdb-patches@sourceware.cygnus.com
Subject: (patch) hpjyg13: dejagnu/lib/framework.exp
Date: Fri, 05 Nov 1999 13:36:00 -0000
Message-id: <Pine.LNX.4.10.9911051330510.12973-100000@hpcll168.cup.hp.com>
X-SW-Source: 1999-q4/msg00197.html
Content-length: 2049

Extends setup_xfail PRMS handling.  See ChangeLog.

ChangeLog:

1999-11-05  Jimmy Guo  <guo@cup.hp.com>

	* framework.exp (setup_xfail): extend PRMS number pattern to
	anything not containing '-' (target triplet spec).  This
	enables cleaner and more precise xfail message specification,
	where target-specific 'PRMS' numbers are specified in the
	corresponding setup_xfail call instead of in the literal fail
	string (e.g. HP's bug tracking numbers associated with HP
	compilers on HP targets).

Index: dejagnu/lib/framework.exp
/opt/gnu/bin/diff -r -c -N /view/guo.import//CLO/Components/WDB/Src/gnu/dejagnu/lib/framework.exp dejagnu/lib/framework.exp
*** /view/guo.import//CLO/Components/WDB/Src/gnu/dejagnu/lib/framework.exp	Tue Jun 22 22:46:11 1999
--- dejagnu/lib/framework.exp	Fri Jul 16 13:30:44 1999
***************
*** 417,424 ****
  # Setup a flag to control whether a failure is expected or not
  #
  # Multiple target triplet patterns can be specified for targets
! # for which the test fails. A decimal number can be specified,
! # which is the PRMS number.
  #
  proc setup_xfail { args } {
      global xfail_flag
--- 417,424 ----
  # Setup a flag to control whether a failure is expected or not
  #
  # Multiple target triplet patterns can be specified for targets
! # for which the test fails. A bug report ID can be specified,
! # which is a string without '-'.
  #
  proc setup_xfail { args } {
      global xfail_flag
***************
*** 428,435 ****
      set argc [ llength $args ]
      for { set i 0 } { $i < $argc } { incr i } {
  	set sub_arg [ lindex $args $i ]
! 	# is a prms number. we assume this is a number with no characters
! 	if [regexp "^\[0-9\]+$" $sub_arg] {
  	    set xfail_prms $sub_arg
  	    continue
  	}
--- 428,435 ----
      set argc [ llength $args ]
      for { set i 0 } { $i < $argc } { incr i } {
  	set sub_arg [ lindex $args $i ]
! 	# is a prms number. we assume this is a string with no '-' characters
! 	if [regexp "^\[^\-\]+$" $sub_arg] {
  	    set xfail_prms $sub_arg
  	    continue
  	}

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: (patch) hpjyg09: bcache optimizations
  1999-11-05 13:29     ` Jim Blandy
@ 1999-12-15  1:16       ` Jeffrey A Law
  0 siblings, 0 replies; 7+ messages in thread
From: Jeffrey A Law @ 1999-12-15 1:16 UTC (permalink / raw)
To: Jim Blandy; +Cc: Srikanth Adayapalam, gdb-patches

In message <npyacciqf8.fsf@zwingli.cygnus.com> you write:
>
> > While it is true that it is not tied to the microprocessor,
> > (and the placement of the macro is not ideal perhaps,) it is tied
> > somewhat to the HP-UX platform in that:
>
> Ahh.  That, I didn't know.  So it would be correct to put the flag in
> a file associated with a particular toolchain.

Not in this case.  While HP's tools arrange (via pxdb) to avoid
duplicates, other HP toolchains do not.

Short term, none of the GNU toolchains on the PA remove duplicates.
That will likely change for the PA64 toolchain (dwarf2 + elf), but it
is unlikely to ever change for the PA32 toolchain (stabs + som).

In general, if we find ourselves tuning towards one particular
toolchain, then we are probably going down the wrong road.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: (patch) hpjyg09: bcache optimizations
  1999-11-05  9:59   ` Jim Blandy
  1999-11-05 10:50     ` Srikanth Adayapalam
@ 1999-12-16  0:26   ` Jeffrey A Law
  1 sibling, 0 replies; 7+ messages in thread
From: Jeffrey A Law @ 1999-12-16 0:26 UTC (permalink / raw)
To: Jim Blandy; +Cc: Jimmy Guo, gdb-patches

In message <npaeoskevw.fsf@zwingli.cygnus.com> you write:
> So, while I understand the impulse to just disable the bcache, because
> it's causing you trouble in your driving uses of the debugger, I think
> the better solution, for both Cygnus and HP, is to fix the data
> structure so it actually works.  I would recommend:
> [ ... ]
> - growing the hash table when the average chain length grows beyond
>   a certain limit, so the time overhead remains the same as the
>   problem size grows

FWIW, libiberty now includes some generic hash table capabilities,
including automatic expansion of the hash table once it hits a certain
capacity.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread
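[Editorial note: the growth policy Jim Blandy recommends, and which
libiberty's hash table automates, can be sketched in a few dozen lines:
a chained table that doubles its bucket array whenever the average
chain length crosses a limit, so lookup cost stays roughly constant as
the symbol count grows.  Names and thresholds below are illustrative;
this is not libiberty's actual htab interface.]

```c
#include <stdlib.h>
#include <string.h>

struct entry
{
  char *key;
  struct entry *next;
};

struct htab_sketch
{
  struct entry **buckets;
  size_t n_buckets;
  size_t n_entries;
};

#define MAX_AVG_CHAIN 4   /* grow when n_entries / n_buckets reaches this */

static unsigned int
hash_string (const char *s)
{
  unsigned int h = 0, g;
  while (*s)
    {
      h = (h << 4) + (unsigned char) *s++;
      if ((g = h & 0xf0000000) != 0)
        h = (h ^ (g >> 24)) ^ g;
    }
  return h;
}

static void
htab_init (struct htab_sketch *t)
{
  t->n_buckets = 8;
  t->n_entries = 0;
  t->buckets = calloc (t->n_buckets, sizeof *t->buckets);
}

/* Double the bucket array and rehash every entry into it.  */
static void
htab_grow (struct htab_sketch *t)
{
  size_t new_n = t->n_buckets * 2;
  struct entry **nb = calloc (new_n, sizeof *nb);
  size_t i;

  for (i = 0; i < t->n_buckets; i++)
    while (t->buckets[i])
      {
        struct entry *e = t->buckets[i];
        size_t slot = hash_string (e->key) % new_n;
        t->buckets[i] = e->next;
        e->next = nb[slot];
        nb[slot] = e;
      }
  free (t->buckets);
  t->buckets = nb;
  t->n_buckets = new_n;
}

/* Return the unique cached copy of KEY, inserting it first if absent.
   Duplicate interns return the same pointer, which is the property
   bcache relies on.  */
static const char *
htab_intern (struct htab_sketch *t, const char *key)
{
  size_t slot = hash_string (key) % t->n_buckets;
  struct entry *e;

  for (e = t->buckets[slot]; e; e = e->next)
    if (strcmp (e->key, key) == 0)
      return e->key;

  if (t->n_entries / t->n_buckets >= MAX_AVG_CHAIN)
    {
      htab_grow (t);
      slot = hash_string (key) % t->n_buckets;
    }

  e = malloc (sizeof *e);
  e->key = malloc (strlen (key) + 1);
  strcpy (e->key, key);
  e->next = t->buckets[slot];
  t->buckets[slot] = e;
  t->n_entries++;
  return e->key;
}
```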
end of thread, other threads:[~1999-12-16 0:26 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <Pine.LNX.4.10.9911041529510.15357-100000@hpcll168.cup.hp.com>
1999-11-04 17:31 ` (patch) hpjyg09: bcache optimizations Andrew Cagney
1999-11-04 13:48 Jimmy Guo
1999-11-05 9:59 ` Jim Blandy
1999-11-05 10:50 ` Srikanth Adayapalam
1999-11-05 13:29 ` Jim Blandy
1999-12-15 1:16 ` Jeffrey A Law
1999-12-16 0:26 ` Jeffrey A Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox