[patch/rfc] Tweak bcache performance

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [patch/rfc] Tweak bcache performance
@ 2003-11-03 19:07 Andrew Cagney
  2003-11-03 19:12 ` Daniel Jacobowitz
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cagney @ 2003-11-03 19:07 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1609 bytes --]

Hello,

This patch:

- adds a comparison of bcache and hashtab.[hc]
Summary: bcache has significantly less memory overhead

- tweaks the bcache algorithm to eliminate the one performance -ve
Summary: all memcmp's eliminated

- exploits an assumption that nothing is bigger than 64k
Summary: I figure stuff bigger than 64k shouldn't be in memory :-)

As for the tweak making a measurable difference, for "file gdb" it is 
slight (approx 0.780s vs 0.773s) but still there.

Daniel, does this answer your hashtab VS bcache question? :-)

Comments?  Baring them, I'll commit in a few days.

Andrew

PS: The stats.

   Cached 'partial symbol cache' statistics:
     Total object count:  36924
     Unique object count: 15935
     Percentage of duplicates, by count:  56%

     Total object size:   886176
     Unique object size:  382440
     Percentage of duplicates, by size:   56%

     Max entry size:     24
     Average entry size: 24
     Median entry size:  24

     Total memory used by bcache, including overhead: 526316
     Percentage memory overhead:  37%
     Net memory savings:          40%

     Hash table size:           4099
     Hash table expands:        3
     Hash table hashes:         52294
     Half hash misses:          0
     Hash table population:      97%
     Median hash chain length:    4
     Average hash chain length:   3
     Maximum hash chain length:  12

Notice how the half hash miss is zero.  This indicates that, on real 
data, all unnecessary memcmp's were eliminated!

Also notice how everything is 24 bytes in size (psymbol) because GDB 
isn't written in C++ :-(

.

[-- Attachment #2: diffs --]
[-- Type: text/plain, Size: 5607 bytes --]


	* bcache.c: Include "gdb_assert.h".
	(struct bcache): Add fields "expand_count" and
	"expand_hash_count".
	(expand_hash_table): Update the expand counts.
	(print_bcache_statistics): Use XCALLOC, not alloca.  Print stats
	on object sizes and hashes.
	* Makefile.in (bcache.o): Update dependencies.

Index: Makefile.in
===================================================================
RCS file: /cvs/src/src/gdb/Makefile.in,v
retrieving revision 1.467
diff -u -r1.467 Makefile.in
--- Makefile.in	1 Nov 2003 01:42:48 -0000	1.467
+++ Makefile.in	3 Nov 2003 17:01:26 -0000
@@ -1606,7 +1606,8 @@
 	$(target_h) $(ax_h) $(ax_gdb_h) $(gdb_string_h) $(block_h) \
 	$(regcache_h)
 ax-general.o: ax-general.c $(defs_h) $(ax_h) $(value_h) $(gdb_string_h)
-bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) $(gdb_string_h)
+bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) \
+	$(gdb_string_h) $(gdb_assert_h)
 bfd-target.o: bfd-target.c $(defs_h) $(target_h) $(bfd_target_h) \
 	$(gdb_assert_h) $(gdb_string_h)
 block.o: block.c $(defs_h) $(block_h) $(symtab_h) $(symfile_h) \
Index: bcache.c
===================================================================
RCS file: /cvs/src/src/gdb/bcache.c,v
retrieving revision 1.10
diff -u -r1.10 bcache.c
--- bcache.c	29 Jul 2002 22:55:26 -0000	1.10
+++ bcache.c	3 Nov 2003 17:01:26 -0000
@@ -2,7 +2,7 @@
    Written by Fred Fish <fnf@cygnus.com>
    Rewritten by Jim Blandy <jimb@cygnus.com>
 
-   Copyright 1999, 2000, 2002 Free Software Foundation, Inc.
+   Copyright 1999, 2000, 2002, 2003 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -25,6 +25,7 @@
 #include "gdb_obstack.h"
 #include "bcache.h"
 #include "gdb_string.h"		/* For memcpy declaration */
+#include "gdb_assert.h"
 
 #include <stddef.h>
 #include <stdlib.h>
@@ -71,6 +72,13 @@
   long unique_size;	/* size of unique strings, in bytes */
   long total_size;      /* total number of bytes cached, including dups */
   long structure_size;	/* total size of bcache, including infrastructure */
+  /* Number of times that the hash table is expanded and hence
+     re-built, and the corresponding number of times that a string is
+     [re]hashed as part of entering it into the expanded table.  The
+     total number of hashes can be computed by adding TOTAL_COUNT to
+     expand_hash_count.  */
+  unsigned long expand_count;
+  unsigned long expand_hash_count;
 };
 
 /* The old hash function was stolen from SDBM. This is what DB 3.0 uses now,
@@ -117,6 +125,11 @@
   struct bstring **new_buckets;
   unsigned int i;
 
+  /* Count the stats.  Every unique item needs to be re-hashed and
+     re-entered.  */
+  bcache->expand_count++;
+  bcache->expand_hash_count += bcache->unique_count;
+
   /* Find the next size.  */
   new_num_buckets = bcache->num_buckets * 2;
   for (i = 0; i < (sizeof (sizes) / sizeof (sizes[0])); i++)
@@ -265,12 +278,16 @@
   int occupied_buckets;
   int max_chain_length;
   int median_chain_length;
+  int max_entry_size;
+  int median_entry_size;
 
-  /* Count the number of occupied buckets, and measure chain lengths.  */
+  /* Count the number of occupied buckets, tally the various string
+     lengths, and measure chain lengths.  */
   {
     unsigned int b;
-    int *chain_length
-      = (int *) alloca (c->num_buckets * sizeof (*chain_length));
+    int *chain_length = XCALLOC (c->num_buckets + 1, int);
+    int *entry_size = XCALLOC (c->unique_count + 1, int);
+    int stringi = 0;
 
     occupied_buckets = 0;
 
@@ -286,7 +303,10 @@
 	    
 	    while (s)
 	      {
+		gdb_assert (b < c->num_buckets);
 		chain_length[b]++;
+		gdb_assert (stringi < c->unique_count);
+		entry_size[stringi++] = s->length;
 		s = s->next;
 	      }
 	  }
@@ -295,6 +315,8 @@
     /* To compute the median, we need the set of chain lengths sorted.  */
     qsort (chain_length, c->num_buckets, sizeof (chain_length[0]),
 	   compare_ints);
+    qsort (entry_size, c->unique_count, sizeof (entry_size[0]),
+	   compare_ints);
 
     if (c->num_buckets > 0)
       {
@@ -306,6 +328,19 @@
 	max_chain_length = 0;
 	median_chain_length = 0;
       }
+    if (c->unique_count > 0)
+      {
+	max_entry_size = entry_size[c->unique_count - 1];
+	median_entry_size = entry_size[c->unique_count / 2];
+      }
+    else
+      {
+	max_entry_size = 0;
+	median_entry_size = 0;
+      }
+
+    xfree (chain_length);
+    xfree (entry_size);
   }
 
   printf_filtered ("  Cached '%s' statistics:\n", type);
@@ -321,6 +356,15 @@
   print_percentage (c->total_size - c->unique_size, c->total_size);
   printf_filtered ("\n");
 
+  printf_filtered ("    Max entry size:     %d\n", max_entry_size);
+  printf_filtered ("    Average entry size: ");
+  if (c->unique_count > 0)
+    printf_filtered ("%ld\n", c->unique_size / c->unique_count);
+  else
+    printf_filtered ("(not applicable)\n");    
+  printf_filtered ("    Median entry size:  %d\n", median_entry_size);
+  printf_filtered ("\n");
+
   printf_filtered ("    Total memory used by bcache, including overhead: %ld\n",
 		   c->structure_size);
   printf_filtered ("    Percentage memory overhead: ");
@@ -330,6 +374,10 @@
   printf_filtered ("\n");
 
   printf_filtered ("    Hash table size:           %3d\n", c->num_buckets);
+  printf_filtered ("    Hash table expands:        %lu\n",
+		   c->expand_count);
+  printf_filtered ("    Hash table hashes:         %lu\n",
+		   c->total_count + c->expand_hash_count);
   printf_filtered ("    Hash table population:     ");
   print_percentage (occupied_buckets, c->num_buckets);
   printf_filtered ("    Median hash chain length:  %3d\n",

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch/rfc] Tweak bcache performance
  2003-11-03 19:07 [patch/rfc] Tweak bcache performance Andrew Cagney
@ 2003-11-03 19:12 ` Daniel Jacobowitz
  2003-11-03 19:54   ` Andrew Cagney
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Jacobowitz @ 2003-11-03 19:12 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb-patches

On Mon, Nov 03, 2003 at 02:07:04PM -0500, Andrew Cagney wrote:
> Hello,
> 
> This patch:
> 
> - adds a comparison of bcache and hashtab.[hc]
> Summary: bcache has significantly less memory overhead
> 
> - tweaks the bcache algorithm to eliminate the one performance -ve
> Summary: all memcmp's eliminated
> 
> - exploits an assumption that nothing is bigger than 64k
> Summary: I figure stuff bigger than 64k shouldn't be in memory :-)
> 
> As for the tweak making a measurable difference, for "file gdb" it is 
> slight (approx 0.780s vs 0.773s) but still there.
> 
> Daniel, does this answer your hashtab VS bcache question? :-)
> 
> Comments?  Baring them, I'll commit in a few days.

Wrong patch attached?  It doesn't answer my question, but the right
patch might :)

I would like to know why bcache has significantly less memory overhead. 
I haven't looked at this code in a while; I'll find some time to dig in
and remind myself.

> Andrew
> 
> PS: The stats.
> 
>   Cached 'partial symbol cache' statistics:
>     Total object count:  36924
>     Unique object count: 15935
>     Percentage of duplicates, by count:  56%
> 
>     Total object size:   886176
>     Unique object size:  382440
>     Percentage of duplicates, by size:   56%
> 
>     Max entry size:     24
>     Average entry size: 24
>     Median entry size:  24
> 
>     Total memory used by bcache, including overhead: 526316
>     Percentage memory overhead:  37%
>     Net memory savings:          40%
> 
>     Hash table size:           4099
>     Hash table expands:        3
>     Hash table hashes:         52294
>     Half hash misses:          0
>     Hash table population:      97%
>     Median hash chain length:    4
>     Average hash chain length:   3
>     Maximum hash chain length:  12
> 
> Notice how the half hash miss is zero.  This indicates that, on real 
> data, all unnecessary memcmp's were eliminated!
> 
> Also notice how everything is 24 bytes in size (psymbol) because GDB 
> isn't written in C++ :-(
> 
> .

> 
> 	* bcache.c: Include "gdb_assert.h".
> 	(struct bcache): Add fields "expand_count" and
> 	"expand_hash_count".
> 	(expand_hash_table): Update the expand counts.
> 	(print_bcache_statistics): Use XCALLOC, not alloca.  Print stats
> 	on object sizes and hashes.
> 	* Makefile.in (bcache.o): Update dependencies.
> 
> Index: Makefile.in
> ===================================================================
> RCS file: /cvs/src/src/gdb/Makefile.in,v
> retrieving revision 1.467
> diff -u -r1.467 Makefile.in
> --- Makefile.in	1 Nov 2003 01:42:48 -0000	1.467
> +++ Makefile.in	3 Nov 2003 17:01:26 -0000
> @@ -1606,7 +1606,8 @@
>  	$(target_h) $(ax_h) $(ax_gdb_h) $(gdb_string_h) $(block_h) \
>  	$(regcache_h)
>  ax-general.o: ax-general.c $(defs_h) $(ax_h) $(value_h) $(gdb_string_h)
> -bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) $(gdb_string_h)
> +bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) \
> +	$(gdb_string_h) $(gdb_assert_h)
>  bfd-target.o: bfd-target.c $(defs_h) $(target_h) $(bfd_target_h) \
>  	$(gdb_assert_h) $(gdb_string_h)
>  block.o: block.c $(defs_h) $(block_h) $(symtab_h) $(symfile_h) \
> Index: bcache.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/bcache.c,v
> retrieving revision 1.10
> diff -u -r1.10 bcache.c
> --- bcache.c	29 Jul 2002 22:55:26 -0000	1.10
> +++ bcache.c	3 Nov 2003 17:01:26 -0000
> @@ -2,7 +2,7 @@
>     Written by Fred Fish <fnf@cygnus.com>
>     Rewritten by Jim Blandy <jimb@cygnus.com>
>  
> -   Copyright 1999, 2000, 2002 Free Software Foundation, Inc.
> +   Copyright 1999, 2000, 2002, 2003 Free Software Foundation, Inc.
>  
>     This file is part of GDB.
>  
> @@ -25,6 +25,7 @@
>  #include "gdb_obstack.h"
>  #include "bcache.h"
>  #include "gdb_string.h"		/* For memcpy declaration */
> +#include "gdb_assert.h"
>  
>  #include <stddef.h>
>  #include <stdlib.h>
> @@ -71,6 +72,13 @@
>    long unique_size;	/* size of unique strings, in bytes */
>    long total_size;      /* total number of bytes cached, including dups */
>    long structure_size;	/* total size of bcache, including infrastructure */
> +  /* Number of times that the hash table is expanded and hence
> +     re-built, and the corresponding number of times that a string is
> +     [re]hashed as part of entering it into the expanded table.  The
> +     total number of hashes can be computed by adding TOTAL_COUNT to
> +     expand_hash_count.  */
> +  unsigned long expand_count;
> +  unsigned long expand_hash_count;
>  };
>  
>  /* The old hash function was stolen from SDBM. This is what DB 3.0 uses now,
> @@ -117,6 +125,11 @@
>    struct bstring **new_buckets;
>    unsigned int i;
>  
> +  /* Count the stats.  Every unique item needs to be re-hashed and
> +     re-entered.  */
> +  bcache->expand_count++;
> +  bcache->expand_hash_count += bcache->unique_count;
> +
>    /* Find the next size.  */
>    new_num_buckets = bcache->num_buckets * 2;
>    for (i = 0; i < (sizeof (sizes) / sizeof (sizes[0])); i++)
> @@ -265,12 +278,16 @@
>    int occupied_buckets;
>    int max_chain_length;
>    int median_chain_length;
> +  int max_entry_size;
> +  int median_entry_size;
>  
> -  /* Count the number of occupied buckets, and measure chain lengths.  */
> +  /* Count the number of occupied buckets, tally the various string
> +     lengths, and measure chain lengths.  */
>    {
>      unsigned int b;
> -    int *chain_length
> -      = (int *) alloca (c->num_buckets * sizeof (*chain_length));
> +    int *chain_length = XCALLOC (c->num_buckets + 1, int);
> +    int *entry_size = XCALLOC (c->unique_count + 1, int);
> +    int stringi = 0;
>  
>      occupied_buckets = 0;
>  
> @@ -286,7 +303,10 @@
>  	    
>  	    while (s)
>  	      {
> +		gdb_assert (b < c->num_buckets);
>  		chain_length[b]++;
> +		gdb_assert (stringi < c->unique_count);
> +		entry_size[stringi++] = s->length;
>  		s = s->next;
>  	      }
>  	  }
> @@ -295,6 +315,8 @@
>      /* To compute the median, we need the set of chain lengths sorted.  */
>      qsort (chain_length, c->num_buckets, sizeof (chain_length[0]),
>  	   compare_ints);
> +    qsort (entry_size, c->unique_count, sizeof (entry_size[0]),
> +	   compare_ints);
>  
>      if (c->num_buckets > 0)
>        {
> @@ -306,6 +328,19 @@
>  	max_chain_length = 0;
>  	median_chain_length = 0;
>        }
> +    if (c->unique_count > 0)
> +      {
> +	max_entry_size = entry_size[c->unique_count - 1];
> +	median_entry_size = entry_size[c->unique_count / 2];
> +      }
> +    else
> +      {
> +	max_entry_size = 0;
> +	median_entry_size = 0;
> +      }
> +
> +    xfree (chain_length);
> +    xfree (entry_size);
>    }
>  
>    printf_filtered ("  Cached '%s' statistics:\n", type);
> @@ -321,6 +356,15 @@
>    print_percentage (c->total_size - c->unique_size, c->total_size);
>    printf_filtered ("\n");
>  
> +  printf_filtered ("    Max entry size:     %d\n", max_entry_size);
> +  printf_filtered ("    Average entry size: ");
> +  if (c->unique_count > 0)
> +    printf_filtered ("%ld\n", c->unique_size / c->unique_count);
> +  else
> +    printf_filtered ("(not applicable)\n");    
> +  printf_filtered ("    Median entry size:  %d\n", median_entry_size);
> +  printf_filtered ("\n");
> +
>    printf_filtered ("    Total memory used by bcache, including overhead: %ld\n",
>  		   c->structure_size);
>    printf_filtered ("    Percentage memory overhead: ");
> @@ -330,6 +374,10 @@
>    printf_filtered ("\n");
>  
>    printf_filtered ("    Hash table size:           %3d\n", c->num_buckets);
> +  printf_filtered ("    Hash table expands:        %lu\n",
> +		   c->expand_count);
> +  printf_filtered ("    Hash table hashes:         %lu\n",
> +		   c->total_count + c->expand_hash_count);
>    printf_filtered ("    Hash table population:     ");
>    print_percentage (occupied_buckets, c->num_buckets);
>    printf_filtered ("    Median hash chain length:  %3d\n",


-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch/rfc] Tweak bcache performance
  2003-11-03 19:12 ` Daniel Jacobowitz
@ 2003-11-03 19:54   ` Andrew Cagney
  2003-11-07 22:08     ` Andrew Cagney
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cagney @ 2003-11-03 19:54 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 334 bytes --]

> Comments?  Baring them, I'll commit in a few days.
> 
> 
> Wrong patch attached?  It doesn't answer my question, but the right
> patch might :)

Ugger.

> I would like to know why bcache has significantly less memory overhead. 
> I haven't looked at this code in a while; I'll find some time to dig in
> and remind myself.

Andrew


[-- Attachment #2: diffs --]
[-- Type: text/plain, Size: 8318 bytes --]

2003-11-03  Andrew Cagney  <cagney@redhat.com>

	* bcache.h: Update copyright.  Add comments on bcache VS hashtab.
	* bcache.c (struct bstring): Make "length" an unsigned short, add
	"half_hash".
	(struct bcache): Add "half_hash_error_count".
	(bcache): Compute and save the "half_hash".  Compare the
	"half_hash" before comparing the length.  Update
	half_hash_error_count.

Index: bcache.c
===================================================================
RCS file: /cvs/src/src/gdb/bcache.c,v
retrieving revision 1.11
diff -u -r1.11 bcache.c
--- bcache.c	3 Nov 2003 17:03:11 -0000	1.11
+++ bcache.c	3 Nov 2003 19:51:14 -0000
@@ -38,8 +38,15 @@
 
 struct bstring
 {
+  /* Hash chain.  */
   struct bstring *next;
-  size_t length;
+  /* Assume the data length is no more than 64k.  */
+  unsigned short length;
+  /* The half hash hack.  This contains the upper 16 bits of the hash
+     value and is used as a pre-check when comparing two strings and
+     avoids the need to do length or memcmp calls.  It proves to be
+     roughly 100% effective.  */
+  unsigned short half_hash;
 
   union
   {
@@ -79,6 +86,10 @@
      expand_hash_count.  */
   unsigned long expand_count;
   unsigned long expand_hash_count;
+  /* Number of times that the half-hash compare hit (compare the upper
+     16 bits of hash values) hit, but the corresponding combined
+     length/data compare missed.  */
+  unsigned long half_hash_miss_count;
 };
 
 /* The old hash function was stolen from SDBM. This is what DB 3.0 uses now,
@@ -187,6 +198,8 @@
 void *
 bcache (const void *addr, int length, struct bcache *bcache)
 {
+  unsigned long full_hash;
+  unsigned short half_hash;
   int hash_index;
   struct bstring *s;
 
@@ -197,13 +210,24 @@
   bcache->total_count++;
   bcache->total_size += length;
 
-  hash_index = hash (addr, length) % bcache->num_buckets;
-
-  /* Search the hash bucket for a string identical to the caller's.  */
+  full_hash = hash (addr, length);
+  half_hash = (full_hash >> 16);
+  hash_index = full_hash % bcache->num_buckets;
+
+  /* Search the hash bucket for a string identical to the caller's.
+     As a short-circuit first compare the upper part of each hash
+     values.  */
   for (s = bcache->bucket[hash_index]; s; s = s->next)
-    if (s->length == length
-	&& ! memcmp (&s->d.data, addr, length))
-      return &s->d.data;
+    {
+      if (s->half_hash == half_hash)
+	{
+	  if (s->length == length
+	      && ! memcmp (&s->d.data, addr, length))
+	    return &s->d.data;
+	  else
+	    bcache->half_hash_miss_count++;
+	}
+    }
 
   /* The user's string isn't in the list.  Insert it after *ps.  */
   {
@@ -212,6 +236,7 @@
     memcpy (&new->d.data, addr, length);
     new->length = length;
     new->next = bcache->bucket[hash_index];
+    new->half_hash = half_hash;
     bcache->bucket[hash_index] = new;
 
     bcache->unique_count++;
@@ -378,6 +403,8 @@
 		   c->expand_count);
   printf_filtered ("    Hash table hashes:         %lu\n",
 		   c->total_count + c->expand_hash_count);
+  printf_filtered ("    Half hash misses:          %lu\n",
+		   c->half_hash_miss_count);
   printf_filtered ("    Hash table population:     ");
   print_percentage (occupied_buckets, c->num_buckets);
   printf_filtered ("    Median hash chain length:  %3d\n",
Index: bcache.h
===================================================================
RCS file: /cvs/src/src/gdb/bcache.h,v
retrieving revision 1.6
diff -u -r1.6 bcache.h
--- bcache.h	12 Jul 2002 15:23:10 -0000	1.6
+++ bcache.h	3 Nov 2003 19:51:14 -0000
@@ -2,7 +2,7 @@
    Written by Fred Fish <fnf@cygnus.com>
    Rewritten by Jim Blandy <jimb@cygnus.com>
 
-   Copyright 1999, 2000, 2002 Free Software Foundation, Inc.
+   Copyright 1999, 2000, 2002, 2003 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -48,20 +48,95 @@
    You shouldn't modify the strings you get from a bcache, because:
 
    - You don't necessarily know who you're sharing space with.  If I
-     stick eight bytes of text in a bcache, and then stick an
-     eight-byte structure in the same bcache, there's no guarantee
-     those two objects don't actually comprise the same sequence of
-     bytes.  If they happen to, the bcache will use a single byte
-     string for both of them.  Then, modifying the structure will
-     change the string.  In bizarre ways.
+   stick eight bytes of text in a bcache, and then stick an eight-byte
+   structure in the same bcache, there's no guarantee those two
+   objects don't actually comprise the same sequence of bytes.  If
+   they happen to, the bcache will use a single byte string for both
+   of them.  Then, modifying the structure will change the string.  In
+   bizarre ways.
 
    - Even if you know for some other reason that all that's okay,
-     there's another problem.  A bcache stores all its strings in a
-     hash table.  If you modify a string's contents, you will probably
-     change its hash value.  This means that the modified string is
-     now in the wrong place in the hash table, and future bcache
-     probes will never find it.  So by mutating a string, you give up
-     any chance of sharing its space with future duplicates.  */
+   there's another problem.  A bcache stores all its strings in a hash
+   table.  If you modify a string's contents, you will probably change
+   its hash value.  This means that the modified string is now in the
+   wrong place in the hash table, and future bcache probes will never
+   find it.  So by mutating a string, you give up any chance of
+   sharing its space with future duplicates.
+
+
+   Size of bcache VS hashtab:
+
+   For bcache, the most critical cost is size (or more exactly the
+   overhead added by the bcache).  It turns out that the bcache is
+   remarkably efficient.
+
+   Assuming a 32-bit system (the hash table slots are 4 bytes),
+   ignoring alignment, and limit strings to 255 bytes (1 byte length)
+   we get ...
+
+   bcache: This uses a separate linked list to track the hash chain.
+   The numbers show roughly 100% occupancy of the hash table and an
+   average chain length of 4.  Spreading the slot cost over the 4
+   chain elements:
+
+   4 (slot) / 4 (chain length) + 1 (length) + 4 (chain) = 6 bytes
+
+   hashtab: This uses a more traditional re-hash algorithm where the
+   chain is maintained within the hash table.  The table occupancy is
+   kept below 75% but we'll assume its perfect:
+
+   4 (slot) x 4/3 (occupancy) +  1 (length) = 6 1/3 bytes
+
+   So a perfect hashtab has just slightly larger than an average
+   bcache.
+
+   It turns out that an average hashtab is far worse.  Two things
+   hurt:
+
+   - Hashtab's occupancy is more like 50% (it ranges between 38% and
+   75%) giving a per slot cost of 4x2 vs 4x4/3.
+
+   - the string structure needs to be aligned to 8 bytes which for
+   hashtab wastes 7 bytes, while for bcache wastes only 3.
+
+   This gives:
+
+   hashtab: 4 x 2 + 1 + 7 = 16 bytes
+
+   bcache 4 / 4 + 1 + 4 + 3 = 9 bytes
+
+   The numbers of GDB debugging GDB support this.  ~40% vs ~70% overhead.
+
+
+   Speed of bcache VS hashtab (the half hash hack):
+
+   While hashtab has a typical chain length of 1, bcache has a chain
+   length of round 4.  This means that the bcache will require
+   something like double the number of compares after that initial
+   hash.  In both cases the comparison takes the form:
+
+   a.length == b.length && memcmp (a.data, b.data, a.length) == 0
+
+   That is lengths are checked before doing the memcmp.
+
+   For GDB debugging GDB, it turned out that all lengths were 24 bytes
+   (no C++ so only psymbols were cached) and hence, all compares
+   required a call to memcmp.  As a hack, two bytes of padding
+   (mentioned above) are used to store the upper 16 bits of the
+   string's hash value and then that is used in the comparison vis:
+
+   a.half_hash == b.half_hash && a.length == b.length && memcmp
+   (a.data, b.data, a.length)
+
+   The numbers from GDB debugging GDB show this to be a remarkable
+   100% effective (only necessary length and memcmp tests being
+   performed).
+
+   Mind you, looking at the wall clock, the same GDB debugging GDB
+   showed only marginal speed up (0.780 vs 0.773s).  Seems GDB is too
+   busy doing something else :-(
+  
+*/
 
 
 struct bcache;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch/rfc] Tweak bcache performance
  2003-11-03 19:54   ` Andrew Cagney
@ 2003-11-07 22:08     ` Andrew Cagney
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Cagney @ 2003-11-07 22:08 UTC (permalink / raw)
  To: gdb-patches; +Cc: Daniel Jacobowitz

> 2003-11-03  Andrew Cagney  <cagney@redhat.com>
> 
> 	* bcache.h: Update copyright.  Add comments on bcache VS hashtab.
> 	* bcache.c (struct bstring): Make "length" an unsigned short, add
> 	"half_hash".
> 	(struct bcache): Add "half_hash_error_count".
> 	(bcache): Compute and save the "half_hash".  Compare the
> 	"half_hash" before comparing the length.  Update
> 	half_hash_error_count.

I've checked this in.

Andrew



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-11-07 22:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-03 19:07 [patch/rfc] Tweak bcache performance Andrew Cagney
2003-11-03 19:12 ` Daniel Jacobowitz
2003-11-03 19:54   ` Andrew Cagney
2003-11-07 22:08     ` Andrew Cagney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox