From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-28376-listarch-gdb-patches=sources.redhat.com@sources.redhat.com>
Received: (qmail 25818 invoked by alias); 3 Nov 2003 19:12:24 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 25811 invoked from network); 3 Nov 2003 19:12:22 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17)
  by sources.redhat.com with SMTP; 3 Nov 2003 19:12:22 -0000
Received: from drow by nevyn.them.org with local (Exim 4.24 #1 (Debian))
	id 1AGk7i-0000Ts-3W; Mon, 03 Nov 2003 14:12:22 -0500
Date: Mon, 03 Nov 2003 19:12:00 -0000
From: Daniel Jacobowitz <drow@mvista.com>
To: Andrew Cagney <ac131313@redhat.com>
Cc: gdb-patches@sources.redhat.com
Subject: Re: [patch/rfc] Tweak bcache performance
Message-ID: <20031103191222.GA1761@nevyn.them.org>
Mail-Followup-To: Andrew Cagney <ac131313@redhat.com>,
	gdb-patches@sources.redhat.com
References: <3FA6A758.1060104@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3FA6A758.1060104@redhat.com>
User-Agent: Mutt/1.5.1i
X-SW-Source: 2003-11/txt/msg00036.txt.bz2

On Mon, Nov 03, 2003 at 02:07:04PM -0500, Andrew Cagney wrote:
> Hello,
> 
> This patch:
> 
> - adds a comparison of bcache and hashtab.[hc]
> Summary: bcache has significantly less memory overhead
> 
> - tweaks the bcache algorithm to eliminate the one performance -ve
> Summary: all memcmp's eliminated
> 
> - exploits an assumption that nothing is bigger than 64k
> Summary: I figure stuff bigger than 64k shouldn't be in memory :-)
> 
> As for the tweak making a measurable difference, for "file gdb" it is 
> slight (approx 0.780s vs 0.773s) but still there.
> 
> Daniel, does this answer your hashtab VS bcache question? :-)
> 
> Comments?  Baring them, I'll commit in a few days.

Wrong patch attached?  It doesn't answer my question, but the right
patch might :)

I would like to know why bcache has significantly less memory overhead. 
I haven't looked at this code in a while; I'll find some time to dig in
and remind myself.

> Andrew
> 
> PS: The stats.
> 
>   Cached 'partial symbol cache' statistics:
>     Total object count:  36924
>     Unique object count: 15935
>     Percentage of duplicates, by count:  56%
> 
>     Total object size:   886176
>     Unique object size:  382440
>     Percentage of duplicates, by size:   56%
> 
>     Max entry size:     24
>     Average entry size: 24
>     Median entry size:  24
> 
>     Total memory used by bcache, including overhead: 526316
>     Percentage memory overhead:  37%
>     Net memory savings:          40%
> 
>     Hash table size:           4099
>     Hash table expands:        3
>     Hash table hashes:         52294
>     Half hash misses:          0
>     Hash table population:      97%
>     Median hash chain length:    4
>     Average hash chain length:   3
>     Maximum hash chain length:  12
> 
> Notice how the half hash miss is zero.  This indicates that, on real 
> data, all unnecessary memcmp's were eliminated!
> 
> Also notice how everything is 24 bytes in size (psymbol) because GDB 
> isn't written in C++ :-(
> 
> .

> 
> 	* bcache.c: Include "gdb_assert.h".
> 	(struct bcache): Add fields "expand_count" and
> 	"expand_hash_count".
> 	(expand_hash_table): Update the expand counts.
> 	(print_bcache_statistics): Use XCALLOC, not alloca.  Print stats
> 	on object sizes and hashes.
> 	* Makefile.in (bcache.o): Update dependencies.
> 
> Index: Makefile.in
> ===================================================================
> RCS file: /cvs/src/src/gdb/Makefile.in,v
> retrieving revision 1.467
> diff -u -r1.467 Makefile.in
> --- Makefile.in	1 Nov 2003 01:42:48 -0000	1.467
> +++ Makefile.in	3 Nov 2003 17:01:26 -0000
> @@ -1606,7 +1606,8 @@
>  	$(target_h) $(ax_h) $(ax_gdb_h) $(gdb_string_h) $(block_h) \
>  	$(regcache_h)
>  ax-general.o: ax-general.c $(defs_h) $(ax_h) $(value_h) $(gdb_string_h)
> -bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) $(gdb_string_h)
> +bcache.o: bcache.c $(defs_h) $(gdb_obstack_h) $(bcache_h) \
> +	$(gdb_string_h) $(gdb_assert_h)
>  bfd-target.o: bfd-target.c $(defs_h) $(target_h) $(bfd_target_h) \
>  	$(gdb_assert_h) $(gdb_string_h)
>  block.o: block.c $(defs_h) $(block_h) $(symtab_h) $(symfile_h) \
> Index: bcache.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/bcache.c,v
> retrieving revision 1.10
> diff -u -r1.10 bcache.c
> --- bcache.c	29 Jul 2002 22:55:26 -0000	1.10
> +++ bcache.c	3 Nov 2003 17:01:26 -0000
> @@ -2,7 +2,7 @@
>     Written by Fred Fish <fnf@cygnus.com>
>     Rewritten by Jim Blandy <jimb@cygnus.com>
>  
> -   Copyright 1999, 2000, 2002 Free Software Foundation, Inc.
> +   Copyright 1999, 2000, 2002, 2003 Free Software Foundation, Inc.
>  
>     This file is part of GDB.
>  
> @@ -25,6 +25,7 @@
>  #include "gdb_obstack.h"
>  #include "bcache.h"
>  #include "gdb_string.h"		/* For memcpy declaration */
> +#include "gdb_assert.h"
>  
>  #include <stddef.h>
>  #include <stdlib.h>
> @@ -71,6 +72,13 @@
>    long unique_size;	/* size of unique strings, in bytes */
>    long total_size;      /* total number of bytes cached, including dups */
>    long structure_size;	/* total size of bcache, including infrastructure */
> +  /* Number of times that the hash table is expanded and hence
> +     re-built, and the corresponding number of times that a string is
> +     [re]hashed as part of entering it into the expanded table.  The
> +     total number of hashes can be computed by adding TOTAL_COUNT to
> +     expand_hash_count.  */
> +  unsigned long expand_count;
> +  unsigned long expand_hash_count;
>  };
>  
>  /* The old hash function was stolen from SDBM. This is what DB 3.0 uses now,
> @@ -117,6 +125,11 @@
>    struct bstring **new_buckets;
>    unsigned int i;
>  
> +  /* Count the stats.  Every unique item needs to be re-hashed and
> +     re-entered.  */
> +  bcache->expand_count++;
> +  bcache->expand_hash_count += bcache->unique_count;
> +
>    /* Find the next size.  */
>    new_num_buckets = bcache->num_buckets * 2;
>    for (i = 0; i < (sizeof (sizes) / sizeof (sizes[0])); i++)
> @@ -265,12 +278,16 @@
>    int occupied_buckets;
>    int max_chain_length;
>    int median_chain_length;
> +  int max_entry_size;
> +  int median_entry_size;
>  
> -  /* Count the number of occupied buckets, and measure chain lengths.  */
> +  /* Count the number of occupied buckets, tally the various string
> +     lengths, and measure chain lengths.  */
>    {
>      unsigned int b;
> -    int *chain_length
> -      = (int *) alloca (c->num_buckets * sizeof (*chain_length));
> +    int *chain_length = XCALLOC (c->num_buckets + 1, int);
> +    int *entry_size = XCALLOC (c->unique_count + 1, int);
> +    int stringi = 0;
>  
>      occupied_buckets = 0;
>  
> @@ -286,7 +303,10 @@
>  	    
>  	    while (s)
>  	      {
> +		gdb_assert (b < c->num_buckets);
>  		chain_length[b]++;
> +		gdb_assert (stringi < c->unique_count);
> +		entry_size[stringi++] = s->length;
>  		s = s->next;
>  	      }
>  	  }
> @@ -295,6 +315,8 @@
>      /* To compute the median, we need the set of chain lengths sorted.  */
>      qsort (chain_length, c->num_buckets, sizeof (chain_length[0]),
>  	   compare_ints);
> +    qsort (entry_size, c->unique_count, sizeof (entry_size[0]),
> +	   compare_ints);
>  
>      if (c->num_buckets > 0)
>        {
> @@ -306,6 +328,19 @@
>  	max_chain_length = 0;
>  	median_chain_length = 0;
>        }
> +    if (c->unique_count > 0)
> +      {
> +	max_entry_size = entry_size[c->unique_count - 1];
> +	median_entry_size = entry_size[c->unique_count / 2];
> +      }
> +    else
> +      {
> +	max_entry_size = 0;
> +	median_entry_size = 0;
> +      }
> +
> +    xfree (chain_length);
> +    xfree (entry_size);
>    }
>  
>    printf_filtered ("  Cached '%s' statistics:\n", type);
> @@ -321,6 +356,15 @@
>    print_percentage (c->total_size - c->unique_size, c->total_size);
>    printf_filtered ("\n");
>  
> +  printf_filtered ("    Max entry size:     %d\n", max_entry_size);
> +  printf_filtered ("    Average entry size: ");
> +  if (c->unique_count > 0)
> +    printf_filtered ("%ld\n", c->unique_size / c->unique_count);
> +  else
> +    printf_filtered ("(not applicable)\n");    
> +  printf_filtered ("    Median entry size:  %d\n", median_entry_size);
> +  printf_filtered ("\n");
> +
>    printf_filtered ("    Total memory used by bcache, including overhead: %ld\n",
>  		   c->structure_size);
>    printf_filtered ("    Percentage memory overhead: ");
> @@ -330,6 +374,10 @@
>    printf_filtered ("\n");
>  
>    printf_filtered ("    Hash table size:           %3d\n", c->num_buckets);
> +  printf_filtered ("    Hash table expands:        %lu\n",
> +		   c->expand_count);
> +  printf_filtered ("    Hash table hashes:         %lu\n",
> +		   c->total_count + c->expand_hash_count);
>    printf_filtered ("    Hash table population:     ");
>    print_percentage (occupied_buckets, c->num_buckets);
>    printf_filtered ("    Median hash chain length:  %3d\n",


-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer