From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-76553-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 9014 invoked by alias); 22 Nov 2010 19:44:11 -0000
Received: (qmail 8953 invoked by uid 22791); 22 Nov 2010 19:44:09 -0000
X-SWARE-Spam-Status: No, hits=-5.4 required=5.0	tests=AWL,BAYES_00,KAM_STOCKGEN,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,TW_BJ,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 22 Nov 2010 19:44:02 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id oAMJhi7Y025938	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Mon, 22 Nov 2010 14:43:45 -0500
Received: from host0.dyn.jankratochvil.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1])	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id oAMJhbL5011692	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);	Mon, 22 Nov 2010 14:43:43 -0500
Received: from host0.dyn.jankratochvil.net (localhost.localdomain [127.0.0.1])	by host0.dyn.jankratochvil.net (8.14.4/8.14.4) with ESMTP id oAMJhaH7022115;	Mon, 22 Nov 2010 20:43:36 +0100
Received: (from jkratoch@localhost)	by host0.dyn.jankratochvil.net (8.14.4/8.14.4/Submit) id oAMJhanj022114;	Mon, 22 Nov 2010 20:43:36 +0100
Date: Mon, 22 Nov 2010 19:44:00 -0000
From: Jan Kratochvil <jan.kratochvil@redhat.com>
To: Joel Brobecker <brobecker@adacore.com>
Cc: Tom Tromey <tromey@redhat.com>, gdb-patches@sourceware.org
Subject: Re: [patch 2/2] iFort compat.: case insensitive symbols (PR 11313)
Message-ID: <20101122194336.GA21855@host0.dyn.jankratochvil.net>
References: <m3wroi4br5.fsf@fleche.redhat.com> <20101108183133.GE2933@adacore.com> <20101122035334.GA9229@host0.dyn.jankratochvil.net> <20101122185432.GT2634@adacore.com> <20101122191905.GA20976@host0.dyn.jankratochvil.net> <20101122193041.GU2634@adacore.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20101122193041.GU2634@adacore.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2010-11/txt/msg00310.txt.bz2

On Mon, 22 Nov 2010 20:30:41 +0100, Joel Brobecker wrote:
> I was actually wondering about the change in the hash algorithm more
> than the cost of calling tolower.  For instance, "tmp" and "Tmp" would
> have had different hash values, but not anymore.  So, presumably, when
> you start looking up for "tmp", the associated hash bucket will also
> contain "Tmp" whereas it wouldn't before. I need to look at the actual
> hashing parameters to see if we can figure out whether this should have
> any real effect in practice...  If the number of elements in each bucket
> is reasonable, a few more iterations shouldn't be an issue.

This is a more general issue.

I think (I have not measured it) that most symbols still differ after tolower.
Pairs like tmp<->Tmp exist but are rare.  I agree the hash distribution gets
slightly worse, but I considered the change negligible and so did not measure
it.
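
To illustrate the collision Joel describes, here is a minimal sketch.  The
hash below is a hypothetical multiplicative string hash chosen for the
example, not GDB's actual minimal symbol hash, and the names hash_exact and
hash_folded are made up:

```c
#include <ctype.h>

/* Hypothetical case-sensitive string hash (illustration only,
   not GDB's real hashing function).  */
static unsigned int
hash_exact (const char *s)
{
  unsigned int h = 0;

  for (; *s != '\0'; s++)
    h = h * 67 + (unsigned char) *s;
  return h;
}

/* Same hash, but every character is folded with tolower first, so
   names differing only in case get the same value.  */
static unsigned int
hash_folded (const char *s)
{
  unsigned int h = 0;

  for (; *s != '\0'; s++)
    h = h * 67 + (unsigned int) tolower ((unsigned char) *s);
  return h;
}
```

With this sketch hash_exact gives different values for "tmp" and "Tmp", while
hash_folded gives the same value, so both names land in one bucket and the
final name comparison has to tell them apart.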

A bigger issue is that MINIMAL_SYMBOL_HASH_SIZE is a constant:
	#define MINIMAL_SYMBOL_HASH_SIZE 2039

Some objfiles have many symbols:
	libwebkit.so.debug: 54980 symbols
		/MINIMAL_SYMBOL_HASH_SIZE = 27
		log2(54980)=16
	gdb symtab: 36452 symbols
		/MINIMAL_SYMBOL_HASH_SIZE = 18
		log2(36452)=15
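
The arithmetic above can be reproduced with a small sketch.  The helper names
are hypothetical; the first rounds the average chain length to the nearest
integer, the second returns the worst-case number of probes of a binary
search, floor(log2(n)) + 1, which is the rounded-up counterpart of the log2
figures:

```c
/* Average number of symbols per hash bucket, rounded to nearest.
   Assumes nbuckets != 0.  */
static unsigned int
avg_chain_length (unsigned int nsyms, unsigned int nbuckets)
{
  return (nsyms + nbuckets / 2) / nbuckets;
}

/* Worst-case number of probes for a binary search over N sorted
   entries: floor(log2(N)) + 1 for N > 0, and 0 for an empty array.  */
static unsigned int
bsearch_steps (unsigned int n)
{
  unsigned int steps = 0;

  while (n > 0)
    {
      steps++;
      n /= 2;
    }
  return steps;
}
```

For 54980 symbols and 2039 buckets this gives a chain of 27 versus at most 16
probes; for 36452 symbols it gives 18 versus at most 16.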

In such a case the whole hash table in fact makes no sense; it is even
cheaper to just do a binary search on objfile->msymbols, which is already
qsort-ed, and be done with it.
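
A minimal sketch of such a lookup, using the standard bsearch.  It assumes a
plain array of names sorted with strcmp; it does not model GDB's struct
minimal_symbol or whatever key objfile->msymbols is really sorted by, and
lookup_sorted is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: KEY is the name being looked up, ELEM points
   at one slot of the sorted array of names.  */
static int
compare_names (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Return the matching entry of SORTED (COUNT names in strcmp order),
   or NULL if NAME is not present.  */
static const char *
lookup_sorted (const char *name, const char *const *sorted, size_t count)
{
  const char *const *hit
    = bsearch (name, sorted, count, sizeof *sorted, compare_names);

  return hit != NULL ? *hit : NULL;
}
```

A case-insensitive variant would need the array sorted with the same
case-folding comparison that the lookup uses.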

A hash table should still be faster than a binary search, but its size would
need to be adaptable.
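
One possible shape of "adaptable", as a sketch only: derive the bucket count
from the number of symbols so that the average chain length stays under a
chosen bound.  The function name, the power-of-two policy and the max_load
parameter are all assumptions for illustration, not existing GDB code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizing policy: return the smallest power of two such
   that NSYMS symbols give an average chain length of at most
   MAX_LOAD entries per bucket.  */
static size_t
choose_bucket_count (size_t nsyms, size_t max_load)
{
  size_t needed;
  size_t buckets = 1;

  if (max_load == 0)
    max_load = 1;

  /* Ceiling of nsyms / max_load without risking overflow.  */
  needed = nsyms / max_load + (nsyms % max_load != 0);

  /* Double until large enough; the second test stops before the
     doubling itself could overflow.  */
  while (buckets < needed && buckets <= SIZE_MAX / 2)
    buckets *= 2;
  return buckets;
}
```

With a bound of 4 entries per bucket, both 54980 and 36452 symbols would get
16384 buckets instead of the fixed 2039.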

But optimizations like this reduce only the CPU load, which in my measurements
was 2% during GDB startup (GDB is mostly waiting on disk).  We could instead,
for example, delay searching and loading the symbols of any objfiles we do not
need, which would bring another major GDB startup time reduction like
.gdb_index did.  This is the reason I did not intend to spend any time on
debatable CPU optimizations; IMO they do not make sense with the current state
of gdb performance.


Thanks,
Jan