From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-10518-listarch-gdb=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 16744 invoked by alias); 24 Sep 2002 04:28:20 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 16737 invoked from network); 24 Sep 2002 04:28:20 -0000
Received: from unknown (HELO mail.cdt.org) (206.112.85.61)
  by sources.redhat.com with SMTP; 24 Sep 2002 04:28:20 -0000
Received: from dberlin.org (pool-138-88-48-174.res.east.verizon.net [138.88.48.174])
	by mail.cdt.org (Postfix) with ESMTP
	id 7145D49005C; Tue, 24 Sep 2002 00:05:19 -0400 (EDT)
Received: from [127.0.0.1] (HELO localhost)
  by dberlin.org (CommuniGate Pro SMTP 4.0b9a)
  with ESMTP-TLS id 250167; Tue, 24 Sep 2002 00:28:18 -0400
Date: Mon, 23 Sep 2002 21:28:00 -0000
From: Daniel Berlin <dberlin@dberlin.org>
To: Daniel Jacobowitz <drow@mvista.com>
Cc: David Carlton <carlton@math.stanford.edu>,
	Jim Blandy <jimb@redhat.com>, <gdb@sources.redhat.com>
Subject: Re: suggestion for dictionary representation
In-Reply-To: <20020924012618.GA11065@nevyn.them.org>
Message-ID: <Pine.LNX.4.44.0209240006070.4266-100000@dberlin.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2002-09/txt/msg00361.txt.bz2


On Mon, 23 Sep 2002, Daniel Jacobowitz wrote:

> On Mon, Sep 23, 2002 at 08:34:50PM -0400, Daniel Berlin wrote:
> > >I'm also curious about how it would affect the speed of reading in
> > >symbols.  Right now, that should be O(n), where n is the number of
> > >global symbols, right?
> > 
> > > If we used expandable hash tables, then I
> > >think it would be amortized O(n) and with the constant factor larger.
> > 
> > Nope.
> > Our string hash function is O(N) right now (as are most). Hash tables 
> > are only O(N) when the hash function is O(1).
> > 
> > Now, if you have it only hash the first x characters, you can make your 
> > hash table O(N) again, with the x as the constant. Of course, if only 
> > hashing the first x characters causes tons of hash conflicts, it's not 
> > going to make your hash table very fast.
> 
> Wait a second... aren't you switching N's on us?  
Yes, but this just makes it O(MN), where M is your string hash time, N is 
your number of global symbols.

Although M is not likely to approach N, it could happen on *very* long 
mangled names (i've seen some > 500 characters, in applications with 
around that many *global* symbols), or easily if you index on demangled 
names.

>  is the number of
> global symbols.  We're talking about the total time of adding all
> symbols to the table.
I'm quite aware.

>  Hashing a string is "effectively" constant time,
It's not.
> because all string lengths are "small".
Bullshit.

libstdc++, fer instance, has an average symbol length of 42 (mangled).
Longest is 131 characters.

Demangled, the average is 86. 
Longest is 1116.

> 
> > >(But, I think, not larger in a way that would make a difference.)  I'm
> > >curious about how often the "amortized" bit would lead to strange
> > >hiccups, but I don't think that's a big deal.
> > >
> > >But for skip lists, wouldn't it be something like O(n log n)?  If so,
> > >that's an issue we have to consider.
> > 
> > Put it in perspective.
> > for 1 billion symbols, n is 29.89.
> > for 1 million symbols, n is 19.93.
> 
> You mean "log n", of course.
yup.

> 
>