From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16744 invoked by alias); 24 Sep 2002 04:28:20 -0000 Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sources.redhat.com Received: (qmail 16737 invoked from network); 24 Sep 2002 04:28:20 -0000 Received: from unknown (HELO mail.cdt.org) (206.112.85.61) by sources.redhat.com with SMTP; 24 Sep 2002 04:28:20 -0000 Received: from dberlin.org (pool-138-88-48-174.res.east.verizon.net [138.88.48.174]) by mail.cdt.org (Postfix) with ESMTP id 7145D49005C; Tue, 24 Sep 2002 00:05:19 -0400 (EDT) Received: from [127.0.0.1] (HELO localhost) by dberlin.org (CommuniGate Pro SMTP 4.0b9a) with ESMTP-TLS id 250167; Tue, 24 Sep 2002 00:28:18 -0400 Date: Mon, 23 Sep 2002 21:28:00 -0000 From: Daniel Berlin To: Daniel Jacobowitz Cc: David Carlton , Jim Blandy , Subject: Re: suggestion for dictionary representation In-Reply-To: <20020924012618.GA11065@nevyn.them.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2002-09/txt/msg00361.txt.bz2 On Mon, 23 Sep 2002, Daniel Jacobowitz wrote: > On Mon, Sep 23, 2002 at 08:34:50PM -0400, Daniel Berlin wrote: > > >I'm also curious about how it would affect the speed of reading in > > >symbols. Right now, that should be O(n), where n is the number of > > >global symbols, right? > > > > > If we used expandable hash tables, then I > > >think it would be amortized O(n) and with the constant factor larger. > > > > Nope. > > Our string hash function is O(N) right now (as are most). Hash tables > > are only O(N) when the hash function is O(1). > > > > Now, if you have it only hash the first x characters, you can make your > > hash table O(N) again, with the x as the constant. Of course, if only > > hashing the first x characters causes tons of hash conflicts, it's not > > going to make your hash table very fast. > > Wait a second... aren't you switching N's on us? Yes, but this just makes it O(MN), where M is your string hash time, N is your number of global symbols. Although M is not likely to approach N, it could happen on *very* long mangled names (i've seen some > 500 characters, in applications with around that many *global* symbols), or easily if you index on demangled names. > is the number of > global symbols. We're talking about the total time of adding all > symbols to the table. I'm quite aware. > Hashing a string is "effectively" constant time, It's not. > because all string lengths are "small". Bullshit. libstdc++, fer instance, has an average symbol length of 42 (mangled). Longest is 131 characters. Demangled, the average is 86. Longest is 1116. > > > >(But, I think, not larger in a way that would make a difference.) I'm > > >curious about how often the "amortized" bit would lead to strange > > >hiccups, but I don't think that's a big deal. > > > > > >But for skip lists, wouldn't it be something like O(n log n)? If so, > > >that's an issue we have to consider. > > > > Put it in perspective. > > for 1 billion symbols, n is 29.89. > > for 1 million symbols, n is 19.93. > > You mean "log n", of course. yup. > >