From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16120 invoked by alias); 24 Sep 2002 16:33:28 -0000 Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sources.redhat.com Received: (qmail 16113 invoked from network); 24 Sep 2002 16:33:28 -0000 Received: from unknown (HELO jackfruit.Stanford.EDU) (171.64.38.136) by sources.redhat.com with SMTP; 24 Sep 2002 16:33:28 -0000 Received: (from carlton@localhost) by jackfruit.Stanford.EDU (8.11.6/8.11.6) id g8OGXQT08139; Tue, 24 Sep 2002 09:33:26 -0700 X-Authentication-Warning: jackfruit.Stanford.EDU: carlton set sender to carlton@math.stanford.edu using -f To: Daniel Berlin Cc: Daniel Jacobowitz , Jim Blandy , gdb@sources.redhat.com Subject: Re: suggestion for dictionary representation References: <6FDBFE18-CF55-11D6-BA45-000393575BCC@dberlin.org> From: David Carlton Date: Tue, 24 Sep 2002 09:33:00 -0000 In-Reply-To: <6FDBFE18-CF55-11D6-BA45-000393575BCC@dberlin.org> Message-ID: User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SW-Source: 2002-09/txt/msg00378.txt.bz2 On Mon, 23 Sep 2002 20:34:50 -0400, Daniel Berlin said: >> I'm also curious about how it would affect the speed of reading in >> symbols. Right now, that should be O(n), where n is the number of >> global symbols, right? >> If we used expandable hash tables, then I think it would be >> amortized O(n) and with the constant factor larger. > Our string hash function is O(N) right now (as are most). Hash > tables are only O(N) when the hash function is O(1). [ Here, of course, my 'n' is the number of global symbols, and Daniel's 'N' is the maximum symbol length. ] This is true, but I'm not sure that it's relevant to this sort of theoretical analysis. After all, skip lists depend on N, as well: they put symbols in order, and the amount of time to do that depends on the length of the symbols. And it's entirely reasonable to think of 'N' as a constant. Or perhaps two constants: one for C programs with short names, one for C++ programs with long names. (And I'm not really sure that the C++ names will ultimately turn out to be that much longer: once the proper namespace support has been added, then looking up a C++ name will probably be a multistep process (looking up pieces of the demangled name in turn), and for each those steps, we'll be looking at a name that will be of the appropriate length for a C program.) But even if we consider N to be a constant, your broader point stands: the constant factors that different algorithms differ by are important, and in practice large constants can have more of an affect than logarithmic terms. Fortunately, one of the advantages of the refactoring that I'm doing right now is that it'll be easy to drop in different dictionary implementations for testing purposes: it should boil down to writing the code that we'd have to do to get skip list support anyways, changing one or two function calls, and recompiling. David Carlton carlton@math.stanford.edu