Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Daniel Berlin <dan@cgsoftware.com>
To: Jim Blandy <jimb@zwingli.cygnus.com>
Cc: gdb-patches@sources.redhat.com
Subject: Re: [WIP] New dwarf2 reader - updated 07-02-2001
Date: Tue, 31 Jul 2001 13:45:00 -0000	[thread overview]
Message-ID: <229924053.996597936@[192.168.0.106]> (raw)
In-Reply-To: <npg0bc96ou.fsf@zwingli.cygnus.com>

--On Tuesday, July 31, 2001 3:36 PM -0500 Jim Blandy 
<jimb@zwingli.cygnus.com> wrote:

>
> Daniel Berlin <dan@cgsoftware.com> writes:
>> Jim Blandy <jimb@zwingli.cygnus.com> writes:
>> > - MD5 generates a 128-bit checksum.  Any reason you only use the first
>> >   32 bits in your hash?  It looks like checksum_die and
>> >   checksum_partial_die both cut it off at eight digits.
>> I use the first 64 bits, just like GCC does for checksumming DIE's and
>> duplicate elimination.  Though i'll give you gcc also includes the
>> name attached to the die. I'll happily make the two algorithms
>> *exactly* the same if you like.
>
> Eight hex digits * 4 bits per hex digit is 32 bits.  No?
>
>     static inline char *
>     checksum_die (struct die_info *die)
>     {
>       struct md5_ctx ctx;
>       char *retval;
>       int i;
>       unsigned char checksum[16];
>       md5_init_ctx (&ctx);
>       process_full_die (die, &ctx);
>       md5_finish_ctx (&ctx, checksum);
>       retval = xcalloc (1, 9);
>       for (i = 0; i < 4; ++i)
>         sprintf (retval + (i * 2), "%.2x", checksum[i]);
>       retval[8] = 0;
>       return retval;
>     }
>
> The "MD5 never clashes" thesis is okay by me.  But to say that,
> therefore, the first 32 bits of MD5 never clash, is not okay.
>
> Using a function yielding n distinct hash values, the probability is
> 1/n that they clash, thus 1-1/n that they don't clash.  For p
> independent pairs, the chances are (1-1/n)^p that none clash.  For a
> set of d dies, there are d(d-1) pairs.  So in our case, using 32 bits
> of MD5 for 10000 distinct dies, the chance of no clashes are
> (1-1/(2^32))^(10000(10000-1)), or 0.977.  Thus a 2.3% chance of a
> clash.  This sucks.
>
> Using 64 bits, as you intended, is much better, but why not use the
> full 128?  Honestly, why play this kind of game at all?
Because it's faster than trying to keep all the die data.
>  It's not that
> hard to do the job perfectly --- just use the significant die contents
> themselves as the tree key.  No collisions, no hash function, simpler.
>

Um, think about how much memory that would use.
We'd have to keep copies of die contents around.

>> > - Why are you using a splay tree keyed by MD5 hash values, instead of
>> >   a hash table?  The only advantage of a splay over a hash table is
>> >   ordered retrieval, but your keys don't have any meaningful order.
>> I hate the hash table interface for libiberty, it's annoying to use,
>> IMHO.
>> Other than that, no reason. I was meaning to make a nice generic
>> dictionary interface, and talked with DJ about whether i could just
>> use libdict, but since it's not GPL, and i don't have time to
>> duplicate it's nice interface right now, i just went with what was
>> easiest to program.
>
> Okay.
>
>> > - Assuming MD5 is perfect, you don't checksum all attribute forms
>> >   (DW_FORM_flag, for example).  This means that you can get false
>> >   duplicates, if two dies differ only in (say) a flag's value.  How is
>> >   read_comp_unit_dies set up to tolerate that?
>> I only checksummed the  stuff gcc generates (since i based the
>> checksumming code on gcc's), and I forgot that gcc
>> doesn't generate some forms.
>
> GCC does generate DW_FORM_flag:
>
> 	.byte	0x2	; uleb128 0x2; (abbrev code)
> 	.byte	0x2e	; uleb128 0x2e; (TAG: DW_TAG_subprogram)
> 	.byte	0x1	; DW_children_yes
> 	.byte	0x1	; uleb128 0x1; (DW_AT_sibling)
> 	.byte	0x13	; uleb128 0x13; (DW_FORM_ref4)
> 	.byte	0x3f	; uleb128 0x3f; (DW_AT_external)
> 	.byte	0xc	; uleb128 0xc; (DW_FORM_flag)
> 	.byte	0x3	; uleb128 0x3; (DW_AT_name)
> 	.byte	0x8	; uleb128 0x8; (DW_FORM_string)
> 	.byte	0x3a	; uleb128 0x3a; (DW_AT_decl_file)
> 	.byte	0xb	; uleb128 0xb; (DW_FORM_data1)
> 	.byte	0x3b	; uleb128 0x3b; (DW_AT_decl_line)
> 	.byte	0xb	; uleb128 0xb; (DW_FORM_data1)
> 	.byte	0x27	; uleb128 0x27; (DW_AT_prototyped)
> 	.byte	0xc	; uleb128 0xc; (DW_FORM_flag)
> 	.byte	0x49	; uleb128 0x49; (DW_AT_type)
> 	.byte	0x13	; uleb128 0x13; (DW_FORM_ref4)
> 	.byte	0x11	; uleb128 0x11; (DW_AT_low_pc)
> 	.byte	0x1	; uleb128 0x1; (DW_FORM_addr)
> 	.byte	0x12	; uleb128 0x12; (DW_AT_high_pc)
> 	.byte	0x1	; uleb128 0x1; (DW_FORM_addr)
> 	.byte	0x40	; uleb128 0x40; (DW_AT_frame_base)
> 	.byte	0xa	; uleb128 0xa; (DW_FORM_block1)
> 	.byte	0x0
> 	.byte	0x0
>
>> > - In read_comp_unit_dies, when you find a duplicate die, you skip to
>> >   its sibling.  What if the parent die is identical, but the children
>> >   dies differ?
>>
>> I don't believe this is possible in any language.
>
> How about this:
>
> namespace X {
>   namespace A {
>     int m, n;
>   }
> }
>
> namespace Y {
>   namespace A {
>     int o, p;
>   }
> }
>
> The dies for the two 'A' namespaces have the name DW_AT_name
> attribute, but they're clearly different.
>
This won't do it, they'll have different decl line attributes.
And we only eliminate at the top level anyway.

namespace A
{
	int o, p;
}
namespace A
{
	int m, n;
}
won't screw us either, since the decl line number will still differ.
and, if you implemented it according to spec, one would be a different tag 
than the other.

> Just in principle, this seems sloppy.  Dies are just arbitrary tree
> nodes.  There's no reason to assume that if two parent nodes are
> identical, we can skip the second one and all its children.
It's not sloppy.
Elimination at the top level is normal in other debuggers i know of that do 
elimination.






  reply	other threads:[~2001-07-31 13:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-07-02 10:34 Daniel Berlin
2001-07-02 10:56 ` Daniel Berlin
2001-07-30 17:07 ` Jim Blandy
2001-07-30 23:06   ` Daniel Berlin
2001-07-31 13:35     ` Jim Blandy
2001-07-31 13:45       ` Daniel Berlin [this message]
2001-07-31 16:11         ` Jim Blandy
2001-07-31 16:31           ` Daniel Berlin
2001-07-31 15:16 David B Anderson
2001-07-31 23:13 ` Daniel Berlin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='229924053.996597936@[192.168.0.106]' \
    --to=dan@cgsoftware.com \
    --cc=gdb-patches@sources.redhat.com \
    --cc=jimb@zwingli.cygnus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox