Subject: Re: linker debug info editing
From: Daniel Berlin
To: Jim Blandy
Cc: binutils@sourceware.org, gdb-patches@sourceware.org
In-Reply-To: <8f2776cb0603101744w3dd59741s4ad8e17b7069a6fa@mail.gmail.com>
References: <20060310124921.GN6777@bubble.grove.modra.org> <8f2776cb0603101744w3dd59741s4ad8e17b7069a6fa@mail.gmail.com>
Date: Mon, 13 Mar 2006 12:53:00 -0000
Message-Id: <1142217868.20401.15.camel@linux.site>
X-SW-Source: 2006-03/txt/msg00176.txt.bz2

On Fri, 2006-03-10 at 17:44 -0800, Jim Blandy wrote:
> After you've chosen dies to delete, how do you deal with other dies
> that refer to the deleted dies?  I'm not talking about parents; I'm
> talking about attributes whose form is DW_FORM_ref*.

The only correct answer to this is "rewrite all the references,
starting from scratch" :P

You could track it, but the gap tracking you'd have to do is pretty
annoying.  I had to do this once for a dwarf2 duplicate die eliminator.

SGI's linker eliminated duplicate dies at link time, IIRC.

> I think the information we need to do this reduction correctly isn't
> available at the level you're working at.  linkonce sections aren't
> really deleted; they're unified.
> The data in them doesn't go away; equivalent data from elsewhere is
> used instead.
>
> I tend to think that having the compiler divide the information into
> separate compilation units, as Jim suggests, is the only way to go
> here.  In that scenario, inter-CU references will use symbols to refer
> to their targets; after choosing which instance of the linkonce
> section to keep, you should still have definitions for all the symbols
> the other dies' relocs refer to.
>
> As Daniel says, the GDB-related reasons for avoiding this solution are
> long gone.

The problem with the inter-CU references and section splitting scheme
(i.e. -feliminate-dwarf2-dups) is that it has a greater constant
overhead than straight elimination, because ref_addr forms have larger
values, plus the different number of sections.

When you have 80 meg of debug info, referencing with the absolute offset
from the beginning of .debug_info ends up being 4 bytes, while otherwise
it would have been 1 for an in-CU reference.

This adds up quite quickly.  For a lot of files, we lost >8-10% of space
savings due to overhead.

In cases where you have < 10 meg of debug info, it sometimes even lost
out to not eliminating duplicates at all (even though there were, in
fact, lots of duplicates).

Also, deciding what to put into the split sections is hard.  You can't
just split every type and program into a separate CU, and ref_addr
everything.  The overhead of doing so is enormous.

I spent a large amount of time when we were implementing
-feliminate-dwarf2-dups measuring the cost of various schemes for
deciding what to try to split and what not.

I came to the conclusion that splitting sections should only really be
used if you can't have something that just goes through and eliminates
all duplicates by understanding and rewriting the dwarf2 info all at
once at link time.
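
To make the 4-bytes-versus-1 arithmetic above concrete, here is a minimal Python sketch of the per-reference cost.  It is an illustration, not code from any linker or compiler discussed in this thread.  It assumes 32-bit DWARF, where DW_FORM_ref_addr takes 4 bytes, and a producer that picks the smallest fixed-width CU-relative form (DW_FORM_ref1/ref2/ref4) that fits the offset.  The function names are hypothetical.

```python
from typing import Iterable

# Assumption: 32-bit DWARF, so a DW_FORM_ref_addr value occupies 4 bytes.
REF_ADDR_SIZE = 4


def cu_relative_ref_size(offset_in_cu: int) -> int:
    """Size in bytes of the smallest fixed-width CU-relative form that fits."""
    if offset_in_cu < 0:
        raise ValueError("offset must be non-negative")
    if offset_in_cu <= 0xFF:
        return 1  # DW_FORM_ref1
    if offset_in_cu <= 0xFFFF:
        return 2  # DW_FORM_ref2
    if offset_in_cu <= 0xFFFFFFFF:
        return 4  # DW_FORM_ref4
    raise ValueError("offset does not fit a 32-bit DWARF compilation unit")


def extra_bytes(offsets_in_cu: Iterable[int]) -> int:
    """Bytes added if each listed in-CU reference became a DW_FORM_ref_addr."""
    return sum(REF_ADDR_SIZE - cu_relative_ref_size(o) for o in offsets_in_cu)


# Hypothetical input: 1000 references whose targets sit within the first
# 256 bytes of their CU each grow from 1 byte to 4 bytes.
print(extra_bytes([5] * 1000))
```

Under these assumptions, every reference that would have fit in one byte costs three more once its target moves to another CU.  That is a fixed cost per reference, regardless of how much duplicate data the split removes.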