From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-43258-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 5324 invoked by alias); 13 Mar 2006 20:19:12 -0000
Received: (qmail 5305 invoked by uid 22791); 13 Mar 2006 20:19:11 -0000
X-Spam-Check-By: sourceware.org
Received: from skewer.dreamhost.com (HELO skewer.dreamhost.com) (64.111.107.13)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 13 Mar 2006 20:19:06 +0000
Received: from linux-009002219123.watson.ibm.com (yktgi01e0-s4.watson.ibm.com [129.34.20.23]) 	by skewer.dreamhost.com (Postfix) with ESMTP id 10F8F7C81E; 	Mon, 13 Mar 2006 12:19:03 -0800 (PST)
Subject: Re: linker debug info editing
From: Daniel Berlin <dberlin@dberlin.org>
To: Jim Blandy <jimb@red-bean.com>
Cc: binutils@sourceware.org, gdb-patches@sourceware.org
In-Reply-To: <1142217868.20401.15.camel@linux.site>
References: <20060310124921.GN6777@bubble.grove.modra.org> 	 <8f2776cb0603101744w3dd59741s4ad8e17b7069a6fa@mail.gmail.com> 	 <1142217868.20401.15.camel@linux.site>
Content-Type: text/plain
Date: Tue, 14 Mar 2006 15:10:00 -0000
Message-Id: <1142281149.20401.84.camel@dyn9002219123>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2006-03/txt/msg00181.txt.bz2


> I spent a large amount of time when we were implementing
> -feliminate-dwarf2-dups measuring the cost of various schemes for
> deciding what to try to split and what not.

Oh, I forgot the other half of the problems.

If you don't do something like split every DIE, then you get into things
where you have to name the linkonce sections by their content (using md5
hashes or something), so that you don't accidentally eliminate one set
of DIEs that really isn't an exact duplicate of some other.

Once you do this, you get hit with ordering issues: if the include files
in some other compilation unit are in a slightly different order, the
md5's come out different.
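
To make the ordering problem concrete, here is a minimal sketch. It is not
what any compiler actually emits: the function name, the use of source-like
strings to stand in for DIE contents, and the hex form of the digest in the
section name are all assumptions made for illustration.

```python
import hashlib

def linkonce_section_name(die_records):
    """Name a section after the MD5 of its contents (hypothetical scheme)."""
    digest = hashlib.md5()
    for record in die_records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\0")  # separator so record boundaries affect the hash
    return ".debug_info.linkonce." + digest.hexdigest()

# Two compilation units that pull in the same declarations, but whose
# include files were processed in a different order.
unit_a = ["struct foo { int x; }", "struct bar { char c; }"]
unit_b = ["struct bar { char c; }", "struct foo { int x; }"]

name_a = linkonce_section_name(unit_a)
name_b = linkonce_section_name(unit_b)
same_order = linkonce_section_name(list(unit_a))

print(name_a == same_order)  # identical content and order: same name
print(name_a == name_b)      # same DIEs, different order: names differ
```

The two units describe exactly the same types, but because the hash covers
the content in order, the linker sees two unrelated section names and keeps
both copies.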

*Or* you hit the problem that you may not split exactly the same way (due
to ordering, more DIEs, etc.), and thus the DIEs in one split CU may be
different from those in another, and so will not be commoned (because they
will have different linkonce section names from the content hash).

If you do something obvious like split every type/subprogram DIE into
its own section, and linkonce those, the cost in .o files is enormous.

Then, for each type DIE, you have:

12 bytes for a debug_info header
~10-20 bytes of extra overhead from using DW_FORM_ref_addr references
  (assuming it is referenced ~3-5 times)
30 bytes for the new section name (.debug_info.linkonce.<16 byte md5>)
10+ bytes (I forget the number) for the extra section table entry
  you've added
10+ bytes for the extra symbol you need to reference the thing
  everywhere else

It ends up being about 100 bytes per DIE you put into its own CU.

If you have a C++ app, and 5000 types, you've just wasted 500k.
I had tried this scheme once, and it usually doubled (or more) the size
of the .o files.
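
As a rough check of the arithmetic, the sketch below adds up the per-DIE
figures listed above and scales them to 5000 types. The dictionary keys are
just labels I made up, and where the list says "10+" only the lower bound is
used, so the sum is a floor rather than a measurement.

```python
# Lower-bound byte costs per split-out DIE, taken from the list above.
# Entries given as "10+" or "~10-20" use their lower bound (an assumption).
per_die_overhead = {
    "debug_info header": 12,
    "extra reference overhead": 10,
    "section name": 30,
    "section table entry": 10,
    "symbol": 10,
}

lower_bound = sum(per_die_overhead.values())

# The email rounds the real cost up to about 100 bytes per DIE.
approx_per_die = 100
num_types = 5000
wasted_bytes = approx_per_die * num_types

print(lower_bound)   # floor from the listed items
print(wasted_bytes)  # total at ~100 bytes per DIE
```

The listed lower bounds alone come to 72 bytes, so "about 100" once the
"10+" items are filled in with real sizes is plausible, and 5000 types at
that rate gives the 500k quoted.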

IOW, it's always a better idea to eliminate in the linker if possible,
which has *none* of these issues.

It's hard work there, but straightforward if you have all the info you
need.