From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Berlin
To: jhaller@lucent.com
Cc: jtc@redback.com, cgf@redhat.com, gdb@sources.redhat.com, binutils@sources.redhat.com, dan@cgsoftware.com
Subject: Re: stabs vs. dwarf-2 for C programs
Date: Fri, 13 Apr 2001 23:31:00 -0000
Message-id:
References: <5mwv8pzgvt.fsf@jtc.redback.com> <20010412221742.A22383@redhat.com> <5mg0fdzg2t.fsf@jtc.redback.com> <3AD70931.CF488A07@lucent.com>
X-SW-Source: 2001-04/msg00108.html

Ian Lance Taylor writes:

> Daniel Berlin writes:
>
> > > > With things like dwarf2 dupe elimination implemented in compilers
> > > > using linkonce sections, and relocations on debug info in object
> > > > files, leaving the debugging info in the object files is simply
> > > > impossible. It's a hack. Nothing more or less.
> > >
> > > Huh? Leaving the debugging information in object files is clearly not
> > > impossible.
> >
> > Of course not, unless you perform elimination of debug info, in which
> > case it can be impossible without having to recompile possibly every
> > object, every time (see below).
>
> If you are leaving debug info in object files, I don't understand why
> you would do elimination of debug info.

Errr, because the memory usage in the debugger will jump otherwise. Not
just by a small amount, either.

Assuming you have a lot of common headers, which most programs do, at
least for the average C++ program (where 95% of the debug info comes
from headers rather than code), you'll effectively multiply the memory
usage of the debugger by the number of object files you have. This is
not a good idea at all.

However, even with C, it's still a killer. JT just mentioned that
without dwarf2, his debug executable builds are 67 meg, and with
dwarf2, 167 meg. This is dwarf2 without elimination, and before
linking, dwarf2 object files are usually 70-80% the size of the stabs
objects (and debug info dominates the size of an object file).
So while he could probably debug in about 30-200 meg of memory before
(depending on what part he's debugging), he now debugs in 130-400 meg.
Not a pretty sight.

Unless you suggest we do the elimination in the debugger? I tried this,
and it worked fine on the memory issue, but people just complained
about the executable size anyway. :)

> The goal of leaving debug info in object files is to speed up the
> linker at the expense of the debugger. The usual purpose of
> eliminating debug info is to shrink the size of the executable (which
> does not apply) and to speed up the debugger at the expense of the
> linker (which does not apply).

If you are talking about going from using 200 meg of memory in gdb to
using 600 meg, it does apply. Remember, link times are only long on
large projects anyway, so we aren't talking about a small amount of
debug info.

> Is there some other reason to eliminate debug info?

Yes, to reduce debugger memory usage.

> > > You need a minimal amount of debugging information in the
> > > executable: enough for it to determine which object file to check
> > > for information about a particular address range. That is not
> > > impossible. The debugger needs to process relocations at debug
> > > time. That is not impossible.
> > >
> > > Do you have a valid reason to reject this approach?
> >
> > Yes. It requires the debugger to become a linker. You haven't
> > removed the time necessary, you've simply moved it to another
> > place, excluding disk write time from the end of the link pass.
>
> Well, yes. That's the whole point. The linker is run more often than
> the debugger.

Depends, I guess. I find I run the debugger as much as the linker, but
I deal with either gcc, gdb, or hairy multithreaded programs that seem
not to do the locking they need to. :)

> Quite a bit more often, in fact. It can be a good idea to speed up
> the linker at the expense of the debugger.

Yes, to a certain degree.
But I think you'll end up crippling the debugger :). If you don't, I'm
all for it.

> It doesn't require the debugger to become a full-fledged linker, of
> course. It only requires the debugger to process the simple
> relocations which can appear in debugging sections. This is fairly
> easy, and in fact BFD provides a function which does it.

Sure.

> [ I omit further discussions about debug info elimination, since I
> don't understand why you would do it when leaving debug info in
> object files. ]

See the above. Think about, for instance, the mozilla guys needing 50x
as much memory to debug their programs, even though linking is 10x
faster. It's probably not a good tradeoff in this case. I can't think
of many cases where it *is* a good tradeoff, unless you almost never
debug, in which case, why the heck are you doing debug builds anyway?

> > You need to do the same amount of linking, you've just spread the
> > time around, and removed the time necessary to write (probably at
> > the cost of memory) the final executable.
> >
> > In fact, that's all this approach saves: the cost of writing. We
> > still eat the cost of reading, and the cpu usage, and we have to
> > perform the linking/relocation somewhere, so we eat as much memory
> > as you would disk.
>
> No. The approach saves the cost of reading the debug information
> from the files, it saves the cost of processing the debug
> information for duplicate information, it saves the cost of
> processing debug information relocations, and, as you say, it saves
> the cost of writing the debug information.

This has to be much lower than the cost of reading and writing the
other sections. There is nothing much to process. I would imagine that
debug info sections probably take the least amount of time to process
of all the sections, and that section processing time is dominated by
a few section types that require more than just "read huge chunk ->
process huge chunk -> write huge chunk whenever we flush for some
reason".
I could be completely off the mark, of course. I just don't see the
debug info as the big eater of disk i/o, when disk i/o is dominated by
seeks. We can easily read 16 meg a second sequentially, so even a
debug info section 16 times larger than the others (which wouldn't be
uncommon) probably takes about the same time to read as sections that
require what amounts mostly to non-sequential i/o.

> > It also doesn't support a lot of things that are nice for disk i/o
> > dominated programs like linking. These days, they are dominated by
> > seek time, and we do *tons* of seeks in bfd. We'd be better off
> > using mmap if possible (file windows aren't the answer; just map
> > the file when open, unmap when close, and if a flag in the bfd
> > structure says it's mapped, use the right memory functions).
> >
> > I'm pretty sure this alone will cut linking time by a large
> > factor, based on the number of seeks and the amount of time they
> > are taking, in a few tests I just did.
>
> It's possible. Around 1997 or so I tried using mmap as you suggest.
> It actually made the linker slower on the systems I tested it on,
> including Solaris and Linux, so I dropped the effort.

This isn't surprising for 1997; mmap in Linux then was horribly bad,
and reports I've read say disks have increased in throughput (but not
seek time) at about 60% a year since 1995. At some point between 1995
and 2001, disk throughput became irrelevant, and seek time started to
dominate. PRML was just starting to be generally used in hard drives
in 1997, too, and MR heads were in somewhat short supply (WD didn't
introduce a drive using MR heads until the end of 1997), and both of
these increased throughput a large amount (though the main reason for
adopting them was the increased density, of course).

> But, as you say, times change, and maybe now it would indeed be
> faster. It's an easy optimization in BFD; why don't you try it?

See the next post. I'll probably do some more comprehensive tests
tomorrow.
> Ian

--
I like to fill my tub up with water, then turn the shower on and act
like I'm in a submarine that's been hit.