From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Berlin
To: jhaller@lucent.com
Cc: jtc@redback.com, cgf@redhat.com, gdb@sources.redhat.com, binutils@sources.redhat.com, dan@cgsoftware.com
Subject: Re: stabs vs. dwarf-2 for C programs
Date: Fri, 13 Apr 2001 23:31:00 -0000
Message-id:
References: <5mwv8pzgvt.fsf@jtc.redback.com> <20010412221742.A22383@redhat.com> <5mg0fdzg2t.fsf@jtc.redback.com> <3AD70931.CF488A07@lucent.com>
X-SW-Source: 2001-04/msg00108.html

Ian Lance Taylor writes:

> Daniel Berlin writes:
>
> > > > With things like dwarf2 dupe elimination implemented in compilers
> > > > using linkonce sections, and relocations on debug info in object
> > > > files, leaving the debugging info in the object files is simply
> > > > impossible. It's a hack. Nothing more or less.
> > >
> > > Huh? Leaving the debugging information in object files is clearly not
> > > impossible.
> >
> > Of course not, unless you perform elimination of debug info, in which
> > case it can be impossible without having to recompile possibly every
> > object, every time (see below).
>
> If you are leaving debug info in object files, I don't understand why
> you would do elimination of debug info.

Errr, because the memory usage in the debugger will jump otherwise. Not
just by a small amount, either.

Assuming you have a lot of common headers, which most programs do, at
least for the average C++ program (where 95% of the debug info comes
from headers rather than code), you'll effectively multiply the memory
usage of the debugger by the number of object files you have. This is
not a good idea at all.

However, even with C, it's still a killer. JT just mentioned that
without dwarf2, his debug executable builds are 67 meg, and with
dwarf2, 167 meg. This is dwarf2 without elimination, and before
linking, dwarf2 object files are usually 70-80% the size of the stabs
objects (and debug info dominates the size of an object file).
So while he could probably debug in about 30-200 meg of memory before
(depending on what part he's debugging), he now debugs in 130-400 meg.
Not a pretty sight.

Unless you suggest we do the elimination in the debugger? I tried this,
and it worked fine on the memory issue, but people just complained
about the executable size anyway. :)

> The goal of leaving debug info in object files is to speed up the
> linker at the expense of the debugger. The usual purpose of
> eliminating debug info is to shrink the size of the executable (which
> does not apply) and to speed up the debugger at the expense of the
> linker (which does not apply).

If you are talking about going from using 200 meg of memory in gdb to
using 600 meg, it does apply. Remember, link times are only long on
large projects anyway, so we aren't talking about a small amount of
debug info.

> Is there some other reason to eliminate debug info?

Yes, to reduce debugger memory usage.

> > > You need a minimal amount of debugging information in the
> > > executable: enough for it to determine which object file to check
> > > for information about a particular address range. That is not
> > > impossible. The debugger needs to process relocations at debug
> > > time. That is not impossible.
> > >
> > > Do you have a valid reason to reject this approach?
> >
> > Yes. It requires the debugger to become a linker. You haven't
> > removed the time necessary, you've simply moved it to another
> > place, excluding disk write time from the end of the link pass.
>
> Well, yes. That's the whole point. The linker is run more often than
> the debugger.

Depends, I guess. I find I run the debugger as much as the linker, but
I deal with either gcc, gdb, or hairy multithreaded programs that seem
not to do the locking they need to. :)

> Quite a bit more often, in fact. It can be a good idea to speed up
> the linker at the expense of the debugger.

Yes, to a certain degree.
But I think you'll end up crippling the debugger :). If you don't, I'm
all for it.

> It doesn't require the debugger to become a full-fledged linker, of
> course. It only requires the debugger to process the simple
> relocations which can appear in debugging sections. This is fairly
> easy, and in fact BFD provides a function which does it.

Sure.

> [ I omit further discussions about debug info elimination, since I
> don't understand why you would do it when leaving debug info in
> object files. ]

See the above. Think about, for instance, the mozilla guys needing 50x
as much memory to debug their programs, even though linking is 10x
faster. It's probably not a good tradeoff in this case. I can't think
of many cases where it *is* a good tradeoff, unless you almost never
debug, in which case, why the heck are you doing debug builds anyway?

> > You need to do the same amount of linking, you've just spread the
> > time around, and removed the time necessary to write (probably at
> > the cost of memory) the final executable.
> >
> > In fact, that's all this approach saves: the cost of writing. We
> > still eat the cost of reading, and the cpu usage, and we have to
> > perform the linking/relocation somewhere, so we eat as much memory
> > as you would disk.
>
> No. The approach saves the cost of reading the debug information
> from the files, it saves the cost of processing the debug
> information for duplicate information, it saves the cost of
> processing debug information relocations, and, as you say, it saves
> the cost of writing the debug information.

This has to be much lower than the cost of reading and writing the
other sections. There is nothing much to process. I would imagine that
debug info sections probably take the least amount of time to process
of all the sections, and that section processing time is dominated by
a few section types that require more than just "read huge chunk ->
process huge chunk -> write huge chunk whenever we flush for some
reason".
I could be completely off the mark, of course. I just don't see the
debug info as the big eater of disk i/o, when disk i/o is dominated by
seeks. We can easily read 16 meg a second sequentially, so even a
debug info section 16 times larger than the others (which wouldn't be
uncommon) probably takes about the same time to read as sections that
require what amounts mostly to non-sequential i/o.

> > It also doesn't support a lot of things that are nice for disk i/o
> > dominated programs like linking. These days, they are dominated by
> > seek time, and we do *tons* of seeks in bfd. We'd be better off
> > using mmap if possible (file windows aren't the answer; just map
> > the file when open, unmap when close, and if a flag in the bfd
> > structure says it's mapped, use the right memory functions).
> >
> > I'm pretty sure this alone will cut linking time by a large
> > factor, based on the number of seeks and the amount of time they
> > are taking, in a few tests I just did.
>
> It's possible. Around 1997 or so I tried using mmap as you suggest.
> It actually made the linker slower on the systems I tested it on,
> including Solaris and Linux, so I dropped the effort.

This isn't surprising for 1997; mmap in Linux then was horribly bad,
and reports I've read say disks have increased in throughput (but not
seek time) at about 60% a year since 1995. At some point between 1995
and 2001, disk throughput became irrelevant, and seek time started to
dominate. PRML was just starting to be generally used in hard drives
in 1997, too, and MR heads were in somewhat short supply (WD didn't
introduce a drive using MR heads until the end of 1997), and both of
these increased throughput a large amount (though the main reason for
adopting them was the increased density, of course).

> But, as you say, times change, and maybe now it would indeed be
> faster. It's an easy optimization in BFD; why don't you try it?

See the next post. I'll probably do some more comprehensive tests
tomorrow.
> Ian

--
I like to fill my tub up with water, then turn the shower on and act
like I'm in a submarine that's been hit.