From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Berlin
To: Daniel Berlin
Cc: jhaller@lucent.com, jtc@redback.com, cgf@redhat.com,
	gdb@sources.redhat.com, binutils@sources.redhat.com
Subject: Re: stabs vs. dwarf-2 for C programs
Date: Fri, 13 Apr 2001 22:56:00 -0000
Message-id: 
References: <5mwv8pzgvt.fsf@jtc.redback.com> <20010412221742.A22383@redhat.com>
	<5mg0fdzg2t.fsf@jtc.redback.com> <3AD70931.CF488A07@lucent.com>
X-SW-Source: 2001-04/msg00107.html

Daniel Berlin writes:

> Ian Lance Taylor writes:
> Correct.
>
> It also doesn't support a lot of things that are nice for disk i/o
> dominated programs like linking.  These days, they are dominated by
> seek time, and we do *tons* of seeks in bfd.  We'd be better off using
> mmap if possible (file windows aren't the answer; just map the file
> when open, unmap when close, and if a flag in the bfd structure says
> it's mapped, use the right memory functions).
>
> I'm pretty sure this alone will cut linking time by a large factor,
> based on the number of seeks and the amount of time they are taking
> in a few tests I just did.

Oh, and I know about the BFD_IN_MEMORY stuff; it just doesn't seem to
work well compared to just mmapping the same files.  Probably because
mmap is a lot more optimized than reading the whole file into memory at
once.

Actually, it's also because we do so many out-of-memory writes, reads,
and seeks that it doesn't help us as much as it could.  For instance,
linking gdb, which is one archive plus 8 objects, requires 5000
out-of-memory seeks (i.e., fseeks), 7532 out-of-memory reads of random
sizes at random offsets, and 28000 out-of-memory writes.  Of course,
linking gdb is barely disk i/o bound; the numbers get much worse with
large programs.

The worst part about the reads is that we do tons of small reads
(almost all of them are 40 bytes or less), about 80% of them followed
by a seek (i.e., we do eight 40-byte reads, then a seek) to a position
probably just outside the buffering range, and then more reads.
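For concreteness, the map-on-open/unmap-on-close scheme quoted above could look
roughly like this.  This is only a minimal sketch, not actual BFD code; the
names (`map_handle`, `map_open`, `map_read`, `map_close`) are hypothetical, and
real code would also need to handle zero-length files and writable mappings.

```c
/* Sketch of "map the file when open, unmap when close", with a flag
   saying whether to use the memory functions.  Hypothetical names,
   not real BFD structures. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct map_handle {
  void *base;   /* start of the mapping */
  size_t size;  /* length of the file */
  int mapped;   /* flag: use memory functions instead of stdio */
};

static int map_open(struct map_handle *h, const char *path) {
  struct stat st;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  if (fstat(fd, &st) < 0) {
    close(fd);
    return -1;
  }
  h->size = (size_t) st.st_size;
  h->base = mmap(NULL, h->size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  /* the mapping survives closing the descriptor */
  if (h->base == MAP_FAILED)
    return -1;
  h->mapped = 1;
  return 0;
}

/* Each fseek+fread pair collapses into one memcpy from the mapping. */
static void map_read(const struct map_handle *h, size_t offset,
                     void *buf, size_t len) {
  memcpy(buf, (const char *) h->base + offset, len);
}

static void map_close(struct map_handle *h) {
  if (h->mapped)
    munmap(h->base, h->size);
  h->mapped = 0;
}
```

The point of the sketch is the `map_read` line: a random-offset read becomes a
plain memcpy (often just a pointer into the mapping), so the seek disappears
entirely instead of being absorbed by stdio buffering.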
After a lot of that, we start seeking back and forth to various places,
doing 3-4 reads of decreasing size (50k, 7k, 2k, 30 bytes), each read
followed by a seek to some random place.

Then we settle into a read/seek/write/seek pattern for the majority of
the reads and writes (going by total amount of reading/writing/seeking),
usually with matched read and write sizes.  Each of these
2-seek/1-read/1-write combos could be replaced with a single memcpy in
an mmap'd implementation (at best, this is what your buffering will
give you anyway, with more overhead).

Eventually, we get to writing the generated parts of the output file.
The worst part about the writes is that we are doing tons of very small
writes, interspersed at random points with seeks (for alignment
reasons, I guess).

After spending an hour hacking up a horrid "always use mmap"
implementation in bfd (i.e., not submittable), I decreased the link
time on large programs by a lot, and you can hear the difference. :)

> Ian
> --
> I went to the cinema, and the prices were: Adults $5.00,
> children $2.50.  So I said, "Give me two boys and a girl."

-- 
I had to stop driving my car for a while... The tires got dizzy.