From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Berlin <dan@cgsoftware.com>
To: gdb-patches@sources.redhat.com, Andrew Cagney <ac131313@cygnus.com>, Jim Blandy <jimb@cygnus.com>
Subject: Re: PATCH: fail to improve psymtab memory consumption
Date: Mon, 23 Jul 2001 22:15:00 -0000
Message-id: <87zo9uaow2.fsf@cgsoftware.com>
References: <20010720212013.7DA695E9D8@zwingli.cygnus.com> <87lmljdz4g.fsf@cgsoftware.com> <nppuausvng.fsf@zwingli.cygnus.com> <87d76uerh8.fsf@cgsoftware.com> <3B5CD3B6.9060106@cygnus.com>
X-SW-Source: 2001-07/msg00589.html

Andrew Cagney <ac131313@cygnus.com> writes:

>>>> I actually use mmap when possible, but that's for speed, rather than
>>>> memory savings.
>>
>>> Is it really faster?
>> Yes, because it saves buffer copying (I think, it's 2am, which is
>> about the time i usually say something stupid, so feel free to smack
>> me around if i'm wrong)
>> (You don't have to dandy about copying buffers into the right
>> place. With an mmap'd area, it's already *in* the right place.)
>> Nathan Myers confirms my suspicion, as well.
>> http://pgsql.profnet.pl/mhonarc/pgsql-hackers/2001-05/msg01233.html
>> "Using mmap() in place of disk read() almost always results in enough
>> performance improvement to make doing so worth a lot of disruption."
> 
> 
> Does this require serial or random file i/o?
It's actually not going to matter; it'll *always* be faster.
You can't make fread faster than mmap.  You could make fread use mmap
in special cases, but they don't happen often enough in practice to
make fread fast enough to beat mmap.
With memory latency *increasing* instead of decreasing, this will only
become more true, since mmap requires half the memory traffic, at
worst.
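
To make the copy argument concrete, here is a minimal sketch. It is
not code from the dwarf2 reader: the function names sum_with_fread and
sum_with_mmap are made up for illustration, and it assumes a POSIX
host. Both functions compute the same byte sum over a file. The fread
path has the data copied into a malloc'd buffer first, while the mmap
path reads the mapped pages directly.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Path 1: malloc + fread.  The file data is copied out of the kernel's
   file cache into BUF (possibly via stdio's own buffer) before we can
   look at it.  Returns the byte sum, or -1 on failure.  */
static long
sum_with_fread (FILE *fp, size_t len)
{
  assert (fp != NULL);
  unsigned char *buf = malloc (len);
  long sum = 0;

  if (buf == NULL)
    return -1;
  rewind (fp);
  if (fread (buf, 1, len, fp) != len)
    {
      free (buf);
      return -1;
    }
  for (size_t i = 0; i < len; i++)
    sum += buf[i];
  free (buf);
  return sum;
}

/* Path 2: mmap.  The pages are mapped into our address space and read
   in place; there is no separate user-space buffer to fill.  LEN must
   be nonzero and no larger than the file.  Returns the byte sum, or -1
   on failure.  */
static long
sum_with_mmap (FILE *fp, size_t len)
{
  assert (fp != NULL);
  unsigned char *p = mmap (NULL, len, PROT_READ, MAP_PRIVATE,
                           fileno (fp), 0);
  long sum = 0;

  if (p == MAP_FAILED)
    return -1;
  for (size_t i = 0; i < len; i++)
    sum += p[i];
  munmap (p, len);
  return sum;
}
```

The two functions should return the same value for the same file; the
only difference is how the bytes get in front of the loop.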


> 
> 
>> Everything i can pull up on google says mmap is so much faster than
>> malloc+fread that it's not even funny, which means it's not just that
>> it feels faster.  On a 100 meg debug info file, you can easily notice
>> the difference.
> 
> 
> What the heck.  I'll think out loud.
> 
> The dwarf2 sections are parsed serially, correct?

Err, some yes, some no.
And for each different type of section, you have to do random I/O to
get to info about the section.

For symtabs, they also aren't parsed serially in terms of location in
the file (only in the sense of processing an entire CU, and all the
info therein, in one swoop).  That only holds for psymtabs.
Frequently, the order in which the psymtabs get converted causes
non-serial access.
Actually, I should probably rephrase that:
the only time dwarf2 is parsed serially outside of psymtab generation
(which is fast anyway) is by dumb luck.

>   That means either:
> 
> 	o	the entire section is read in (a large
> 		slow file cache thrashing copy) and then
> 		parse that buffer

> 
> 	o	the section is mmap'd - gdb's memory image
> 		grows by a large amount

Except this memory is shared.
Important difference.
(It means you could run two GDBs, and if the debug info sections were
mmapped, you wouldn't use twice the memory on them.)
> 
> 	o	the section is read, bit by bit, on demand
> 		using FILE and the hosts buffer cache.
> 		Some OS's even implement this as a mmap().
> 

None of these is true for the new dwarf2 reader.

If you change "section" to "portion of a section" it might be truer.

For the old dwarf2 reader, the first is true, and it's slow as all get
out for large files.

mmap also gives you better swap behavior.
The OS *knows* the region is mapped to a file, it *knows* what parts
were recently accessed, and therefore *knows* what to swap out for
that file.

Otherwise it has to decide based on the memory patterns of the
entire application, which may cause bad swap behavior (read a
little part of a buffer containing the entire section, do some
processing that causes tons of other memory accesses, repeat, etc).

You can see this by taking a large (> 100 meg) amount of debug info,
which is not nearly as uncommon as it used to be, and trying to debug
on a workstation with 256 meg of memory.

Almost every OS i know of will swap out the wrong part of the file
(the part we're gonna need to process next) because it had no clue
what we might want next for that file, only what we needed next for
that application, which doesn't match up.
I'm ignoring the fact that we shouldn't be reading a 100 meg buffer
and keeping it live for a *very* long period of time, anyway.
mmap made gdb almost usable for debugging a 100 meg debug info
executable on a 256 meg machine (GDB used about 300 meg of memory),
where usable is defined as "Only swaps a ton when you are moving from
module to module, not all the time".


> I don't know but back in the good old days CPU's and disks were
> vaguely comparable.  Nowadays, the disks are the same speed but the
> CPU's are faster.  Simple FILE I/O could prove faster than you're
> expecting.

I tried it; it's not.
Try the new dwarf2 reader on a large file on a system without mmap (or
edit config.h to #undef HAVE_MMAP and HAVE_GETPAGESIZE, so the #if's
don't pick it up).
Then try it with mmap enabled.

If you look at the code, you'll note this is a perfectly fair
benchmark, since I'm not pulling any tricks, just using a different
method of getting a memory buffer for a part of a section (with mmap,
it mmap's the part; without it, it fread's the part).
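
For anyone who doesn't want to dig through the reader, the switch can
be sketched roughly like this. This is a minimal illustration, not the
actual dwarf2 reader code: struct chunk, chunk_load and chunk_release
are hypothetical names, HAVE_MMAP is defined inline only so the sketch
stands alone (normally configure would set it in config.h), and a
POSIX host is assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno, fseeko, sysconf */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Normally set by configure in config.h; defined here only so the
   sketch stands alone.  Remove this line to exercise the fread path.  */
#define HAVE_MMAP 1

#ifdef HAVE_MMAP
#include <sys/mman.h>
#endif

/* Hypothetical handle for one loaded part of a section.  */
struct chunk
{
  const unsigned char *data;  /* first requested byte */
  size_t len;                 /* number of requested bytes */
  void *base;                 /* start of the mapping or malloc'd block */
  size_t base_len;            /* mapping length; unused for malloc */
};

/* Load LEN bytes at OFFSET of FP.  The caller must make sure the range
   lies inside the file.  Returns 0 on success, -1 on failure.  */
static int
chunk_load (struct chunk *c, FILE *fp, off_t offset, size_t len)
{
  assert (c != NULL && fp != NULL);
  if (len == 0)
    return -1;
#ifdef HAVE_MMAP
  /* mmap offsets must be page aligned, so map from the page boundary
     below OFFSET and skip the slack.  */
  off_t page = (off_t) sysconf (_SC_PAGESIZE);
  size_t slack = (size_t) (offset % page);
  void *p = mmap (NULL, len + slack, PROT_READ, MAP_PRIVATE,
                  fileno (fp), offset - (off_t) slack);
  if (p == MAP_FAILED)
    return -1;
  c->base = p;
  c->base_len = len + slack;
  c->data = (const unsigned char *) p + slack;
#else
  /* No mmap: allocate a buffer and have the bytes copied into it.  */
  unsigned char *buf = malloc (len);
  if (buf == NULL)
    return -1;
  if (fseeko (fp, offset, SEEK_SET) != 0 || fread (buf, 1, len, fp) != len)
    {
      free (buf);
      return -1;
    }
  c->base = buf;
  c->base_len = 0;
  c->data = buf;
#endif
  c->len = len;
  return 0;
}

/* Undo chunk_load, using whichever method created the buffer.  */
static void
chunk_release (struct chunk *c)
{
#ifdef HAVE_MMAP
  munmap (c->base, c->base_len);
#else
  free (c->base);
#endif
}
```

The parsing code only ever sees c->data and c->len, so it is the same
in both builds; that is what makes the comparison fair.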


> 
> 	Andrew
> 
> 

-- 
"Every so often, I like to stick my head out the window, look up,
and smile for a satellite picture."
-Steven Wright