* Unreliable BFD caching heuristic
@ 2013-11-21 17:39 Luis Machado
2013-11-21 17:58 ` Pedro Alves
2013-12-02 15:42 ` Tom Tromey
0 siblings, 2 replies; 13+ messages in thread
From: Luis Machado @ 2013-11-21 17:39 UTC (permalink / raw)
To: gdb, Tom Tromey, Maciej W. Rozycki
Hi,
Just recently i was chasing a bug in an automated GCC testing
environment that uses GDB to run the programs GCC builds.
Some tests in GCC's testsuite create a number of different binaries with
the same name, one after the other, with different flags.
The bug happens when GDB loads all of those binaries, one at a time
obviously, in a single session, with a pattern of loading, running,
loading, running...
That's when the BFD caching takes effect and starts trying to save
resources based on two pieces of information: file name and the file's
modification timestamp.
In summary, if you are using a reasonably fast build system that can
create multiple binaries per second, you can potentially end up with two
or more binaries with the same file modification timestamp.
If all those things happen at the same time, the BFD machinery will
attempt to use a cached entry to load data from a totally new binary.
From that point onwards, things just go downhill.
This is really a regression compared to older GDB's, and a solution
probably involves improving the matching heuristics, in
gdb/gdb_bfd.c:eq_bfd, with more data.
Experiments show that using the file size in eq_bfd helps a little, but
i've managed to trigger the bug with that as well, plus it does not
address the problem of having multiple BFD's inside an archive.
Since we have hash functions available, Maciej (cc-ed) suggested
creating a hash number based on ELF header information, which sounds
like a good approach.
I can also see how this bug would happen under systems with low
resolution timestamps or with filesystems that do not provide precise
timestamp information.
Do you have any proposals on ways to improve this heuristic?
As an example, here is a list of binaries GCC generates for a testcase
(with unique names so we can compare them side-by-side). Notice their
timestamps.
Size Timestamp Name
107223 2013-11-20 19:53:01.758737300 -0800 20020220-1-10.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-11.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-12.exe
107303 2013-11-20 19:53:01.758737300 -0800 20020220-1-13.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-14.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-15.exe
107303 2013-11-20 19:53:02.258781643 -0800 20020220-1-16.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-17.exe
106727 2013-11-20 19:53:02.757667981 -0800 20020220-1-18.exe
107575 2013-11-20 19:53:02.757667981 -0800 20020220-1-19.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-20.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-21.exe
107655 2013-11-20 19:53:02.757667981 -0800 20020220-1-22.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-23.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-24.exe
107655 2013-11-20 19:53:03.258763445 -0800 20020220-1-25.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-26.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-27.exe
104930 2013-11-20 19:53:01.258799713 -0800 20020220-1-2.exe
104962 2013-11-20 19:53:01.258799713 -0800 20020220-1-3.exe
105890 2013-11-20 19:53:01.258799713 -0800 20020220-1-4.exe
105170 2013-11-20 19:53:01.258799713 -0800 20020220-1-5.exe
105218 2013-11-20 19:53:01.258799713 -0800 20020220-1-6.exe
116934 2013-11-20 19:53:01.758737300 -0800 20020220-1-7.exe
116214 2013-11-20 19:53:01.758737300 -0800 20020220-1-8.exe
116270 2013-11-20 19:53:01.758737300 -0800 20020220-1-9.exe
105378 2013-11-20 19:52:57.758767853 -0800 20020220-1.exe
Regards,
Luis
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
@ 2013-11-21 17:58 ` Pedro Alves
2013-11-21 21:58 ` Tom Tromey
2013-11-25 17:45 ` Luis Machado
2013-12-02 15:42 ` Tom Tromey
1 sibling, 2 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-21 17:58 UTC (permalink / raw)
To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki
On 11/21/2013 05:39 PM, Luis Machado wrote:
> Do you have any proposals on ways to improve this heuristic?
Compare st_ino/st_dev, and don't share if the system doesn't
provide meaningful bfd_stat data?
symfile.c:separate_debug_file_exists does this already,
and then does a CRC check if all else fails. Not sure
whether the CRC part would be a good idea here.
--
Pedro Alves
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-21 17:58 ` Pedro Alves
@ 2013-11-21 21:58 ` Tom Tromey
2013-11-23 12:14 ` Phi Debian
2013-11-25 17:45 ` Luis Machado
1 sibling, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2013-11-21 21:58 UTC (permalink / raw)
To: Pedro Alves; +Cc: lgustavo, gdb, Maciej W. Rozycki
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
>> Do you have any proposals on ways to improve this heuristic?
Pedro> Compare st_ino/st_dev, and don't share if the system doesn't
Pedro> provide meaningful bfd_stat data?
Pedro> symfile.c:separate_debug_file_exists does this already,
Pedro> and then does a CRC check if all else fails. Not sure
Pedro> whether the CRC part would be a good idea here.
Using the nanosecond information given by stat would help, at least on
platforms where this exists.
Tom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-21 17:58 ` Pedro Alves
2013-11-21 21:58 ` Tom Tromey
@ 2013-11-25 17:45 ` Luis Machado
2013-11-25 18:07 ` Pedro Alves
` (2 more replies)
1 sibling, 3 replies; 13+ messages in thread
From: Luis Machado @ 2013-11-25 17:45 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb, Tom Tromey, Maciej W. Rozycki
On 11/21/2013 03:58 PM, Pedro Alves wrote:
> On 11/21/2013 05:39 PM, Luis Machado wrote:
>> Do you have any proposals on ways to improve this heuristic?
>
> Compare st_ino/st_dev, and don't share if the system doesn't
> provide meaningful bfd_stat data?
>
> symfile.c:separate_debug_file_exists does this already,
> and then does a CRC check if all else fails. Not sure
> whether the CRC part would be a good idea here.
>
I don't think the inode and device information are portable enough for
us to use.
The file CRC seems more appropriate in terms of portability, but we need
to open the bfd, check the CRC and (maybe) close it if we find a cached
entry. Sounds like a potential performance drawback, but it is more
reliable IMO.
We can't rely on the timestamp due to some filesystems having 1 second
or 2 seconds resolution. That doesn't seem enough.
Luis
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: Unreliable BFD caching heuristic
2013-11-25 17:45 ` Luis Machado
@ 2013-11-25 18:07 ` Pedro Alves
2013-11-25 18:08 ` Maciej W. Rozycki
2013-11-25 18:21 ` Pedro Alves
2 siblings, 0 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-25 18:07 UTC (permalink / raw)
To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki
On 11/25/2013 05:45 PM, Luis Machado wrote:
> On 11/21/2013 03:58 PM, Pedro Alves wrote:
>> On 11/21/2013 05:39 PM, Luis Machado wrote:
>>> Do you have any proposals on ways to improve this heuristic?
>>
>> Compare st_ino/st_dev, and don't share if the system doesn't
>> provide meaningful bfd_stat data?
>>
>> symfile.c:separate_debug_file_exists does this already,
>> and then does a CRC check if all else fails. Not sure
>> whether the CRC part would be a good idea here.
>>
>
> I don't think the inode and device information are portable enough for
> us to use.
bfd sharing is a memory optimization. That's why I suggested
to not share if not possible to tell (efficiently and) reliably.
Comparing st_ino/st_dev between two files is the standard check
to use to check whether two files are the same. I don't think we
should penalize systems/filesystems that _can_ do this check in the name
of portability. Only if that fails would we defer to possibly
more expensive fallbacks, like separate_debug_file_exists.
>
> The file CRC seems more appropriate in terms of portability, but we need
> to open the bfd, check the CRC and (maybe) close it if we find a cached
> entry. Sounds like a potential performance drawback, but it is more
> reliable IMO.
Right, that's why I wasn't sure it'd be a good idea. Some sort of
fingerprinting like Maciej suggested has potential for being cheaper.
As usual when considering optimizations, it's a code/memory balance.
We can either always optimize for memory and maybe be slow
sometimes, or always optimize for speed, and use more memory
sometimes. The trick is in finding the sweet spot in between.
> We can't rely on the timestamp due to some filesystems having 1 second
> or 2 seconds resolution. That doesn't seem enough.
--
Pedro Alves
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-25 17:45 ` Luis Machado
2013-11-25 18:07 ` Pedro Alves
@ 2013-11-25 18:08 ` Maciej W. Rozycki
2013-11-25 18:21 ` Pedro Alves
2 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2013-11-25 18:08 UTC (permalink / raw)
To: Luis Machado; +Cc: Pedro Alves, gdb, Tom Tromey
On Mon, 25 Nov 2013, Luis Machado wrote:
> > Compare st_ino/st_dev, and don't share if the system doesn't
> > provide meaningful bfd_stat data?
> >
> > symfile.c:separate_debug_file_exists does this already,
> > and then does a CRC check if all else fails. Not sure
> > whether the CRC part would be a good idea here.
> >
>
> I don't think the inode and device information are portable enough for us to
> use.
>
> The file CRC seems more appropriate in terms of portability, but we need to
> open the bfd, check the CRC and (maybe) close it if we find a cached entry.
> Sounds like a potential performance drawback, but it is more reliable IMO.
>
> We can't rely on the timestamp due to some filesystems having 1 second or 2
> seconds resolution. That doesn't seem enough.
The system clock may also have a 1 second resolution only (although these
are quite rare indeed these days) and a timestamp can be interfered with
(unless we check ctime too, but then some filesystems don't store it).
Calculating a CRC of the whole file may be unacceptably slow with large
executablees and slower targets, and may therefore defeat the very purpose
of BFD caching. If we were going down that path, then may I suggest
revisiting my original proposal to calculate a CRC or some other hash
value instead of raw ELF file/program/section headers (as available) that
we need to access anyway (and if cached in BFD, i.e. previously accessed,
then chances are to have them in the buffer cache already).
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-25 17:45 ` Luis Machado
2013-11-25 18:07 ` Pedro Alves
2013-11-25 18:08 ` Maciej W. Rozycki
@ 2013-11-25 18:21 ` Pedro Alves
2 siblings, 0 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-25 18:21 UTC (permalink / raw)
To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki
On 11/25/2013 05:45 PM, Luis Machado wrote:
> We can't rely on the timestamp due to some filesystems having 1 second
> or 2 seconds resolution. That doesn't seem enough.
We can't rely on timestamps to be sure the files are the same, but
we can assume that if the files have different timestamps, they're
not the same. IOW, if we have some other fallback file comparison
method that is more expensive, we only need to apply it if the
timestamps match; if they don't match, we can assume the file
is not the same. Sure, if the same file is accessed through
different filesystems, we can end up thinking we had two handles
for different files, but then all that's lost is we don't apply the
bfd sharing optimization in that rare scenario.
--
Pedro Alves
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
2013-11-21 17:58 ` Pedro Alves
@ 2013-12-02 15:42 ` Tom Tromey
2013-12-03 15:28 ` Luis Machado
1 sibling, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2013-12-02 15:42 UTC (permalink / raw)
To: lgustavo; +Cc: gdb, Maciej W. Rozycki
>>>>> "Luis" == Luis Machado <lgustavo@codesourcery.com> writes:
Luis> If all those things happen at the same time, the BFD machinery will
Luis> attempt to use a cached entry to load data from a totally new
Luis> binary. From that point onwards, things just go downhill.
Luis> This is really a regression compared to older GDB's, and a solution
Luis> probably involves improving the matching heuristics, in
Luis> gdb/gdb_bfd.c:eq_bfd, with more data.
Another idea occurred to me recently, which is that gdb could use
inotify to notice when a file is changed. However this only works on
some limited set of hosts. It's probably overkill as well.
There's actually a second problem here: the BFD file descriptor cache
(see bfd/cache.c). Sometimes gdb closes all the file descriptors in
this cache, and BFD will reopen them without considering whether the
file has changed. So, it is probably possible to construct test cases
that fail even with older versions of gdb.
Even on Linux I think gdb sometimes needs to close the file descriptors
-- e.g., when "set write on" is in effect, "run" needs to close them to
avoid ETXTBSY. I'm not sure how to solve this.
Yet another idea lurking in here is that if a file does change, we
should probably disable trust-readonly-sections for it.
Tom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Unreliable BFD caching heuristic
2013-12-02 15:42 ` Tom Tromey
@ 2013-12-03 15:28 ` Luis Machado
[not found] ` <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Luis Machado @ 2013-12-03 15:28 UTC (permalink / raw)
To: Tom Tromey; +Cc: gdb, Maciej W. Rozycki
On 12/02/2013 01:42 PM, Tom Tromey wrote:
>>>>>> "Luis" == Luis Machado <lgustavo@codesourcery.com> writes:
>
> Luis> If all those things happen at the same time, the BFD machinery will
> Luis> attempt to use a cached entry to load data from a totally new
> Luis> binary. From that point onwards, things just go downhill.
>
> Luis> This is really a regression compared to older GDB's, and a solution
> Luis> probably involves improving the matching heuristics, in
> Luis> gdb/gdb_bfd.c:eq_bfd, with more data.
>
> Another idea occurred to me recently, which is that gdb could use
> inotify to notice when a file is changed. However this only works on
> some limited set of hosts. It's probably overkill as well.
>
> There's actually a second problem here: the BFD file descriptor cache
> (see bfd/cache.c). Sometimes gdb closes all the file descriptors in
> this cache, and BFD will reopen them without considering whether the
> file has changed. So, it is probably possible to construct test cases
> that fail even with older versions of gdb.
It seems to be the case. That will need to be fixed eventually.
>
> Even on Linux I think gdb sometimes needs to close the file descriptors
> -- e.g., when "set write on" is in effect, "run" needs to close them to
> avoid ETXTBSY. I'm not sure how to solve this.
>
> Yet another idea lurking in here is that if a file does change, we
> should probably disable trust-readonly-sections for it.
Right. For now, it seems the ELF headers provide enough information to
tell two different files apart.
The problem there is that we need to open the BFD and read some of its
data, which in turn requires us to detect architecture size and
endianness. Only then we will be able to see if we are dealing with two
different files.
I did an experiment with using the inode number in the cache check. It
seems to work for the hosts that support that information. On Windows i
think we fake inode numbers based on the file name and timestamp, so it
could be a simpler solution.
Luis
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-12-04 15:54 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
2013-11-21 17:58 ` Pedro Alves
2013-11-21 21:58 ` Tom Tromey
2013-11-23 12:14 ` Phi Debian
2013-11-25 17:45 ` Luis Machado
2013-11-25 18:07 ` Pedro Alves
2013-11-25 18:08 ` Maciej W. Rozycki
2013-11-25 18:21 ` Pedro Alves
2013-12-02 15:42 ` Tom Tromey
2013-12-03 15:28 ` Luis Machado
[not found] ` <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>
2013-12-04 0:01 ` Luis Machado
2013-12-04 3:47 ` Eli Zaretskii
2013-12-04 15:54 ` Tom Tromey
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox