Unreliable BFD caching heuristic

Mirror of the gdb mailing list
 help / color / mirror / Atom feed

* Unreliable BFD caching heuristic
@ 2013-11-21 17:39 Luis Machado
  2013-11-21 17:58 ` Pedro Alves
  2013-12-02 15:42 ` Tom Tromey
  0 siblings, 2 replies; 13+ messages in thread
From: Luis Machado @ 2013-11-21 17:39 UTC (permalink / raw)
  To: gdb, Tom Tromey, Maciej W. Rozycki

Hi,

Just recently i was chasing a bug in an automated GCC testing 
environment that uses GDB to run the programs GCC builds.

Some tests in GCC's testsuite create a number of different binaries with 
the same name, one after the other, with different flags.

The bug happens when GDB loads all of those binaries, one at a time 
obviously, in a single session, with a pattern of loading, running, 
loading, running...

That's when the BFD caching takes effect and starts trying to save 
resources based on two pieces of information: file name and the file's 
modification timestamp.

In summary, if you are using a reasonably fast build system that can 
create multiple binaries per second, you can potentially end up with two 
or more binaries with the same file modification timestamp.

If all those things happen at the same time, the BFD machinery will 
attempt to use a cached entry to load data from a totally new binary. 
 From that point onwards, things just go downhill.

This is really a regression compared to older GDB's, and a solution 
probably involves improving the matching heuristics, in 
gdb/gdb_bfd.c:eq_bfd, with more data.

Experiments show that using the file size in eq_bfd helps a little, but 
i've managed to trigger the bug with that as well, plus it does not 
address the problem of having multiple BFD's inside an archive.

Since we have hash functions available, Maciej (cc-ed) suggested 
creating a hash number based on ELF header information, which sounds 
like a good approach.

I can also see how this bug would happen under systems with low 
resolution timestamps or with filesystems that do not provide precise 
timestamp information.

Do you have any proposals on ways to improve this heuristic?

As an example, here is a list of binaries GCC generates for a testcase 
(with unique names so we can compare them side-by-side). Notice their 
timestamps.

Size   Timestamp                           Name
107223 2013-11-20 19:53:01.758737300 -0800 20020220-1-10.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-11.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-12.exe
107303 2013-11-20 19:53:01.758737300 -0800 20020220-1-13.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-14.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-15.exe
107303 2013-11-20 19:53:02.258781643 -0800 20020220-1-16.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-17.exe
106727 2013-11-20 19:53:02.757667981 -0800 20020220-1-18.exe
107575 2013-11-20 19:53:02.757667981 -0800 20020220-1-19.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-20.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-21.exe
107655 2013-11-20 19:53:02.757667981 -0800 20020220-1-22.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-23.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-24.exe
107655 2013-11-20 19:53:03.258763445 -0800 20020220-1-25.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-26.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-27.exe
104930 2013-11-20 19:53:01.258799713 -0800 20020220-1-2.exe
104962 2013-11-20 19:53:01.258799713 -0800 20020220-1-3.exe
105890 2013-11-20 19:53:01.258799713 -0800 20020220-1-4.exe
105170 2013-11-20 19:53:01.258799713 -0800 20020220-1-5.exe
105218 2013-11-20 19:53:01.258799713 -0800 20020220-1-6.exe
116934 2013-11-20 19:53:01.758737300 -0800 20020220-1-7.exe
116214 2013-11-20 19:53:01.758737300 -0800 20020220-1-8.exe
116270 2013-11-20 19:53:01.758737300 -0800 20020220-1-9.exe
105378 2013-11-20 19:52:57.758767853 -0800 20020220-1.exe

Regards,
Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
@ 2013-11-21 17:58 ` Pedro Alves
  2013-11-21 21:58   ` Tom Tromey
  2013-11-25 17:45   ` Luis Machado
  2013-12-02 15:42 ` Tom Tromey
  1 sibling, 2 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-21 17:58 UTC (permalink / raw)
  To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki

On 11/21/2013 05:39 PM, Luis Machado wrote:
> Do you have any proposals on ways to improve this heuristic?

Compare st_ino/st_dev, and don't share if the system doesn't
provide meaningful bfd_stat data?

symfile.c:separate_debug_file_exists does this already,
and then does a CRC check if all else fails.  Not sure
whether the CRC part would be a good idea here.

-- 
Pedro Alves


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-21 17:58 ` Pedro Alves
@ 2013-11-21 21:58   ` Tom Tromey
  2013-11-23 12:14     ` Phi Debian
  2013-11-25 17:45   ` Luis Machado
  1 sibling, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2013-11-21 21:58 UTC (permalink / raw)
  To: Pedro Alves; +Cc: lgustavo, gdb, Maciej W. Rozycki

>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

>> Do you have any proposals on ways to improve this heuristic?

Pedro> Compare st_ino/st_dev, and don't share if the system doesn't
Pedro> provide meaningful bfd_stat data?

Pedro> symfile.c:separate_debug_file_exists does this already,
Pedro> and then does a CRC check if all else fails.  Not sure
Pedro> whether the CRC part would be a good idea here.

Using the nanosecond information given by stat would help, at least on
platforms where this exists.

Tom

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-21 21:58   ` Tom Tromey
@ 2013-11-23 12:14     ` Phi Debian
  0 siblings, 0 replies; 13+ messages in thread
From: Phi Debian @ 2013-11-23 12:14 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Pedro Alves, lgustavo, gdb, Maciej W. Rozycki

Does a.out have a link stamp ? could be another discriminant.
I though time/size/ino would have be enough though.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-21 17:58 ` Pedro Alves
  2013-11-21 21:58   ` Tom Tromey
@ 2013-11-25 17:45   ` Luis Machado
  2013-11-25 18:07     ` Pedro Alves
                       ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Luis Machado @ 2013-11-25 17:45 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb, Tom Tromey, Maciej W. Rozycki

On 11/21/2013 03:58 PM, Pedro Alves wrote:
> On 11/21/2013 05:39 PM, Luis Machado wrote:
>> Do you have any proposals on ways to improve this heuristic?
>
> Compare st_ino/st_dev, and don't share if the system doesn't
> provide meaningful bfd_stat data?
>
> symfile.c:separate_debug_file_exists does this already,
> and then does a CRC check if all else fails.  Not sure
> whether the CRC part would be a good idea here.
>

I don't think the inode and device information are portable enough for 
us to use.

The file CRC seems more appropriate in terms of portability, but we need 
to open the bfd, check the CRC and (maybe) close it if we find a cached 
entry. Sounds like a potential performance drawback, but it is more 
reliable IMO.

We can't rely on the timestamp due to some filesystems having 1 second 
or 2 seconds resolution. That doesn't seem enough.

Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-25 17:45   ` Luis Machado
@ 2013-11-25 18:07     ` Pedro Alves
  2013-11-25 18:08     ` Maciej W. Rozycki
  2013-11-25 18:21     ` Pedro Alves
  2 siblings, 0 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-25 18:07 UTC (permalink / raw)
  To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki

On 11/25/2013 05:45 PM, Luis Machado wrote:
> On 11/21/2013 03:58 PM, Pedro Alves wrote:
>> On 11/21/2013 05:39 PM, Luis Machado wrote:
>>> Do you have any proposals on ways to improve this heuristic?
>>
>> Compare st_ino/st_dev, and don't share if the system doesn't
>> provide meaningful bfd_stat data?
>>
>> symfile.c:separate_debug_file_exists does this already,
>> and then does a CRC check if all else fails.  Not sure
>> whether the CRC part would be a good idea here.
>>
> 
> I don't think the inode and device information are portable enough for 
> us to use.

bfd sharing is a memory optimization.  That's why I suggested
to not share if not possible to tell (efficiently and) reliably.
Comparing st_ino/st_dev between two files is the standard check
to use to check whether two files are the same.  I don't think we
should penalize systems/filesystems that _can_ do this check in the name
of portability.  Only if that fails would we defer to possibly
more expensive fallbacks, like separate_debug_file_exists.

> 
> The file CRC seems more appropriate in terms of portability, but we need 
> to open the bfd, check the CRC and (maybe) close it if we find a cached 
> entry. Sounds like a potential performance drawback, but it is more 
> reliable IMO.

Right, that's why I wasn't sure it'd be a good idea.  Some sort of
fingerprinting like Maciej suggested has potential for being cheaper.
As usual when considering optimizations, it's a code/memory balance.
We can either always optimize for memory and maybe be slow
sometimes, or always optimize for speed, and use more memory
sometimes.  The trick is in finding the sweet spot in between.

> We can't rely on the timestamp due to some filesystems having 1 second 
> or 2 seconds resolution. That doesn't seem enough.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-25 17:45   ` Luis Machado
  2013-11-25 18:07     ` Pedro Alves
@ 2013-11-25 18:08     ` Maciej W. Rozycki
  2013-11-25 18:21     ` Pedro Alves
  2 siblings, 0 replies; 13+ messages in thread
From: Maciej W. Rozycki @ 2013-11-25 18:08 UTC (permalink / raw)
  To: Luis Machado; +Cc: Pedro Alves, gdb, Tom Tromey

On Mon, 25 Nov 2013, Luis Machado wrote:

> > Compare st_ino/st_dev, and don't share if the system doesn't
> > provide meaningful bfd_stat data?
> > 
> > symfile.c:separate_debug_file_exists does this already,
> > and then does a CRC check if all else fails.  Not sure
> > whether the CRC part would be a good idea here.
> > 
> 
> I don't think the inode and device information are portable enough for us to
> use.
> 
> The file CRC seems more appropriate in terms of portability, but we need to
> open the bfd, check the CRC and (maybe) close it if we find a cached entry.
> Sounds like a potential performance drawback, but it is more reliable IMO.
> 
> We can't rely on the timestamp due to some filesystems having 1 second or 2
> seconds resolution. That doesn't seem enough.

 The system clock may also have a 1 second resolution only (although these 
are quite rare indeed these days) and a timestamp can be interfered with 
(unless we check ctime too, but then some filesystems don't store it).

 Calculating a CRC of the whole file may be unacceptably slow with large 
executablees and slower targets, and may therefore defeat the very purpose 
of BFD caching.  If we were going down that path, then may I suggest 
revisiting my original proposal to calculate a CRC or some other hash 
value instead of raw ELF file/program/section headers (as available) that 
we need to access anyway (and if cached in BFD, i.e. previously accessed, 
then chances are to have them in the buffer cache already).

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-25 17:45   ` Luis Machado
  2013-11-25 18:07     ` Pedro Alves
  2013-11-25 18:08     ` Maciej W. Rozycki
@ 2013-11-25 18:21     ` Pedro Alves
  2 siblings, 0 replies; 13+ messages in thread
From: Pedro Alves @ 2013-11-25 18:21 UTC (permalink / raw)
  To: lgustavo; +Cc: gdb, Tom Tromey, Maciej W. Rozycki

On 11/25/2013 05:45 PM, Luis Machado wrote:
> We can't rely on the timestamp due to some filesystems having 1 second 
> or 2 seconds resolution. That doesn't seem enough.

We can't rely on timestamps to be sure the files are the same, but
we can assume that if the files have different timestamps, they're
not the same.  IOW, if we have some other fallback file comparison
method that is more expensive, we only need to apply it if the
timestamps match; if they don't match, we can assume the file
is not the same.  Sure, if the same file is accessed through
different filesystems, we can end up thinking we had two handles
for different files, but then all that's lost is we don't apply the
bfd sharing optimization in that rare scenario.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
  2013-11-21 17:58 ` Pedro Alves
@ 2013-12-02 15:42 ` Tom Tromey
  2013-12-03 15:28   ` Luis Machado
  1 sibling, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2013-12-02 15:42 UTC (permalink / raw)
  To: lgustavo; +Cc: gdb, Maciej W. Rozycki

>>>>> "Luis" == Luis Machado <lgustavo@codesourcery.com> writes:

Luis> If all those things happen at the same time, the BFD machinery will
Luis> attempt to use a cached entry to load data from a totally new
Luis> binary. From that point onwards, things just go downhill.

Luis> This is really a regression compared to older GDB's, and a solution
Luis> probably involves improving the matching heuristics, in
Luis> gdb/gdb_bfd.c:eq_bfd, with more data.

Another idea occurred to me recently, which is that gdb could use
inotify to notice when a file is changed.  However this only works on
some limited set of hosts.  It's probably overkill as well.

There's actually a second problem here: the BFD file descriptor cache
(see bfd/cache.c).  Sometimes gdb closes all the file descriptors in
this cache, and BFD will reopen them without considering whether the
file has changed.  So, it is probably possible to construct test cases
that fail even with older versions of gdb.

Even on Linux I think gdb sometimes needs to close the file descriptors
-- e.g., when "set write on" is in effect, "run" needs to close them to
avoid ETXTBSY.  I'm not sure how to solve this.

Yet another idea lurking in here is that if a file does change, we
should probably disable trust-readonly-sections for it.

Tom

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-12-02 15:42 ` Tom Tromey
@ 2013-12-03 15:28   ` Luis Machado
       [not found]     ` <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Machado @ 2013-12-03 15:28 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb, Maciej W. Rozycki

On 12/02/2013 01:42 PM, Tom Tromey wrote:
>>>>>> "Luis" == Luis Machado <lgustavo@codesourcery.com> writes:
>
> Luis> If all those things happen at the same time, the BFD machinery will
> Luis> attempt to use a cached entry to load data from a totally new
> Luis> binary. From that point onwards, things just go downhill.
>
> Luis> This is really a regression compared to older GDB's, and a solution
> Luis> probably involves improving the matching heuristics, in
> Luis> gdb/gdb_bfd.c:eq_bfd, with more data.
>
> Another idea occurred to me recently, which is that gdb could use
> inotify to notice when a file is changed.  However this only works on
> some limited set of hosts.  It's probably overkill as well.
>
> There's actually a second problem here: the BFD file descriptor cache
> (see bfd/cache.c).  Sometimes gdb closes all the file descriptors in
> this cache, and BFD will reopen them without considering whether the
> file has changed.  So, it is probably possible to construct test cases
> that fail even with older versions of gdb.

It seems to be the case. That will need to be fixed eventually.

>
> Even on Linux I think gdb sometimes needs to close the file descriptors
> -- e.g., when "set write on" is in effect, "run" needs to close them to
> avoid ETXTBSY.  I'm not sure how to solve this.
>
> Yet another idea lurking in here is that if a file does change, we
> should probably disable trust-readonly-sections for it.

Right. For now, it seems the ELF headers provide enough information to 
tell two different files apart.

The problem there is that we need to open the BFD and read some of its 
data, which in turn requires us to detect architecture size and 
endianness. Only then we will be able to see if we are dealing with two 
different files.

I did an experiment with using the inode number in the cache check. It 
seems to work for the hosts that support that information. On Windows i 
think we fake inode numbers based on the file name and timestamp, so it 
could be a simpler solution.

Luis


^ permalink raw reply	[flat|nested] 13+ messages in thread

[parent not found: <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>]

* Re: Unreliable BFD caching heuristic
       [not found]     ` <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>
@ 2013-12-04  0:01       ` Luis Machado
  2013-12-04  3:47         ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Machado @ 2013-12-04  0:01 UTC (permalink / raw)
  To: Petr Hluzín; +Cc: Tom Tromey, gdb, Maciej W. Rozycki

On 12/03/2013 09:33 PM, Petr HluzÃn wrote:
>
> On 3 December 2013 16:27, Luis Machado <lgustavo@codesourcery.com
> <mailto:lgustavo@codesourcery.com>> wrote:
>  > I did an experiment with using the inode number in the cache check.
> It seems
>  > to work for the hosts that support that information. On Windows i
> think we
>  > fake inode numbers based on the file name and timestamp, so it could be a
>  > simpler solution.
>  >
>  > Luis
>
> On Windows you can use GetFileInformationByHandle() function, see
> http://msdn.microsoft.com/en-us/library/aa364952.aspx

Thanks, but i don't think we would like to have windows-specific calls 
in such a generic portion of code.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-12-04  0:01       ` Luis Machado
@ 2013-12-04  3:47         ` Eli Zaretskii
  2013-12-04 15:54           ` Tom Tromey
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2013-12-04  3:47 UTC (permalink / raw)
  To: lgustavo; +Cc: petr.hluzin, tromey, gdb, macro

> Date: Tue, 03 Dec 2013 22:00:31 -0200
> From: Luis Machado <lgustavo@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, gdb <gdb@sourceware.org>,  "Maciej W. Rozycki" <macro@codesourcery.com>
> 
> On 12/03/2013 09:33 PM, Petr HluzÃn wrote:
> >
> > On 3 December 2013 16:27, Luis Machado <lgustavo@codesourcery.com
> > <mailto:lgustavo@codesourcery.com>> wrote:
> >  > I did an experiment with using the inode number in the cache check.
> > It seems
> >  > to work for the hosts that support that information. On Windows i
> > think we
> >  > fake inode numbers based on the file name and timestamp, so it could be a
> >  > simpler solution.
> >  >
> >  > Luis
> >
> > On Windows you can use GetFileInformationByHandle() function, see
> > http://msdn.microsoft.com/en-us/library/aa364952.aspx
> 
> Thanks, but i don't think we would like to have windows-specific calls 
> in such a generic portion of code.

I don't understand: if you want to use inode numbers on Posix systems,
which requires a Unix-specific call, then why not use
GetFileInformationByHandle, which gives the Windows equivalent of the
inode?  IOW, if you can have _real_ inodes on Windows, why risk the
fake ones?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Unreliable BFD caching heuristic
  2013-12-04  3:47         ` Eli Zaretskii
@ 2013-12-04 15:54           ` Tom Tromey
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Tromey @ 2013-12-04 15:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lgustavo, petr.hluzin, gdb, macro

>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

Luis> Thanks, but i don't think we would like to have windows-specific calls 
Luis> in such a generic portion of code.

Eli> I don't understand: if you want to use inode numbers on Posix systems,
Eli> which requires a Unix-specific call, then why not use
Eli> GetFileInformationByHandle, which gives the Windows equivalent of the
Eli> inode?  IOW, if you can have _real_ inodes on Windows, why risk the
Eli> fake ones?

FWIW it seems fine to me as well.
I'm sure one of our several existing portability mechanisms is suitable
for this task.

Tom

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-12-04 15:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-21 17:39 Unreliable BFD caching heuristic Luis Machado
2013-11-21 17:58 ` Pedro Alves
2013-11-21 21:58   ` Tom Tromey
2013-11-23 12:14     ` Phi Debian
2013-11-25 17:45   ` Luis Machado
2013-11-25 18:07     ` Pedro Alves
2013-11-25 18:08     ` Maciej W. Rozycki
2013-11-25 18:21     ` Pedro Alves
2013-12-02 15:42 ` Tom Tromey
2013-12-03 15:28   ` Luis Machado
     [not found]     ` <CAC=yr6DRDsRLStnDNZW_2=0vOQY-oJHd55_nLUeb8Qetxo=yXw@mail.gmail.com>
2013-12-04  0:01       ` Luis Machado
2013-12-04  3:47         ` Eli Zaretskii
2013-12-04 15:54           ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox