Mirror of the gdb mailing list
From: Luis Machado <lgustavo@codesourcery.com>
To: "gdb@sourceware.org" <gdb@sourceware.org>,
	 Tom Tromey <tromey@redhat.com>,
	"Maciej W. Rozycki" <macro@codesourcery.com>
Subject: Unreliable BFD caching heuristic
Date: Thu, 21 Nov 2013 17:39:00 -0000
Message-ID: <528E454F.6060003@codesourcery.com>

Hi,

I was recently chasing a bug in an automated GCC testing
environment that uses GDB to run the programs GCC builds.

Some tests in GCC's testsuite create a number of different binaries with 
the same name, one after the other, with different flags.

The bug happens when GDB loads all of those binaries, one at a time,
in a single session, following a pattern of load, run, load, run...

That's when the BFD caching takes effect. To save resources, it decides
whether an already-open BFD can be reused based on two pieces of
information: the file name and the file's modification timestamp.
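
To make the failure mode concrete, here is a minimal sketch of a cache
keyed on (file name, modification time). It is a toy model, not GDB's
code: ToyBfdCache, its open() method and the fixed timestamp are all
hypothetical, and the "binary" is just a few bytes in a temporary file.

```python
# Toy model of a cache keyed on (file name, mtime). Not GDB's code:
# ToyBfdCache and demo() are hypothetical names used for illustration.
import os
import tempfile


class ToyBfdCache:
    def __init__(self):
        self._entries = {}

    def open(self, path):
        """Return cached contents if name and mtime match an entry."""
        key = (path, os.stat(path).st_mtime_ns)
        if key not in self._entries:
            with open(path, "rb") as f:
                self._entries[key] = f.read()
        return self._entries[key]


def demo():
    cache = ToyBfdCache()
    stamp = 1_385_000_000 * 10**9  # arbitrary fixed mtime, in nanoseconds
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test.exe")

        with open(path, "wb") as f:
            f.write(b"first build")
        os.utime(path, ns=(stamp, stamp))
        first = cache.open(path)

        # Rebuild under the same name and force the same timestamp,
        # standing in for a build that finishes within one timestamp tick.
        with open(path, "wb") as f:
            f.write(b"second build")
        os.utime(path, ns=(stamp, stamp))
        second = cache.open(path)
    return first, second
```

Calling demo() returns the first build's contents for both lookups: the
second open() hits the stale entry because neither half of the key
changed.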

In summary, if you are using a reasonably fast build system that can 
create multiple binaries per second, you can potentially end up with two 
or more binaries with the same file modification timestamp.

If all of these conditions hold at once (same file name, same
timestamp, same GDB session), the BFD machinery will attempt to use a
cached entry to load data from a totally new binary. From that point
onwards, things just go downhill.

This is really a regression compared to older versions of GDB, and a
solution probably involves giving the matching heuristic in
gdb/gdb_bfd.c:eq_bfd more data to compare.

Experiments show that using the file size in eq_bfd helps a little, but
I've managed to trigger the bug with that as well, plus it does not
address the problem of having multiple BFDs inside an archive.
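
The limits of adding the file size can be checked against a few rows
of the listing at the end of this message. The sketch below groups
those rows by (timestamp, size); the ROWS sample and the collisions()
helper are illustrative only, and the unique names stand in for what
would be a single shared file name in the real test run.

```python
# Group a sample of the listing by (timestamp, size) to see which
# binaries would still be indistinguishable if they shared one name.
# collisions() is a hypothetical helper, not part of GDB.
from collections import defaultdict

ROWS = [
    (107223, "19:53:01.758737300", "20020220-1-10.exe"),
    (106727, "19:53:01.758737300", "20020220-1-11.exe"),
    (106727, "19:53:01.758737300", "20020220-1-12.exe"),
    (107303, "19:53:01.758737300", "20020220-1-13.exe"),
    (106727, "19:53:02.258781643", "20020220-1-14.exe"),
    (106727, "19:53:02.258781643", "20020220-1-15.exe"),
]


def collisions(rows):
    """Return {(timestamp, size): [names]} for keys shared by 2+ files."""
    groups = defaultdict(list)
    for size, stamp, name in rows:
        groups[(stamp, size)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

On this sample, collisions(ROWS) reports two clashing pairs (-11/-12
and -14/-15), so a key of name, timestamp and size still cannot tell
those builds apart.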

Since we have hash functions available, Maciej (cc-ed) suggested 
creating a hash number based on ELF header information, which sounds 
like a good approach.
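
As a rough illustration of that idea, the sketch below hashes the raw
ELF file header. Everything here is an assumption made for the example:
the elf_header_digest() name, the choice of SHA-1, and hashing the whole
header rather than selected fields. Whether the header alone differs
between builds made with different flags is not established here.

```python
# Hash the ELF file header as an extra identity check. Hypothetical
# sketch: the function name and the use of SHA-1 are assumptions.
import hashlib

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                      # index of the class byte in e_ident
EHDR_SIZE = {1: 52, 2: 64}        # ELFCLASS32 -> 52, ELFCLASS64 -> 64


def elf_header_digest(data: bytes) -> str:
    """Return a hex digest of the ELF header found at the start of data."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    size = EHDR_SIZE.get(data[EI_CLASS]) if len(data) > EI_CLASS else None
    if size is None or len(data) < size:
        raise ValueError("unknown ELF class or truncated header")
    return hashlib.sha1(data[:size]).hexdigest()


def elf_header_digest_of_file(path: str) -> str:
    """Read just enough of the file to cover the largest header."""
    with open(path, "rb") as f:
        return elf_header_digest(f.read(64))
```

A cache could store this digest next to the name and timestamp and
treat a mismatch as a miss, at the cost of one small read per lookup.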

I can also see how this bug would happen on systems with low
resolution timestamps or with filesystems that do not provide precise
timestamp information.
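
The effect of coarse timestamps is easy to see with two values taken
from the listing below (19:53:01.258799713 and 19:53:01.758737300).
The truncate() helper and the resolutions tried are illustrative
assumptions, with 2 seconds standing in for a FAT-style filesystem.

```python
# Show how distinct modification times collapse at coarser resolutions.
# truncate() is a hypothetical helper; times are nanoseconds past 19:53:00.
NS_PER_SEC = 10**9

T_EARLY = 1 * NS_PER_SEC + 258_799_713   # 19:53:01.258799713
T_LATE = 1 * NS_PER_SEC + 758_737_300    # 19:53:01.758737300


def truncate(ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to a multiple of the given resolution."""
    return ns - ns % resolution_ns


def same_at(resolution_ns: int) -> bool:
    """True if the two sample times are indistinguishable at this resolution."""
    return truncate(T_EARLY, resolution_ns) == truncate(T_LATE, resolution_ns)
```

Here same_at(1) is False, while same_at(NS_PER_SEC) and
same_at(2 * NS_PER_SEC) are both True: builds that finish half a second
apart look identical once the timestamp is stored in whole seconds.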

Do you have any proposals on ways to improve this heuristic?

As an example, here is a list of binaries GCC generates for a testcase 
(with unique names so we can compare them side-by-side). Notice their 
timestamps.

Size   Timestamp                           Name
107223 2013-11-20 19:53:01.758737300 -0800 20020220-1-10.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-11.exe
106727 2013-11-20 19:53:01.758737300 -0800 20020220-1-12.exe
107303 2013-11-20 19:53:01.758737300 -0800 20020220-1-13.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-14.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-15.exe
107303 2013-11-20 19:53:02.258781643 -0800 20020220-1-16.exe
106727 2013-11-20 19:53:02.258781643 -0800 20020220-1-17.exe
106727 2013-11-20 19:53:02.757667981 -0800 20020220-1-18.exe
107575 2013-11-20 19:53:02.757667981 -0800 20020220-1-19.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-20.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-21.exe
107655 2013-11-20 19:53:02.757667981 -0800 20020220-1-22.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-23.exe
107079 2013-11-20 19:53:02.757667981 -0800 20020220-1-24.exe
107655 2013-11-20 19:53:03.258763445 -0800 20020220-1-25.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-26.exe
107079 2013-11-20 19:53:03.258763445 -0800 20020220-1-27.exe
104930 2013-11-20 19:53:01.258799713 -0800 20020220-1-2.exe
104962 2013-11-20 19:53:01.258799713 -0800 20020220-1-3.exe
105890 2013-11-20 19:53:01.258799713 -0800 20020220-1-4.exe
105170 2013-11-20 19:53:01.258799713 -0800 20020220-1-5.exe
105218 2013-11-20 19:53:01.258799713 -0800 20020220-1-6.exe
116934 2013-11-20 19:53:01.758737300 -0800 20020220-1-7.exe
116214 2013-11-20 19:53:01.758737300 -0800 20020220-1-8.exe
116270 2013-11-20 19:53:01.758737300 -0800 20020220-1-9.exe
105378 2013-11-20 19:52:57.758767853 -0800 20020220-1.exe

Regards,
Luis



