From: Doug Evans <dje@google.com>
To: Eli Zaretskii <eliz@gnu.org>, Tom Tromey <tromey@redhat.com>,
Joel Brobecker <brobecker@adacore.com>,
gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [RFA, doc RFA] Avoid calling gdb_realpath if basenames are different
Date: Tue, 15 Nov 2011 04:46:00 -0000
Message-ID: <CADPb22TED1ZqEXmAZeNngkJLm256sM35Q0pbXRyttnxMt=o+Tg@mail.gmail.com>
In-Reply-To: <CADPb22ReomSdvRUCJ=h4BD3KMpWgBsdtHDspXgGNr-iHvKh4aw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6102 bytes --]
On Fri, Nov 11, 2011 at 12:53 AM, Doug Evans <dje@google.com> wrote:
> On Fri, Nov 11, 2011 at 12:47 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>> Date: Thu, 10 Nov 2011 15:58:46 -0800
>>> From: Doug Evans <dje@google.com>
>>>
>>> 2011-11-10 Doug Evans <dje@google.com>
>>>
>>> * NEWS: Mention new parameter basenames-may-differ.
>>> * dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
>>> ! basenames_may_differ.
>>> * psymtab.c (lookup_partial_symtab): Ditto.
>>> * symtab.c (lookup_symtab): Ditto.
>>> (basenames_may_differ): New global.
>>> (_initialize_symtab): New parameter basenames-may-differ.
>>> * symtab.h (basenames_may_differ): Declare.
>>>
>>> doc/
>>> * gdb.texinfo (Files): Document basenames-may-differ.
>>
>> Thanks.
>>
>>> +set basenames-may-differ
>>> +show basenames-may-differ
>>> + Set whether a source file may have multiple base names.
>>> + A "base name" is the name of a file with the directory part removed.
>>> + Example: The base name of "/home/user/hello.c" is "hello.c".
>>> + When doing file name based lookups, gdb will canonicalize file names
>>> + (e.g., expand symlinks) before comparing them, which is an expensive
>>> + operation.
>>> + If set, gdb will not assume a file is known by one base name, and thus
>>> + it cannot optimize file name comparisons by skipping the canonicalization
>>> + step if the base names are different.
>>> + If not set, all source files must be known by one base name,
>>> + and gdb will do file name comparisons more efficiently.
>>
>> I suggest rearranging the text so that the parts describing what
>> happens when the option is set are together. Like this:
>>
>> Set whether a source file may have multiple base names.
>> (A "base name" is the name of a file with the directory part removed.
>> Example: The base name of "/home/user/hello.c" is "hello.c".)
>> If set, GDB will canonicalize file names (e.g., expand symlinks)
>> before comparing them. Canonicalization is an expensive operation,
>> but it allows the same file to be known by more than one base name.
>> If not set (the default), all source files are assumed to have just
>> one base name, and GDB will do file name comparisons more efficiently.
>>
>> OK?
>>
>>> +When processing file names provided by the user,
>>> +@value{GDBN} will canonicalize them and remove symbolic links.
>>> +This ensures that @value{GDBN} will find the right file,
>>> +even if the debug information specifies an alternate path.
>>> +However, with large programs this canonicalization can noticeably slow
>>> +down @value{GDBN}. To compensate, @value{GDBN} will try to avoid
>>> +this canonicalization wherever possible. One way it can do so
>>> +is by first comparing the @samp{base name} of a file.
>>> +The @samp{base name} of a file is simply the file's name without
>>> +any directory information. For example, the base name of
>>> +@file{/home/user/hello.c} is @file{hello.c}.
>>> +By doing this @value{GDBN} can skip, for example,
>>> +@file{/usr/include/stdio.h} without having to first canonicalize
>>> +and then compare the directory names.
>>> +This works great, except when a file can have multiple
>>> +base names due to symbolic links.
>>> +For example, if @file{/home/user/bar.c} is a symbolic link to
>>> +@file{/home/user/foo.c} then @value{GDBN} cannot just look at
>>> +the base name of two files, it must canonicalize them, expand
>>> +all symbolic links, and @emph{then} compare the file names
>>> +to see if they match.
>>> +Fortunately, having one file known by two different base names
>>> +does not generally occur in practice.
>>> +Should it occur, however, @value{GDBN} provides an escape hatch
>>> +to allow this to work.
>>> +By setting @code{basenames-may-differ} to @code{true}
>>> +@value{GDBN} will always canonicalize file names before
>>> +comparing them, thus ensuring that one file known by multiple
>>> +base names is treated as the same file.
>>
>> This is written mostly as an apology for having this option. That is
>> the wrong angle for describing features in a user manual, because
>> users generally trust the developers to DTRT by default. So I would
>> reword it as follows:
>>
>> When processing file names provided by the user, @value{GDBN}
>> frequently needs to compare them to the file names recorded in the
>> program's debug info. Normally, @value{GDBN} compares just the
>> @dfn{base names} of the files as strings, which is reasonably fast
>> even for very large programs. (The base name of a file is the last
>> portion of its name, after stripping all the leading directories.)
>> This shortcut in comparison is based upon the assumption that files
>> cannot have more than one base name. This is usually true, but
>> references to files that use symlinks or similar filesystem
>> facilities violate that assumption. If your program records files
>> using such facilities, or if you provide file names to @value{GDBN}
>> using symlinks etc., you can set @code{basenames-may-differ} to
>> @code{true} to instruct @value{GDBN} to completely canonicalize each
>> pair of file names it needs to compare. This will make file-name
>> comparisons accurate, but at a price of a significant slowdown.
>>
>> Do you agree with this wording?
>>
>
> I'm happy if you're happy.
> Thanks for the suggested wording.
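For anyone skimming the thread, here is a minimal standalone sketch of the situation the agreed wording describes: two names whose base names differ but which canonicalize to the same file. This is not GDB code. The names sketch_basename and sketch_is_alias_pair are hypothetical, plain strcmp stands in for FILENAME_CMP, and POSIX realpath stands in for gdb_realpath.

```c
#define _XOPEN_SOURCE 700   /* for realpath () with a NULL buffer */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libiberty's lbasename: return the part of
   NAME after the last '/'.  Handles POSIX separators only.  */
const char *
sketch_basename (const char *name)
{
  const char *slash = strrchr (name, '/');

  return slash != NULL ? slash + 1 : name;
}

/* Return 1 if A and B have different base names yet canonicalize to
   the same path, i.e. the case that needs basenames-may-differ.
   Return 0 otherwise, including when either name cannot be resolved.  */
int
sketch_is_alias_pair (const char *a, const char *b)
{
  char *real_a, *real_b;
  int result;

  assert (a != NULL && b != NULL);

  /* Same base name: the quick comparison would not reject this pair,
     so it is not the problematic case.  */
  if (strcmp (sketch_basename (a), sketch_basename (b)) == 0)
    return 0;

  /* Different base names: only canonicalization can tell whether the
     two names still refer to one file.  */
  real_a = realpath (a, NULL);
  real_b = realpath (b, NULL);
  result = (real_a != NULL && real_b != NULL
            && strcmp (real_a, real_b) == 0);
  free (real_a);   /* free (NULL) is a no-op */
  free (real_b);
  return result;
}
```

With the layout from the earlier draft (/home/user/bar.c being a symbolic link to /home/user/foo.c, both existing on disk), calling sketch_is_alias_pair on those two names should report an alias pair. That is exactly the pair the default setting would skip.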
Ok to check in?
2011-11-14 Doug Evans <dje@google.com>
* NEWS: Mention new parameter basenames-may-differ.
* dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
! basenames_may_differ.
* psymtab.c (lookup_partial_symtab): Ditto.
* symtab.c (lookup_symtab): Ditto.
(basenames_may_differ): New global.
(_initialize_symtab): New parameter basenames-may-differ.
* symtab.h (basenames_may_differ): Declare.
doc/
* gdb.texinfo (Files): Document basenames-may-differ.
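
As a rough illustration of what the three lookup functions gain from the quick check, the following standalone sketch walks a list of recorded file names and counts how many of them actually get canonicalized. It is a simplification under stated assumptions, not the patched code: sketch_base_of and sketch_lookup are hypothetical names, strcmp replaces FILENAME_CMP, POSIX realpath replaces gdb_realpath, and the exact-string and full_path checks from the real functions are left out.

```c
#define _XOPEN_SOURCE 700   /* for realpath () with a NULL buffer */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libiberty's lbasename (POSIX '/' only).  */
const char *
sketch_base_of (const char *name)
{
  const char *slash = strrchr (name, '/');

  return slash != NULL ? slash + 1 : name;
}

/* Search RECORDED[0..COUNT-1] for an entry naming the same file as
   NAME.  Return its index, or -1 if there is none.  *REALPATH_CALLS
   receives the number of recorded names that were canonicalized.
   MAY_DIFFER plays the role of basenames_may_differ.  */
int
sketch_lookup (const char *name, const char *const *recorded, size_t count,
               int may_differ, size_t *realpath_calls)
{
  const char *name_base = sketch_base_of (name);
  char *real_name = realpath (name, NULL);
  int found = -1;
  size_t i;

  assert (realpath_calls != NULL);
  *realpath_calls = 0;

  for (i = 0; i < count && found < 0; i++)
    {
      char *real_recorded;

      /* The quick check: skip the expensive call when the base names
         already differ and the user has not said they may.  */
      if (!may_differ
          && strcmp (name_base, sketch_base_of (recorded[i])) != 0)
        continue;

      real_recorded = realpath (recorded[i], NULL);
      ++*realpath_calls;
      if (real_name != NULL && real_recorded != NULL
          && strcmp (real_name, real_recorded) == 0)
        found = (int) i;
      free (real_recorded);
    }

  free (real_name);
  return found;
}
```

With may_differ set to zero, only entries sharing the requested base name reach realpath, so a lookup against a large table touches the file system for a handful of candidates rather than for every entry. With may_differ non-zero, every entry is canonicalized until a match is found, which is the slow but symlink-safe behavior.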
[-- Attachment #2: gdb-111114-basenames-may-differ-3.patch.txt --]
[-- Type: text/plain, Size: 8690 bytes --]
2011-11-14 Doug Evans <dje@google.com>
* NEWS: Mention new parameter basenames-may-differ.
* dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
! basenames_may_differ.
* psymtab.c (lookup_partial_symtab): Ditto.
* symtab.c (lookup_symtab): Ditto.
(basenames_may_differ): New global.
(_initialize_symtab): New parameter basenames-may-differ.
* symtab.h (basenames_may_differ): Declare.
doc/
* gdb.texinfo (Files): Document basenames-may-differ.
Index: NEWS
===================================================================
RCS file: /cvs/src/src/gdb/NEWS,v
retrieving revision 1.466
diff -u -p -r1.466 NEWS
--- NEWS 14 Nov 2011 20:07:20 -0000 1.466
+++ NEWS 15 Nov 2011 04:19:38 -0000
@@ -159,6 +159,17 @@ show debug entry-values
Control display of debugging info for determining frame argument values at
function entry and virtual tail call frames.
+set basenames-may-differ
+show basenames-may-differ
+ Set whether a source file may have multiple base names.
+ (A "base name" is the name of a file with the directory part removed.
+ Example: The base name of "/home/user/hello.c" is "hello.c".)
+ If set, GDB will canonicalize file names (e.g., expand symlinks)
+ before comparing them. Canonicalization is an expensive operation,
+ but it allows the same file to be known by more than one base name.
+ If not set (the default), all source files are assumed to have just
+ one base name, and GDB will do file name comparisons more efficiently.
+
* New remote packets
QTEnable
Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.580
diff -u -p -r1.580 dwarf2read.c
--- dwarf2read.c 11 Nov 2011 00:43:03 -0000 1.580
+++ dwarf2read.c 15 Nov 2011 04:19:39 -0000
@@ -2445,7 +2445,8 @@ dw2_lookup_symtab (struct objfile *objfi
struct symtab **result)
{
int i;
- int check_basename = lbasename (name) == name;
+ const char *name_basename = lbasename (name);
+ int check_basename = name_basename == name;
struct dwarf2_per_cu_data *base_cu = NULL;
dw2_setup (objfile);
@@ -2478,6 +2479,12 @@ dw2_lookup_symtab (struct objfile *objfi
&& FILENAME_CMP (lbasename (this_name), name) == 0)
base_cu = per_cu;
+ /* Before we invoke realpath, which can get expensive when many
+ files are involved, do a quick comparison of the basenames. */
+ if (! basenames_may_differ
+ && FILENAME_CMP (lbasename (this_name), name_basename) != 0)
+ continue;
+
if (full_path != NULL)
{
const char *this_real_name = dw2_get_real_path (objfile,
Index: psymtab.c
===================================================================
RCS file: /cvs/src/src/gdb/psymtab.c,v
retrieving revision 1.33
diff -u -p -r1.33 psymtab.c
--- psymtab.c 11 Nov 2011 00:43:04 -0000 1.33
+++ psymtab.c 15 Nov 2011 04:19:39 -0000
@@ -134,6 +134,7 @@ lookup_partial_symtab (struct objfile *o
const char *full_path, const char *real_path)
{
struct partial_symtab *pst;
+ const char *name_basename = lbasename (name);
ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
{
@@ -142,6 +143,12 @@ lookup_partial_symtab (struct objfile *o
return (pst);
}
+ /* Before we invoke realpath, which can get expensive when many
+ files are involved, do a quick comparison of the basenames. */
+ if (! basenames_may_differ
+ && FILENAME_CMP (name_basename, lbasename (pst->filename)) != 0)
+ continue;
+
/* If the user gave us an absolute path, try to find the file in
this symtab and use its absolute path. */
if (full_path != NULL)
@@ -172,7 +179,7 @@ lookup_partial_symtab (struct objfile *o
/* Now, search for a matching tail (only if name doesn't have any dirs). */
- if (lbasename (name) == name)
+ if (name_basename == name)
ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
{
if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.286
diff -u -p -r1.286 symtab.c
--- symtab.c 11 Nov 2011 00:43:04 -0000 1.286
+++ symtab.c 15 Nov 2011 04:19:39 -0000
@@ -112,6 +112,11 @@ void _initialize_symtab (void);
/* */
+/* Non-zero if a file may be known by two different basenames.
+ This is the uncommon case, and significantly slows down gdb.
+ Default set to "off" to not slow down the common case. */
+int basenames_may_differ = 0;
+
/* Allow the user to configure the debugger behavior with respect
to multiple-choice menus when more than one symbol matches during
a symbol lookup. */
@@ -155,6 +160,7 @@ lookup_symtab (const char *name)
char *real_path = NULL;
char *full_path = NULL;
struct cleanup *cleanup;
+ const char *base_name = lbasename (name);
cleanup = make_cleanup (null_cleanup, NULL);
@@ -180,6 +186,12 @@ got_symtab:
return s;
}
+ /* Before we invoke realpath, which can get expensive when many
+ files are involved, do a quick comparison of the basenames. */
+ if (! basenames_may_differ
+ && FILENAME_CMP (base_name, lbasename (s->filename)) != 0)
+ continue;
+
/* If the user gave us an absolute path, try to find the file in
this symtab and use its absolute path. */
@@ -4885,5 +4897,19 @@ Show how the debugger handles ambiguitie
Valid values are \"ask\", \"all\", \"cancel\", and the default is \"all\"."),
NULL, NULL, &setlist, &showlist);
+ add_setshow_boolean_cmd ("basenames-may-differ", class_obscure,
+ &basenames_may_differ, _("\
+Set whether a source file may have multiple base names."), _("\
+Show whether a source file may have multiple base names."), _("\
+(A \"base name\" is the name of a file with the directory part removed.\n\
+Example: The base name of \"/home/user/hello.c\" is \"hello.c\".)\n\
+If set, GDB will canonicalize file names (e.g., expand symlinks)\n\
+before comparing them. Canonicalization is an expensive operation,\n\
+but it allows the same file to be known by more than one base name.\n\
+If not set (the default), all source files are assumed to have just\n\
+one base name, and GDB will do file name comparisons more efficiently."),
+ NULL, NULL,
+ &setlist, &showlist);
+
observer_attach_executable_changed (symtab_observer_executable_changed);
}
Index: symtab.h
===================================================================
RCS file: /cvs/src/src/gdb/symtab.h,v
retrieving revision 1.191
diff -u -p -r1.191 symtab.h
--- symtab.h 10 Nov 2011 20:21:28 -0000 1.191
+++ symtab.h 15 Nov 2011 04:19:39 -0000
@@ -1306,4 +1306,6 @@ void fixup_section (struct general_symbo
struct objfile *lookup_objfile_from_block (const struct block *block);
+extern int basenames_may_differ;
+
#endif /* !defined(SYMTAB_H) */
Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.895
diff -u -p -r1.895 gdb.texinfo
--- doc/gdb.texinfo 14 Nov 2011 20:07:23 -0000 1.895
+++ doc/gdb.texinfo 15 Nov 2011 04:19:39 -0000
@@ -15702,6 +15702,33 @@ This is the default.
@end table
@end table
+@cindex file name canonicalization
+@cindex base name differences
+When processing file names provided by the user, @value{GDBN}
+frequently needs to compare them to the file names recorded in the
+program's debug info. Normally, @value{GDBN} compares just the
+@dfn{base names} of the files as strings, which is reasonably fast
+even for very large programs. (The base name of a file is the last
+portion of its name, after stripping all the leading directories.)
+This shortcut in comparison is based upon the assumption that files
+cannot have more than one base name. This is usually true, but
+references to files that use symlinks or similar filesystem
+facilities violate that assumption. If your program records files
+using such facilities, or if you provide file names to @value{GDBN}
+using symlinks etc., you can set @code{basenames-may-differ} to
+@code{true} to instruct @value{GDBN} to completely canonicalize each
+pair of file names it needs to compare. This will make file-name
+comparisons accurate, but at a price of a significant slowdown.
+
+@table @code
+@item set basenames-may-differ
+@kindex set basenames-may-differ
+Set whether a source file may have multiple base names.
+
+@item show basenames-may-differ
+@kindex show basenames-may-differ
+Show whether a source file may have multiple base names.
+@end table
@node Separate Debug Files
@section Debugging Information in Separate Files
Thread overview: 10+ messages
2011-11-06 6:31 [RFC] " Doug Evans
2011-11-06 9:07 ` asmwarrior
2011-11-07 17:06 ` Joel Brobecker
2011-11-08 17:18 ` Tom Tromey
2011-11-11 0:57 ` [RFA, doc RFA] " Doug Evans
2011-11-11 8:49 ` Eli Zaretskii
2011-11-11 9:00 ` Doug Evans
2011-11-15 4:46 ` Doug Evans [this message]
2011-11-15 6:00 ` Eli Zaretskii
2011-11-15 14:23 ` Joel Brobecker