Mirror of the gdb-patches mailing list
From: Doug Evans <dje@google.com>
To: Eli Zaretskii <eliz@gnu.org>, Tom Tromey <tromey@redhat.com>,
		Joel Brobecker <brobecker@adacore.com>,
	gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [RFA, doc RFA] Avoid calling gdb_realpath if basenames are different
Date: Tue, 15 Nov 2011 04:46:00 -0000
Message-ID: <CADPb22TED1ZqEXmAZeNngkJLm256sM35Q0pbXRyttnxMt=o+Tg@mail.gmail.com>
In-Reply-To: <CADPb22ReomSdvRUCJ=h4BD3KMpWgBsdtHDspXgGNr-iHvKh4aw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6102 bytes --]

On Fri, Nov 11, 2011 at 12:53 AM, Doug Evans <dje@google.com> wrote:
> On Fri, Nov 11, 2011 at 12:47 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>> Date: Thu, 10 Nov 2011 15:58:46 -0800
>>> From: Doug Evans <dje@google.com>
>>>
>>> 2011-11-10  Doug Evans  <dje@google.com>
>>>
>>>         * NEWS: Mention new parameter basenames-may-differ.
>>>         * dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
>>>         ! basenames_may_differ.
>>>         * psymtab.c (lookup_partial_symtab): Ditto.
>>>         * symtab.c (lookup_symtab): Ditto.
>>>         (basenames_may_differ): New global.
>>>         (_initialize_symtab): New parameter basenames-may-differ.
>>>         * symtab.h (basenames_may_differ): Declare.
>>>
>>>         doc/
>>>         * gdb.texinfo (Files): Document basenames-may-differ.
>>
>> Thanks.
>>
>>> +set basenames-may-differ
>>> +show basenames-may-differ
>>> +  Set whether a source file may have multiple base names.
>>> +  A "base name" is the name of a file with the directory part removed.
>>> +  Example: The base name of "/home/user/hello.c" is "hello.c".
>>> +  When doing file name based lookups, gdb will canonicalize file names
>>> +  (e.g., expand symlinks) before comparing them, which is an expensive
>>> +  operation.
>>> +  If set, gdb will not assume a file is known by one base name, and thus
>>> +  it cannot optimize file name comparisons by skipping the canonicalization
>>> +  step if the base names are different.
>>> +  If not set, all source files must be known by one base name,
>>> +  and gdb will do file name comparisons more efficiently.
>>
>> I suggest to rearrange the text, so as to put together the parts that
>> describe what happens when the option is set.  Like this:
>>
>>  Set whether a source file may have multiple base names.
>>  (A "base name" is the name of a file with the directory part removed.
>>  Example: The base name of "/home/user/hello.c" is "hello.c".)
>>  If set, GDB will canonicalize file names (e.g., expand symlinks)
>>  before comparing them.  Canonicalization is an expensive operation,
>>  but it allows the same file to be known by more than one base name.
>>  If not set (the default), all source files are assumed to have just
>>  one base name, and GDB will do file name comparisons more efficiently.
>>
>> OK?
>>
>>> +When processing file names provided by the user,
>>> +@value{GDBN} will canonicalize them and remove symbolic links.
>>> +This ensures that @value{GDBN} will find the right file,
>>> +even if the debug information specifies an alternate path.
>>> +However, with large programs this canonicalization can noticeably slow
>>> +down @value{GDBN}.  To compensate, @value{GDBN} will try to avoid
>>> +this canonicalization wherever possible.  One way it can do so
>>> +is by first comparing the @samp{base name} of a file.
>>> +The @samp{base name} of a file is simply the file's name without
>>> +any directory information.  For example, the base name of
>>> +@file{/home/user/hello.c} is @file{hello.c}.
>>> +By doing this @value{GDBN} can skip, for example,
>>> +@file{/usr/include/stdio.h} without having to first canonicalize
>>> +and then compare the directory names.
>>> +This works great, except when the base name of a file
>>> +can have multiple names due to symbolic links.
>>> +For example, if @file{/home/user/bar.c} is a symbolic link to
>>> +@file{/home/user/foo.c} then @value{GDBN} cannot just look at
>>> +the base name of two files, it must canonicalize them, expand
>>> +all symbolic links, and @emph{then} compare the file names
>>> +to see if they match.
>>> +Fortunately, having one file known by two different base names
>>> +does not generally occur in practice.
>>> +Should it occur, however, @value{GDBN} provides an escape hatch
>>> +to allow this to work.
>>> +By setting @code{basenames-may-differ} to @code{true}
>>> +@value{GDBN} will always canonicalize file names before
>>> +comparing them, thus ensuring that one file known by multiple
>>> +base names is treated as the same file.
>>
>> This is written as mostly an apology for having this option.  That is
>> a wrong angle for describing features in a user manual, because the
>> user generally trusts the developers by default to DTRT.  So I would
>> reword it as follows:
>>
>>  When processing file names provided by the user, @value{GDBN}
>>  frequently needs to compare them to the file names recorded in the
>>  program's debug info.  Normally, @value{GDBN} compares just the
>>  @dfn{base names} of the files as strings, which is reasonably fast
>>  even for very large programs.  (The base name of a file is the last
>>  portion of its name, after stripping all the leading directories.)
>>  This shortcut in comparison is based upon the assumption that files
>>  cannot have more than one base name.  This is usually true, but
>>  references to files that use symlinks or similar filesystem
>>  facilities violate that assumption.  If your program records files
>>  using such facilities, or if you provide file names to @value{GDBN}
>>  using symlinks etc., you can set @code{basenames-may-differ} to
>>  @code{true} to instruct @value{GDBN} to completely canonicalize each
>>  pair of file names it needs to compare.  This will make file-name
>>  comparisons accurate, but at a price of a significant slowdown.
>>
>> Do you agree with this wording?
>>
>
> I'm happy if you're happy.
> Thanks for the suggested wording.

Ok to check in?

2011-11-14  Doug Evans  <dje@google.com>

        * NEWS: Mention new parameter basenames-may-differ.
        * dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
        ! basenames_may_differ.
        * psymtab.c (lookup_partial_symtab): Ditto.
        * symtab.c (lookup_symtab): Ditto.
        (basenames_may_differ): New global.
        (_initialize_symtab): New parameter basenames-may-differ.
        * symtab.h (basenames_may_differ): Declare.

        doc/
        * gdb.texinfo (Files): Document basenames-may-differ.

[-- Attachment #2: gdb-111114-basenames-may-differ-3.patch.txt --]
[-- Type: text/plain, Size: 8690 bytes --]

2011-11-14  Doug Evans  <dje@google.com>

	* NEWS: Mention new parameter basenames-may-differ.
	* dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
	! basenames_may_differ.
	* psymtab.c (lookup_partial_symtab): Ditto.
	* symtab.c (lookup_symtab): Ditto.
	(basenames_may_differ): New global.
	(_initialize_symtab): New parameter basenames-may-differ.
	* symtab.h (basenames_may_differ): Declare.

	doc/
	* gdb.texinfo (Files): Document basenames-may-differ.

Index: NEWS
===================================================================
RCS file: /cvs/src/src/gdb/NEWS,v
retrieving revision 1.466
diff -u -p -r1.466 NEWS
--- NEWS	14 Nov 2011 20:07:20 -0000	1.466
+++ NEWS	15 Nov 2011 04:19:38 -0000
@@ -159,6 +159,17 @@ show debug entry-values
   Control display of debugging info for determining frame argument values at
   function entry and virtual tail call frames.
 
+set basenames-may-differ
+show basenames-may-differ
+  Set whether a source file may have multiple base names.
+  (A "base name" is the name of a file with the directory part removed.
+  Example: The base name of "/home/user/hello.c" is "hello.c".)
+  If set, GDB will canonicalize file names (e.g., expand symlinks)
+  before comparing them.  Canonicalization is an expensive operation,
+  but it allows the same file to be known by more than one base name.
+  If not set (the default), all source files are assumed to have just
+  one base name, and GDB will do file name comparisons more efficiently.
+
 * New remote packets
 
 QTEnable
Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.580
diff -u -p -r1.580 dwarf2read.c
--- dwarf2read.c	11 Nov 2011 00:43:03 -0000	1.580
+++ dwarf2read.c	15 Nov 2011 04:19:39 -0000
@@ -2445,7 +2445,8 @@ dw2_lookup_symtab (struct objfile *objfi
 		   struct symtab **result)
 {
   int i;
-  int check_basename = lbasename (name) == name;
+  const char *name_basename = lbasename (name);
+  int check_basename = name_basename == name;
   struct dwarf2_per_cu_data *base_cu = NULL;
 
   dw2_setup (objfile);
@@ -2478,6 +2479,12 @@ dw2_lookup_symtab (struct objfile *objfi
 	      && FILENAME_CMP (lbasename (this_name), name) == 0)
 	    base_cu = per_cu;
 
+	  /* Before we invoke realpath, which can get expensive when many
+	     files are involved, do a quick comparison of the basenames.  */
+	  if (! basenames_may_differ
+	      && FILENAME_CMP (lbasename (this_name), name_basename) != 0)
+	    continue;
+
 	  if (full_path != NULL)
 	    {
 	      const char *this_real_name = dw2_get_real_path (objfile,
Index: psymtab.c
===================================================================
RCS file: /cvs/src/src/gdb/psymtab.c,v
retrieving revision 1.33
diff -u -p -r1.33 psymtab.c
--- psymtab.c	11 Nov 2011 00:43:04 -0000	1.33
+++ psymtab.c	15 Nov 2011 04:19:39 -0000
@@ -134,6 +134,7 @@ lookup_partial_symtab (struct objfile *o
 		       const char *full_path, const char *real_path)
 {
   struct partial_symtab *pst;
+  const char *name_basename = lbasename (name);
 
   ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
   {
@@ -142,6 +143,12 @@ lookup_partial_symtab (struct objfile *o
 	return (pst);
       }
 
+    /* Before we invoke realpath, which can get expensive when many
+       files are involved, do a quick comparison of the basenames.  */
+    if (! basenames_may_differ
+	&& FILENAME_CMP (name_basename, lbasename (pst->filename)) != 0)
+      continue;
+
     /* If the user gave us an absolute path, try to find the file in
        this symtab and use its absolute path.  */
     if (full_path != NULL)
@@ -172,7 +179,7 @@ lookup_partial_symtab (struct objfile *o
 
   /* Now, search for a matching tail (only if name doesn't have any dirs).  */
 
-  if (lbasename (name) == name)
+  if (name_basename == name)
     ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
     {
       if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.286
diff -u -p -r1.286 symtab.c
--- symtab.c	11 Nov 2011 00:43:04 -0000	1.286
+++ symtab.c	15 Nov 2011 04:19:39 -0000
@@ -112,6 +112,11 @@ void _initialize_symtab (void);
 
 /* */
 
+/* Non-zero if a file may be known by two different basenames.
+   This is the uncommon case, and significantly slows down gdb.
+   Default set to "off" to not slow down the common case.  */
+int basenames_may_differ = 0;
+
 /* Allow the user to configure the debugger behavior with respect
    to multiple-choice menus when more than one symbol matches during
    a symbol lookup.  */
@@ -155,6 +160,7 @@ lookup_symtab (const char *name)
   char *real_path = NULL;
   char *full_path = NULL;
   struct cleanup *cleanup;
+  const char *base_name = lbasename (name);
 
   cleanup = make_cleanup (null_cleanup, NULL);
 
@@ -180,6 +186,12 @@ got_symtab:
 	return s;
       }
 
+    /* Before we invoke realpath, which can get expensive when many
+       files are involved, do a quick comparison of the basenames.  */
+    if (! basenames_may_differ
+	&& FILENAME_CMP (base_name, lbasename (s->filename)) != 0)
+      continue;
+
     /* If the user gave us an absolute path, try to find the file in
        this symtab and use its absolute path.  */
 
@@ -4885,5 +4897,19 @@ Show how the debugger handles ambiguitie
 Valid values are \"ask\", \"all\", \"cancel\", and the default is \"all\"."),
                         NULL, NULL, &setlist, &showlist);
 
+  add_setshow_boolean_cmd ("basenames-may-differ", class_obscure,
+			   &basenames_may_differ, _("\
+Set whether a source file may have multiple base names."), _("\
+Show whether a source file may have multiple base names."), _("\
+(A \"base name\" is the name of a file with the directory part removed.\n\
+Example: The base name of \"/home/user/hello.c\" is \"hello.c\".)\n\
+If set, GDB will canonicalize file names (e.g., expand symlinks)\n\
+before comparing them.  Canonicalization is an expensive operation,\n\
+but it allows the same file to be known by more than one base name.\n\
+If not set (the default), all source files are assumed to have just\n\
+one base name, and GDB will do file name comparisons more efficiently."),
+			   NULL, NULL,
+			   &setlist, &showlist);
+
   observer_attach_executable_changed (symtab_observer_executable_changed);
 }
Index: symtab.h
===================================================================
RCS file: /cvs/src/src/gdb/symtab.h,v
retrieving revision 1.191
diff -u -p -r1.191 symtab.h
--- symtab.h	10 Nov 2011 20:21:28 -0000	1.191
+++ symtab.h	15 Nov 2011 04:19:39 -0000
@@ -1306,4 +1306,6 @@ void fixup_section (struct general_symbo
 
 struct objfile *lookup_objfile_from_block (const struct block *block);
 
+extern int basenames_may_differ;
+
 #endif /* !defined(SYMTAB_H) */
Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.895
diff -u -p -r1.895 gdb.texinfo
--- doc/gdb.texinfo	14 Nov 2011 20:07:23 -0000	1.895
+++ doc/gdb.texinfo	15 Nov 2011 04:19:39 -0000
@@ -15702,6 +15702,33 @@ This is the default.
 @end table
 @end table
 
+@cindex file name canonicalization
+@cindex base name differences
+When processing file names provided by the user, @value{GDBN}
+frequently needs to compare them to the file names recorded in the
+program's debug info.  Normally, @value{GDBN} compares just the
+@dfn{base names} of the files as strings, which is reasonably fast
+even for very large programs.  (The base name of a file is the last
+portion of its name, after stripping all the leading directories.)
+This shortcut in comparison is based upon the assumption that files
+cannot have more than one base name.  This is usually true, but
+references to files that use symlinks or similar filesystem
+facilities violate that assumption.  If your program records files
+using such facilities, or if you provide file names to @value{GDBN}
+using symlinks etc., you can set @code{basenames-may-differ} to
+@code{true} to instruct @value{GDBN} to completely canonicalize each
+pair of file names it needs to compare.  This will make file-name
+comparisons accurate, but at a price of a significant slowdown.
+
+@table @code
+@item set basenames-may-differ
+@kindex set basenames-may-differ
+Set whether a source file may have multiple base names.
+
+@item show basenames-may-differ
+@kindex show basenames-may-differ
+Show whether a source file may have multiple base names.
+@end table
 
 @node Separate Debug Files
 @section Debugging Information in Separate Files
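
For readers who want to see the pre-check outside the diff context, here is a minimal standalone sketch of the test the patch adds to dw2_lookup_symtab, lookup_partial_symtab and lookup_symtab. It is an illustration only, not code from the patch: sketch_basename and can_skip_realpath are hypothetical names, sketch_basename is a simplified stand-in for lbasename (it only understands '/' separators), and plain strcmp is assumed in place of FILENAME_CMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for lbasename: return a pointer to
   the part of PATH that follows the last '/', or PATH itself if it has
   no directory part.  */
static const char *
sketch_basename (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}

/* Return true if the expensive canonicalization step can be skipped for
   this pair of names.  That is the case only when the user has not set
   basenames-may-differ and the two base names already differ.
   strcmp is an assumption here; the patch itself uses FILENAME_CMP.  */
static bool
can_skip_realpath (bool basenames_may_differ,
		   const char *user_name, const char *symtab_name)
{
  assert (user_name != NULL && symtab_name != NULL);

  return (!basenames_may_differ
	  && strcmp (sketch_basename (user_name),
		     sketch_basename (symtab_name)) != 0);
}
```

Under these assumptions, a lookup of "/home/user/hello.c" can skip "/usr/include/stdio.h" immediately when the setting is off. If "/home/user/bar.c" is a symlink to "/home/user/foo.c", the same check would also skip that pair, which is exactly the case where basenames-may-differ needs to be turned on.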
