RFA: add .gdb_index documentation to gdb.texinfo

* RFA: add .gdb_index documentation to gdb.texinfo
@ 2011-04-20 13:59 Tom Tromey
  2011-04-20 15:42 ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Tromey @ 2011-04-20 13:59 UTC (permalink / raw)
  To: gdb-patches; +Cc: Mark Wielaard

Mark Wielaard wanted to add code to elfutils to nicely print the
contents of the .gdb_index section.  He asked that we document the
section contents more prominently.

This patch moves the documentation from a comment in dwarf2read.c to the
manual.

Please review.

Tom

2011-04-20  Tom Tromey  <tromey@redhat.com>

	* dwarf2read.c (save_gdb_index_command): Replace format
	documentation with a pointer to the manual.

2011-04-20  Tom Tromey  <tromey@redhat.com>

	* gdb.texinfo (Index Section Format): New node.
	(Top): Add new node to menu.

diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index eefc7d0..a49863b 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -181,6 +181,7 @@ software in general.  We will miss him.
 * Operating System Information:: Getting additional information from
                                  the operating system
 * Trace File Format::		GDB trace file format
+* Index Section Format::        .gdb_index section format
 * Copying::			GNU General Public License says
                                 how you can copy and share GDB
 * GNU Free Documentation License::  The license for this documentation
@@ -36909,6 +36910,126 @@ should contain a comma-separated list of cores that this process
 is running on.  Target may provide additional columns,
 which @value{GDBN} currently ignores.
 
+@node Index Section Format
+@appendix .gdb_index section format
+@cindex .gdb_index section format
+
+The section documents the index section that is created by @code{save
+gdb-index} (@pxref{Index Files}).  The index section is
+DWARF-specific; some knowledge of DWARF is assumed in this
+description.
+
+The mapped index file format is designed to be directly
+@code{mmap}able on any architecture.  In most cases, a datum is
+represented using a little-endian 32-bit integer value, called an
+@code{offset_type}.  Big endian machines must byte-swap the values
+before using them. Exceptions to this rule are noted.  The data is
+laid out such that alignment is always respected.
+
+A mapped index consists of several areas, laid out in order.
+
+@enumerate
+@item
+The file header.  This is a sequence of values, of @code{offset_type}
+unless otherwise noted:
+
+@enumerate
+@item
+The version number, currently 4.  Versions 1, 2 and 3 are obsolete.
+
+@item
+The offset, from the start of the file, of the CU list.
+
+@item
+The offset, from the start of the file, of the types CU list.  Note
+that this area can be empty, in which case this offset will be equal
+to the next offset.
+
+@item
+The offset, from the start of the file, of the address area.
+
+@item
+The offset, from the start of the file, of the symbol table.
+
+@item
+The offset, from the start of the file, of the constant pool.
+@end enumerate
+
+@item
+The CU list.  This is a sequence of pairs of 64-bit little-endian
+values, sorted by the CU offset.  The first element in each pair is
+the offset of a CU in the @code{.debug_info} section.  The second
+element in each pair is the length of that CU.  References to a CU
+elsewhere in the map are done using a CU index, which is just the
+0-based index into this table.  Note that if there are type CUs, then
+conceptually CUs and type CUs form a single list for the purposes of
+CU indices.
+
+@item
+The types CU list.  This is a sequence of triplets of 64-bit
+little-endian values.  In a triplet, the first value is the CU offset,
+the second value is the type offset in the CU, and the third value is
+the type signature.  The types CU list is not sorted.
+
+@item
+The address area.  The address area consists of a sequence of address
+entries.  Each address entry has three elements:
+
+@enumerate
+@item
+The low address.  This is a 64-bit little-endian value.
+
+@item
+The high address.  This is a 64-bit little-endian value.  Like
+@code{DW_AT_high_pc}, the value is one byte beyond the end.
+
+@item
+The CU index.  This is an @code{offset_type} value.
+@end enumerate
+
+@item
+The symbol table.  This is a hash table.  The size of the hash table
+is always a power of 2.
+
+Each slot in the hash table consists of a pair of @code{offset_type}
+values.  The first value is the offset of the symbol's name in the
+constant pool.  The second value is the offset of the CU vector in the
+constant pool.
+
+If both values are 0, then this slot in the hash table is empty.  This
+is ok because while 0 is a valid constant pool index, it cannot be a
+valid index for both a string and a CU vector.
+
+A string in the constant pool is @samp{\0}-terminated.
+
+The hash value for a table entry is computed by an applying an
+iterative hash function to the symbol's name.  Starting with an
+initial value of @code{r = 0}, each (unsigned) character @samp{c} in
+the string is incorporated into the hash using the formula
+@code{r = r * 67 + c - 113}.  The terminating @samp{\0} is not
+incorporated into the hash.
+
+The step size used in the hash table is computed via
+@code{((hash * 17) & (size - 1)) | 1}, where @samp{hash} is the hash
+value, and @samp{size} is the size of the hash table.
+
+The names of C@t{++} symbols in the hash table are canonicalized.  We
+don't currently have a simple description of the canonicalization
+algorithm; if you intend to create new index sections, you must read
+the code.
+
+A CU vector in the constant pool is a sequence of @code{offset_type}
+values.  The first value is the number of CU indices in the vector.
+Each subsequent value is the index of a CU in the CU list.  This
+element in the hash table is used to indicate which CUs define the
+symbol.
+
+@item
+The constant pool.  This is simply a bunch of bytes.  It is organized
+so that alignment is correct: CU vectors are stored first, followed by
+strings.
+@end enumerate
+
 @include gpl.texi
 
 @node GNU Free Documentation License
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 032fbd5..a5889ed 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -16005,75 +16005,10 @@ write_psymtabs_to_index (struct objfile *objfile, const char *dir)
   do_cleanups (cleanup);
 }
 
-/* The mapped index file format is designed to be directly mmap()able
-   on any architecture.  In most cases, a datum is represented using a
-   little-endian 32-bit integer value, called an offset_type.  Big
-   endian machines must byte-swap the values before using them.
-   Exceptions to this rule are noted.  The data is laid out such that
-   alignment is always respected.
-
-   A mapped index consists of several sections.
-
-   1. The file header.  This is a sequence of values, of offset_type
-   unless otherwise noted:
-
-   [0] The version number, currently 4.  Versions 1, 2 and 3 are
-   obsolete.
-   [1] The offset, from the start of the file, of the CU list.
-   [2] The offset, from the start of the file, of the types CU list.
-   Note that this section can be empty, in which case this offset will
-   be equal to the next offset.
-   [3] The offset, from the start of the file, of the address section.
-   [4] The offset, from the start of the file, of the symbol table.
-   [5] The offset, from the start of the file, of the constant pool.
-
-   2. The CU list.  This is a sequence of pairs of 64-bit
-   little-endian values, sorted by the CU offset.  The first element
-   in each pair is the offset of a CU in the .debug_info section.  The
-   second element in each pair is the length of that CU.  References
-   to a CU elsewhere in the map are done using a CU index, which is
-   just the 0-based index into this table.  Note that if there are
-   type CUs, then conceptually CUs and type CUs form a single list for
-   the purposes of CU indices.
-
-   3. The types CU list.  This is a sequence of triplets of 64-bit
-   little-endian values.  In a triplet, the first value is the CU
-   offset, the second value is the type offset in the CU, and the
-   third value is the type signature.  The types CU list is not
-   sorted.
-
-   4. The address section.  The address section consists of a sequence
-   of address entries.  Each address entry has three elements.
-   [0] The low address.  This is a 64-bit little-endian value.
-   [1] The high address.  This is a 64-bit little-endian value.
-       Like DW_AT_high_pc, the value is one byte beyond the end.
-   [2] The CU index.  This is an offset_type value.
-
-   5. The symbol table.  This is a hash table.  The size of the hash
-   table is always a power of 2.  The initial hash and the step are
-   currently defined by the `find_slot' function.
-
-   Each slot in the hash table consists of a pair of offset_type
-   values.  The first value is the offset of the symbol's name in the
-   constant pool.  The second value is the offset of the CU vector in
-   the constant pool.
-
-   If both values are 0, then this slot in the hash table is empty.
-   This is ok because while 0 is a valid constant pool index, it
-   cannot be a valid index for both a string and a CU vector.
-
-   A string in the constant pool is stored as a \0-terminated string,
-   as you'd expect.
-
-   A CU vector in the constant pool is a sequence of offset_type
-   values.  The first value is the number of CU indices in the vector.
-   Each subsequent value is the index of a CU in the CU list.  This
-   element in the hash table is used to indicate which CUs define the
-   symbol.
-
-   6. The constant pool.  This is simply a bunch of bytes.  It is
-   organized so that alignment is correct: CU vectors are stored
-   first, followed by strings.  */
+/* Implementation of the `save gdb-index' command.
+   
+   Note that the file format used by this command is documented in the
+   GDB manual.  Any changes here must be documented there.  */
 
 static void
 save_gdb_index_command (char *arg, int from_tty)
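
For readers who want to consume the section from another tool, the
header description in the texinfo hunk above can be sketched in C as
follows.  This is only an illustration, not code from GDB: the names
read_offset_type, struct index_header and parse_index_header are
hypothetical, and the sanity checks (offsets inside the buffer and
non-decreasing) are an assumption drawn from "laid out in order".

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 32-bit offset_type from raw bytes.
   Working byte by byte avoids any dependence on host byte order.  */
static uint32_t
read_offset_type (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical in-memory view of the six header values.  */
struct index_header
{
  uint32_t version;
  uint32_t cu_list_offset;
  uint32_t types_cu_list_offset;
  uint32_t address_area_offset;
  uint32_t symbol_table_offset;
  uint32_t constant_pool_offset;
};

/* Fill *H from the first 24 bytes of DATA.  Returns 0 on success, or
   -1 if the buffer is too short or the area offsets are out of order
   or point past LEN (assumed checks, see above).  */
static int
parse_index_header (const unsigned char *data, size_t len,
                    struct index_header *h)
{
  uint32_t f[6];
  int i;

  if (len < 6 * 4)
    return -1;
  for (i = 0; i < 6; ++i)
    f[i] = read_offset_type (data + 4 * i);

  /* f[0] is the version; f[1]..f[5] are the area offsets.  */
  for (i = 1; i < 6; ++i)
    if (f[i] > len || (i > 1 && f[i] < f[i - 1]))
      return -1;

  h->version = f[0];
  h->cu_list_offset = f[1];
  h->types_cu_list_offset = f[2];
  h->address_area_offset = f[3];
  h->symbol_table_offset = f[4];
  h->constant_pool_offset = f[5];
  return 0;
}
```

An empty types CU list shows up here as types_cu_list_offset being
equal to address_area_offset, as the text describes.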


* Re: RFA: add .gdb_index documentation to gdb.texinfo
  2011-04-20 13:59 RFA: add .gdb_index documentation to gdb.texinfo Tom Tromey
@ 2011-04-20 15:42 ` Eli Zaretskii
  2011-04-20 17:19   ` Tom Tromey
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2011-04-20 15:42 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches, mjw

> From: Tom Tromey <tromey@redhat.com>
> CC: Mark Wielaard <mjw@redhat.com>
> Date: Wed, 20 Apr 2011 07:59:22 -0600
> 
> This patch moves the documentation from a comment in dwarf2read.c to the
> manual.

Thanks!

> +@node Index Section Format
> +@appendix .gdb_index section format

".gdb_index" should be in @code in the @appendix line.

> +@cindex .gdb_index section format

I would also add another @cindex entry, "index section format".

> +The section documents the index section that is created by @code{save
   ^^^^^^^^^^^
You probably meant "This section" here.

> +@code{offset_type}.  Big endian machines must byte-swap the values
> +before using them. Exceptions to this rule are noted.  The data is
                    ^^
Two spaces, please.

> +A string in the constant pool is @samp{\0}-terminated.

I think "zero-terminated" is better.

> +The hash value for a table entry is computed by an applying an
                                                   ^^
Remove that extra "an".

> +The step size used in the hash table is computed via
> +@code{((hash * 17) & (size - 1)) | 1}, where @samp{hash} is the hash
> +value, and @samp{size} is the size of the hash table.

It's unclear from the text when this "step size" is used.  Perhaps say
a word or two about that.

> +The names of C@t{++} symbols in the hash table are canonicalized.  We
> +don't currently have a simple description of the canonicalization
> +algorithm; if you intend to create new index sections, you must read
> +the code.
> +
> +A CU vector in the constant pool is a sequence of @code{offset_type}
> +values.  The first value is the number of CU indices in the vector.
> +Each subsequent value is the index of a CU in the CU list.  This
> +element in the hash table is used to indicate which CUs define the
> +symbol.

The text of this @item interleaves information about the hash table
proper with info about the constant pool, which is actually described
in the next @item.  Perhaps it would be good to have all the info
about the constant pool in the next item of the enumerated list?

Thanks.


* Re: RFA: add .gdb_index documentation to gdb.texinfo
  2011-04-20 15:42 ` Eli Zaretskii
@ 2011-04-20 17:19   ` Tom Tromey
  2011-04-20 17:35     ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Tromey @ 2011-04-20 17:19 UTC (permalink / raw)
  To: gdb-patches

Here is try #2.
I think I addressed all your comments.

Tom

2011-04-20  Tom Tromey  <tromey@redhat.com>

	* dwarf2read.c (save_gdb_index_command): Replace format
	documentation with a pointer to the manual.

2011-04-20  Tom Tromey  <tromey@redhat.com>

	* gdb.texinfo (Index Section Format): New node.
	(Top): Add new node to menu.

diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index eefc7d0..2810e36 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -181,6 +181,7 @@ software in general.  We will miss him.
 * Operating System Information:: Getting additional information from
                                  the operating system
 * Trace File Format::		GDB trace file format
+* Index Section Format::        .gdb_index section format
 * Copying::			GNU General Public License says
                                 how you can copy and share GDB
 * GNU Free Documentation License::  The license for this documentation
@@ -36909,6 +36910,129 @@ should contain a comma-separated list of cores that this process
 is running on.  Target may provide additional columns,
 which @value{GDBN} currently ignores.
 
+@node Index Section Format
+@appendix @code{.gdb_index} section format
+@cindex .gdb_index section format
+@cindex index section format
+
+This section documents the index section that is created by @code{save
+gdb-index} (@pxref{Index Files}).  The index section is
+DWARF-specific; some knowledge of DWARF is assumed in this
+description.
+
+The mapped index file format is designed to be directly
+@code{mmap}able on any architecture.  In most cases, a datum is
+represented using a little-endian 32-bit integer value, called an
+@code{offset_type}.  Big endian machines must byte-swap the values
+before using them.  Exceptions to this rule are noted.  The data is
+laid out such that alignment is always respected.
+
+A mapped index consists of several areas, laid out in order.
+
+@enumerate
+@item
+The file header.  This is a sequence of values, of @code{offset_type}
+unless otherwise noted:
+
+@enumerate
+@item
+The version number, currently 4.  Versions 1, 2 and 3 are obsolete.
+
+@item
+The offset, from the start of the file, of the CU list.
+
+@item
+The offset, from the start of the file, of the types CU list.  Note
+that this area can be empty, in which case this offset will be equal
+to the next offset.
+
+@item
+The offset, from the start of the file, of the address area.
+
+@item
+The offset, from the start of the file, of the symbol table.
+
+@item
+The offset, from the start of the file, of the constant pool.
+@end enumerate
+
+@item
+The CU list.  This is a sequence of pairs of 64-bit little-endian
+values, sorted by the CU offset.  The first element in each pair is
+the offset of a CU in the @code{.debug_info} section.  The second
+element in each pair is the length of that CU.  References to a CU
+elsewhere in the map are done using a CU index, which is just the
+0-based index into this table.  Note that if there are type CUs, then
+conceptually CUs and type CUs form a single list for the purposes of
+CU indices.
+
+@item
+The types CU list.  This is a sequence of triplets of 64-bit
+little-endian values.  In a triplet, the first value is the CU offset,
+the second value is the type offset in the CU, and the third value is
+the type signature.  The types CU list is not sorted.
+
+@item
+The address area.  The address area consists of a sequence of address
+entries.  Each address entry has three elements:
+
+@enumerate
+@item
+The low address.  This is a 64-bit little-endian value.
+
+@item
+The high address.  This is a 64-bit little-endian value.  Like
+@code{DW_AT_high_pc}, the value is one byte beyond the end.
+
+@item
+The CU index.  This is an @code{offset_type} value.
+@end enumerate
+
+@item
+The symbol table.  This is an open-addressed hash table.  The size of
+the hash table is always a power of 2.
+
+Each slot in the hash table consists of a pair of @code{offset_type}
+values.  The first value is the offset of the symbol's name in the
+constant pool.  The second value is the offset of the CU vector in the
+constant pool.
+
+If both values are 0, then this slot in the hash table is empty.  This
+is ok because while 0 is a valid constant pool index, it cannot be a
+valid index for both a string and a CU vector.
+
+A string in the constant pool is zero-terminated.
+
+The hash value for a table entry is computed by applying an
+iterative hash function to the symbol's name.  Starting with an
+initial value of @code{r = 0}, each (unsigned) character @samp{c} in
+the string is incorporated into the hash using the formula
+@code{r = r * 67 + c - 113}.  The terminating @samp{\0} is not
+incorporated into the hash.
+
+The step size used in the hash table is computed via
+@code{((hash * 17) & (size - 1)) | 1}, where @samp{hash} is the hash
+value, and @samp{size} is the size of the hash table.  The step size
+is used to find the next candidate slot when handling a hash
+collision.
+
+The names of C@t{++} symbols in the hash table are canonicalized.  We
+don't currently have a simple description of the canonicalization
+algorithm; if you intend to create new index sections, you must read
+the code.
+
+@item
+The constant pool.  This is simply a bunch of bytes.  It is organized
+so that alignment is correct: CU vectors are stored first, followed by
+strings.
+
+A CU vector in the constant pool is a sequence of @code{offset_type}
+values.  The first value is the number of CU indices in the vector.
+Each subsequent value is the index of a CU in the CU list.  This
+element in the hash table is used to indicate which CUs define the
+symbol.
+@end enumerate
+
 @include gpl.texi
 
 @node GNU Free Documentation License
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 032fbd5..a5889ed 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -16005,75 +16005,10 @@ write_psymtabs_to_index (struct objfile *objfile, const char *dir)
   do_cleanups (cleanup);
 }
 
-/* The mapped index file format is designed to be directly mmap()able
-   on any architecture.  In most cases, a datum is represented using a
-   little-endian 32-bit integer value, called an offset_type.  Big
-   endian machines must byte-swap the values before using them.
-   Exceptions to this rule are noted.  The data is laid out such that
-   alignment is always respected.
-
-   A mapped index consists of several sections.
-
-   1. The file header.  This is a sequence of values, of offset_type
-   unless otherwise noted:
-
-   [0] The version number, currently 4.  Versions 1, 2 and 3 are
-   obsolete.
-   [1] The offset, from the start of the file, of the CU list.
-   [2] The offset, from the start of the file, of the types CU list.
-   Note that this section can be empty, in which case this offset will
-   be equal to the next offset.
-   [3] The offset, from the start of the file, of the address section.
-   [4] The offset, from the start of the file, of the symbol table.
-   [5] The offset, from the start of the file, of the constant pool.
-
-   2. The CU list.  This is a sequence of pairs of 64-bit
-   little-endian values, sorted by the CU offset.  The first element
-   in each pair is the offset of a CU in the .debug_info section.  The
-   second element in each pair is the length of that CU.  References
-   to a CU elsewhere in the map are done using a CU index, which is
-   just the 0-based index into this table.  Note that if there are
-   type CUs, then conceptually CUs and type CUs form a single list for
-   the purposes of CU indices.
-
-   3. The types CU list.  This is a sequence of triplets of 64-bit
-   little-endian values.  In a triplet, the first value is the CU
-   offset, the second value is the type offset in the CU, and the
-   third value is the type signature.  The types CU list is not
-   sorted.
-
-   4. The address section.  The address section consists of a sequence
-   of address entries.  Each address entry has three elements.
-   [0] The low address.  This is a 64-bit little-endian value.
-   [1] The high address.  This is a 64-bit little-endian value.
-       Like DW_AT_high_pc, the value is one byte beyond the end.
-   [2] The CU index.  This is an offset_type value.
-
-   5. The symbol table.  This is a hash table.  The size of the hash
-   table is always a power of 2.  The initial hash and the step are
-   currently defined by the `find_slot' function.
-
-   Each slot in the hash table consists of a pair of offset_type
-   values.  The first value is the offset of the symbol's name in the
-   constant pool.  The second value is the offset of the CU vector in
-   the constant pool.
-
-   If both values are 0, then this slot in the hash table is empty.
-   This is ok because while 0 is a valid constant pool index, it
-   cannot be a valid index for both a string and a CU vector.
-
-   A string in the constant pool is stored as a \0-terminated string,
-   as you'd expect.
-
-   A CU vector in the constant pool is a sequence of offset_type
-   values.  The first value is the number of CU indices in the vector.
-   Each subsequent value is the index of a CU in the CU list.  This
-   element in the hash table is used to indicate which CUs define the
-   symbol.
-
-   6. The constant pool.  This is simply a bunch of bytes.  It is
-   organized so that alignment is correct: CU vectors are stored
-   first, followed by strings.  */
+/* Implementation of the `save gdb-index' command.
+   
+   Note that the file format used by this command is documented in the
+   GDB manual.  Any changes here must be documented there.  */
 
 static void
 save_gdb_index_command (char *arg, int from_tty)
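
To make the hash and step-size paragraphs in the texinfo hunk above
concrete, here is a minimal C sketch of a symbol lookup.  It is an
illustration under stated assumptions, not GDB's implementation: the
names index_hash and index_lookup are hypothetical, the hash is assumed
to use 32-bit unsigned arithmetic with wraparound, the first candidate
slot is assumed to be hash & (size - 1) (the text only gives the step),
and the slot array is assumed to be already decoded to host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Iterative hash from the description: r = r * 67 + c - 113 for each
   unsigned character, excluding the terminating zero byte.  */
static uint32_t
index_hash (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;
  uint32_t r = 0;

  for (; *p != '\0'; ++p)
    r = r * 67u + (uint32_t) *p - 113u;
  return r;
}

/* Look NAME up in an open-addressed table of SIZE slots.  SLOTS holds
   2 * SIZE values: (name offset, CU vector offset) per slot, both
   relative to POOL.  Returns the slot number, or -1 if NAME is absent
   or SIZE is not a power of 2.  */
static long
index_lookup (const uint32_t *slots, uint32_t size,
              const char *pool, const char *name)
{
  uint32_t hash, slot, step, probes;

  if (size == 0 || (size & (size - 1)) != 0)
    return -1;

  hash = index_hash (name);
  slot = hash & (size - 1);                 /* assumed starting slot */
  step = ((hash * 17u) & (size - 1)) | 1u;  /* step from the text */

  for (probes = 0; probes < size; ++probes)
    {
      uint32_t name_off = slots[2 * slot];
      uint32_t vec_off = slots[2 * slot + 1];

      if (name_off == 0 && vec_off == 0)
        return -1;                          /* empty slot: not present */
      if (strcmp (pool + name_off, name) == 0)
        return (long) slot;
      slot = (slot + step) & (size - 1);    /* collision: next candidate */
    }
  return -1;
}
```

Because the step is always odd and the table size is a power of 2, the
probe sequence visits every slot before repeating, so bounding the loop
at SIZE probes is enough to terminate on a full table.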


* Re: RFA: add .gdb_index documentation to gdb.texinfo
  2011-04-20 17:19   ` Tom Tromey
@ 2011-04-20 17:35     ` Eli Zaretskii
  0 siblings, 0 replies; 4+ messages in thread
From: Eli Zaretskii @ 2011-04-20 17:35 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

> From: Tom Tromey <tromey@redhat.com>
> Date: Wed, 20 Apr 2011 11:18:41 -0600
> 
> Here is try #2.
> I think I addressed all your comments.

Thanks, this is fine.  However, I wonder whether this sentence:

+A string in the constant pool is zero-terminated.

should also be under the "constant pool" item.

But if you think it is better left where it is, fine.

