* RFC: partial symbol table address range generalization
@ 2001-10-23 16:33 Jim Blandy
2001-10-23 16:54 ` Daniel Jacobowitz
2001-10-24 1:52 ` Eli Zaretskii
0 siblings, 2 replies; 6+ messages in thread
From: Jim Blandy @ 2001-10-23 16:33 UTC (permalink / raw)
To: gdb-patches
This is the first patch in a series. If you are interested in GDB's
symbol table, I'd appreciate your comments and review.
The Problem:
At the moment, GDB's partial symbol table structures assume that each
compilation unit's text occupies a single contiguous range of
addresses. If a given address falls between a `struct partial_symtab'
structure's `texthigh' and `textlow' addresses, then GDB assumes that
reading that partial symbol table's debugging info will provide
complete information about that address.
However, this assumption isn't true. A given .o file's code can
actually appear in any number of distinct regions, separated by code
from other .o files. This can happen when we compile and link a C++
program, like the below (from GCC's test suite):
// Special g++ Options: -fexceptions -g
// excess errors test - XFAIL a29k-*-* sparc64-*-elf arm-*-pe
class zeroset {
public:
~zeroset () { }
};
int main () {
zeroset a;
try {
;
} catch( zeroset ) {
}
}
G++ puts the code for `main' in the .o file's .text section, and the
code for the `zeroset' destructor in a section named
.gnu.linkonce.t._ZN7zerosetD1Ev. The linker script gathers all the .o
files' .text sections together, followed by all the .gnu.linkonce.t.*
sections. As a result, library functions' code appears between that
for `main' and the `zeroset' destructor in the final executable. As a
result, GDB builds a `struct partial_symtab' for the above source file
whose `textlow' and `texthigh' range encloses not only the code for
`main' and the destructor, but all the library functions' code as
well.
There is some logic in GDB's lookup functions to cope with overlapping
partial symtabs, and they've been working pretty well on our behalf.
However, they're fragile, and do break in everyday use. For example,
in the executable produced from the source file above, if you try to
set a breakpoint on a library routine compiled without debug
information, GDB will set the breakpoint in `main' instead. (On some
platforms, `_exit' is such a function.)
It is possible to change the linker script to link all a .o file's
sections together, but the linker scripts that are currently in
widespread use do not, and the same situation can arise in other
circumstances (reordered executables, for example).
The Solution:
I'd like to change GDB's `struct partial_symtab' structure to cover an
arbitrary set of addresses, not just a single contiguous span of
addresses. I've written code for a `struct addrset' datatype
(essentially a linked list of ranges) and provided some functions to
make it easy and robust to work with. I'd like to gradually wean the
partial symbol table code away from the textlow/texthigh
representation, and have it use an addrset to represent a partial
symbol table's coverage.
This isn't a complete solution:
- Code that works with the full symbol table still uses each `struct
symtab's global block's `startaddr' and `endaddr' members to find
the `struct symtab' that covers a given address; this suffers from
the same problem as the current partial symtabs.
- Changing the STABS partial symbol table scanner to recognize
non-contiguous .o files seems like it should be possible, but I
wasn't able to do it in a way that I was sure wouldn't break any
targets. So for now, I'm going to leave the STABS reader alone; the
data structures will be capable of describing non-contiguous .o
files on STABS platforms, but the reader will never recognize them.
However, Dwarf 2 does provide the information in a reliable way, and
this change will be useful in that case.
Below is the first patch in the series. It's meant to have no effect
on GDB's behavior.
2001-10-23 Jim Blandy <jimb@redhat.com>
Isolate STABS readers' use of the `textlow' and `texthigh' fields
of `struct partial_symtab' to only a few locations. This change
is not supposed to affect the way the values are computed, only
where they live.
* dbxread.c (struct symloc): Add `textlow' and `texthigh' fields
to the reader-specific structure.
* mdebugread.c (struct symloc): Same.
* dbxread.c (TEXTLOW, TEXTHIGH): New accessor macros.
* mdebugread.c (TEXTLOW, TEXTHIGH): Same.
* dbxread.c (dbx_symfile_read): After we've built all our partial
symbol tables, set each partial symtab's `textlow' and `texthigh'
fields from our reader-specific structure.
* mdebugread.c (mdebug_build_psymtabs): Same.
* dbxread.c (start_psymtab): Initialize the reader-specific
structure's `textlow' and `texthigh' from the new psymtab's.
* mdebugread.c (parse_partial_symbols, new_psymtab): Same.
* dbxread.c (read_dbx_symtab, end_psymtab, read_ofile_symtab): Use
the reader-specific `textlow' and `texthigh', not the generic
psymtab fields.
* mdebugread.c (parse_lines, parse_partial_symbols,
psymtab_to_symtab_1): Same.
* partial-stab.h: Same.
Index: gdb/dbxread.c
===================================================================
RCS file: /cvs/src/src/gdb/dbxread.c,v
retrieving revision 1.24
diff -c -r1.24 dbxread.c
*** gdb/dbxread.c 2001/09/20 03:03:39 1.24
--- gdb/dbxread.c 2001/10/23 23:14:38
***************
*** 76,81 ****
--- 76,87 ----
struct symloc
{
+ /* STABS doesn't reliably give us nice start and end addresses for
+ each function. Instead, we are told the addresses of various
+ boundary points, and we have to gather those together to build
+ ranges. These are our running best guess as to the range of
+ text addresses for this psymtab. */
+ CORE_ADDR textlow, texthigh;
/* Offset within the file symbol table of first local symbol for this
file. */
***************
*** 105,110 ****
--- 111,118 ----
#define LDSYMOFF(p) (((struct symloc *)((p)->read_symtab_private))->ldsymoff)
#define LDSYMLEN(p) (((struct symloc *)((p)->read_symtab_private))->ldsymlen)
#define SYMLOC(p) ((struct symloc *)((p)->read_symtab_private))
+ #define TEXTLOW(p) (SYMLOC(p)->textlow)
+ #define TEXTHIGH(p) (SYMLOC(p)->texthigh)
#define SYMBOL_SIZE(p) (SYMLOC(p)->symbol_size)
#define SYMBOL_OFFSET(p) (SYMLOC(p)->symbol_offset)
#define STRING_OFFSET(p) (SYMLOC(p)->string_offset)
***************
*** 599,604 ****
--- 607,625 ----
read_dbx_dynamic_symtab (objfile);
+ /* Take the text ranges the STABS partial symbol scanner computed
+ for each of the psymtabs and convert it into the canonical form
+ for psymtabs. */
+ {
+ struct partial_symtab *p;
+
+ ALL_OBJFILE_PSYMTABS (objfile, p)
+ {
+ p->textlow = TEXTLOW (p);
+ p->texthigh = TEXTHIGH (p);
+ }
+ }
+
/* Install any minimal symbols that have been collected as the current
minimal symbols for this objfile. */
***************
*** 1341,1347 ****
end_psymtab (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! text_end > pst->texthigh ? text_end : pst->texthigh,
dependency_list, dependencies_used, textlow_not_set);
}
--- 1362,1368 ----
end_psymtab (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! text_end > TEXTHIGH (pst) ? text_end : TEXTHIGH (pst),
dependency_list, dependencies_used, textlow_not_set);
}
***************
*** 1367,1372 ****
--- 1388,1395 ----
result->read_symtab_private = (char *)
obstack_alloc (&objfile->psymbol_obstack, sizeof (struct symloc));
+ TEXTLOW (result) = result->textlow;
+ TEXTHIGH (result) = result->texthigh;
LDSYMOFF (result) = ldsymoff;
result->read_symtab = dbx_psymtab_to_symtab;
SYMBOL_SIZE (result) = symbol_size;
***************
*** 1402,1408 ****
if (capping_symbol_offset != -1)
LDSYMLEN (pst) = capping_symbol_offset - LDSYMOFF (pst);
! pst->texthigh = capping_text;
#ifdef SOFUN_ADDRESS_MAYBE_MISSING
/* Under Solaris, the N_SO symbols always have a value of 0,
--- 1425,1431 ----
if (capping_symbol_offset != -1)
LDSYMLEN (pst) = capping_symbol_offset - LDSYMOFF (pst);
! TEXTHIGH (pst) = capping_text;
#ifdef SOFUN_ADDRESS_MAYBE_MISSING
/* Under Solaris, the N_SO symbols always have a value of 0,
***************
*** 1420,1426 ****
a reliable texthigh by taking the address plus size of the
last function in the file. */
! if (pst->texthigh == 0 && last_function_name)
{
char *p;
int n;
--- 1443,1449 ----
a reliable texthigh by taking the address plus size of the
last function in the file. */
! if (TEXTHIGH (pst) == 0 && last_function_name)
{
char *p;
int n;
***************
*** 1446,1459 ****
}
if (minsym)
! pst->texthigh = SYMBOL_VALUE_ADDRESS (minsym) + MSYMBOL_SIZE (minsym);
last_function_name = NULL;
}
/* this test will be true if the last .o file is only data */
if (textlow_not_set)
! pst->textlow = pst->texthigh;
else
{
struct partial_symtab *p1;
--- 1469,1482 ----
}
if (minsym)
! TEXTHIGH (pst) = SYMBOL_VALUE_ADDRESS (minsym) + MSYMBOL_SIZE (minsym);
last_function_name = NULL;
}
/* this test will be true if the last .o file is only data */
if (textlow_not_set)
! TEXTLOW (pst) = TEXTHIGH (pst);
else
{
struct partial_symtab *p1;
***************
*** 1466,1477 ****
ALL_OBJFILE_PSYMTABS (objfile, p1)
{
! if (p1->texthigh == 0 && p1->textlow != 0 && p1 != pst)
{
! p1->texthigh = pst->textlow;
/* if this file has only data, then make textlow match texthigh */
! if (p1->textlow == 0)
! p1->textlow = p1->texthigh;
}
}
}
--- 1489,1500 ----
ALL_OBJFILE_PSYMTABS (objfile, p1)
{
! if (TEXTHIGH (p1) == 0 && TEXTLOW (p1) != 0 && p1 != pst)
{
! TEXTHIGH (p1) = TEXTLOW (pst);
/* if this file has only data, then make textlow match texthigh */
! if (TEXTLOW (p1) == 0)
! TEXTLOW (p1) = TEXTHIGH (p1);
}
}
}
***************
*** 1508,1515 ****
sizeof (struct symloc));
LDSYMOFF (subpst) =
LDSYMLEN (subpst) =
! subpst->textlow =
! subpst->texthigh = 0;
/* We could save slight bits of space by only making one of these,
shared by the entire set of include files. FIXME-someday. */
--- 1531,1538 ----
sizeof (struct symloc));
LDSYMOFF (subpst) =
LDSYMLEN (subpst) =
! TEXTLOW (subpst) =
! TEXTHIGH (subpst) = 0;
/* We could save slight bits of space by only making one of these,
shared by the entire set of include files. FIXME-someday. */
***************
*** 1677,1684 ****
objfile = pst->objfile;
sym_offset = LDSYMOFF (pst);
sym_size = LDSYMLEN (pst);
! text_offset = pst->textlow;
! text_size = pst->texthigh - pst->textlow;
/* This cannot be simply objfile->section_offsets because of
elfstab_offset_sections() which initializes the psymtab section
offsets information in a special way, and that is different from
--- 1700,1707 ----
objfile = pst->objfile;
sym_offset = LDSYMOFF (pst);
sym_size = LDSYMLEN (pst);
! text_offset = TEXTLOW (pst);
! text_size = TEXTHIGH (pst) - TEXTLOW (pst);
/* This cannot be simply objfile->section_offsets because of
elfstab_offset_sections() which initializes the psymtab section
offsets information in a special way, and that is different from
***************
*** 1823,1835 ****
/* In a Solaris elf file, this variable, which comes from the
value of the N_SO symbol, will still be 0. Luckily, text_offset,
! which comes from pst->textlow is correct. */
if (last_source_start_addr == 0)
last_source_start_addr = text_offset;
/* In reordered executables last_source_start_addr may not be the
lower bound for this symtab, instead use text_offset which comes
! from pst->textlow which is correct. */
if (last_source_start_addr > text_offset)
last_source_start_addr = text_offset;
--- 1846,1858 ----
/* In a Solaris elf file, this variable, which comes from the
value of the N_SO symbol, will still be 0. Luckily, text_offset,
! which comes from TEXTLOW (pst) is correct. */
if (last_source_start_addr == 0)
last_source_start_addr = text_offset;
/* In reordered executables last_source_start_addr may not be the
lower bound for this symtab, instead use text_offset which comes
! from TEXTLOW (pst) which is correct. */
if (last_source_start_addr > text_offset)
last_source_start_addr = text_offset;
Index: gdb/mdebugread.c
===================================================================
RCS file: /cvs/src/src/gdb/mdebugread.c,v
retrieving revision 1.16
diff -c -r1.16 mdebugread.c
*** gdb/mdebugread.c 2001/10/12 23:51:28 1.16
--- gdb/mdebugread.c 2001/10/23 23:14:41
***************
*** 105,110 ****
--- 105,115 ----
struct symloc
{
+ /* Our running best guess as to the range of text addresses for
+ this psymtab. After we've read everything in, we use this to
+ build pst->text_addrs. */
+ CORE_ADDR textlow, texthigh;
+
/* Index of the FDR that this psymtab represents. */
int fdr_idx;
/* The BFD that the psymtab was created from. */
***************
*** 120,125 ****
--- 125,132 ----
};
#define PST_PRIVATE(p) ((struct symloc *)(p)->read_symtab_private)
+ #define TEXTLOW(p) (PST_PRIVATE(p)->textlow)
+ #define TEXTHIGH(p) (PST_PRIVATE(p)->texthigh)
#define FDR_IDX(p) (PST_PRIVATE(p)->fdr_idx)
#define CUR_BFD(p) (PST_PRIVATE(p)->cur_bfd)
#define DEBUG_SWAP(p) (PST_PRIVATE(p)->debug_swap)
***************
*** 516,521 ****
--- 523,541 ----
parse_partial_symbols (objfile);
+ /* Take the text ranges the partial symbol scanner computed for each
+ of the psymtabs and convert it into the canonical form for
+ psymtabs. */
+ {
+ struct partial_symtab *p;
+
+ ALL_OBJFILE_PSYMTABS (objfile, p)
+ {
+ p->textlow = TEXTLOW (p);
+ p->texthigh = TEXTHIGH (p);
+ }
+ }
+
#if 0
/* Check to make sure file was compiled with -g. If not, warn the
user of this limitation. */
***************
*** 2174,2180 ****
halt = base + fh->cbLine;
base += pr->cbLineOffset;
! adr = pst->textlow + pr->adr - lowest_pdr_addr;
l = adr >> 2; /* in words */
for (lineno = pr->lnLow; base < halt;)
--- 2194,2200 ----
halt = base + fh->cbLine;
base += pr->cbLineOffset;
! adr = TEXTLOW (pst) + pr->adr - lowest_pdr_addr;
l = adr >> 2; /* in words */
for (lineno = pr->lnLow; base < halt;)
***************
*** 2509,2514 ****
--- 2529,2536 ----
memset ((PTR) pst->read_symtab_private, 0, sizeof (struct symloc));
save_pst = pst;
+ TEXTLOW (pst) = pst->textlow;
+ TEXTHIGH (pst) = pst->texthigh;
FDR_IDX (pst) = f_idx;
CUR_BFD (pst) = cur_bfd;
DEBUG_SWAP (pst) = debug_swap;
***************
*** 2544,2550 ****
psymtab_language = prev_language;
PST_PRIVATE (pst)->pst_language = psymtab_language;
! pst->texthigh = pst->textlow;
/* For stabs-in-ecoff files, the second symbol must be @stab.
This symbol is emitted by mips-tfile to signal that the
--- 2566,2572 ----
psymtab_language = prev_language;
PST_PRIVATE (pst)->pst_language = psymtab_language;
! TEXTHIGH (pst) = TEXTLOW (pst);
/* For stabs-in-ecoff files, the second symbol must be @stab.
This symbol is emitted by mips-tfile to signal that the
***************
*** 2611,2620 ****
/* Kludge for Irix 5.2 zero fh->adr. */
if (!relocatable
! && (pst->textlow == 0 || procaddr < pst->textlow))
! pst->textlow = procaddr;
! if (high > pst->texthigh)
! pst->texthigh = high;
}
}
else if (sh.st == stStatic)
--- 2633,2642 ----
/* Kludge for Irix 5.2 zero fh->adr. */
if (!relocatable
! && (TEXTLOW (pst) == 0 || procaddr < TEXTLOW (pst)))
! TEXTLOW (pst) = procaddr;
! if (high > TEXTHIGH (pst))
! TEXTHIGH (pst) = high;
}
}
else if (sh.st == stStatic)
***************
*** 2703,2709 ****
(pst = save_pst)
#define END_PSYMTAB(pst,ilist,ninc,c_off,c_text,dep_list,n_deps,textlow_not_set) (void)0
#define HANDLE_RBRAC(val) \
! if ((val) > save_pst->texthigh) save_pst->texthigh = (val);
#include "partial-stab.h"
if (stabstring
--- 2725,2731 ----
(pst = save_pst)
#define END_PSYMTAB(pst,ilist,ninc,c_off,c_text,dep_list,n_deps,textlow_not_set) (void)0
#define HANDLE_RBRAC(val) \
! if ((val) > TEXTHIGH (save_pst)) TEXTHIGH (save_pst) = (val);
#include "partial-stab.h"
if (stabstring
***************
*** 2836,2847 ****
/* Kludge for Irix 5.2 zero fh->adr. */
if (!relocatable
! && (pst->textlow == 0 || procaddr < pst->textlow))
! pst->textlow = procaddr;
high = procaddr + sh.value;
! if (high > pst->texthigh)
! pst->texthigh = high;
continue;
case stStatic: /* Variable */
--- 2858,2869 ----
/* Kludge for Irix 5.2 zero fh->adr. */
if (!relocatable
! && (TEXTLOW (pst) == 0 || procaddr < TEXTLOW (pst)))
! TEXTLOW (pst) = procaddr;
high = procaddr + sh.value;
! if (high > TEXTHIGH (pst))
! TEXTHIGH (pst) = high;
continue;
case stStatic: /* Variable */
***************
*** 3015,3030 ****
empty and put on the free list. */
fdr_to_pst[f_idx].pst = end_psymtab (save_pst,
psymtab_include_list, includes_used,
! -1, save_pst->texthigh,
dependency_list, dependencies_used, textlow_not_set);
includes_used = 0;
dependencies_used = 0;
! if (objfile->ei.entry_point >= save_pst->textlow &&
! objfile->ei.entry_point < save_pst->texthigh)
{
! objfile->ei.entry_file_lowpc = save_pst->textlow;
! objfile->ei.entry_file_highpc = save_pst->texthigh;
}
/* The objfile has its functions reordered if this partial symbol
--- 3037,3052 ----
empty and put on the free list. */
fdr_to_pst[f_idx].pst = end_psymtab (save_pst,
psymtab_include_list, includes_used,
! -1, TEXTHIGH (save_pst),
dependency_list, dependencies_used, textlow_not_set);
includes_used = 0;
dependencies_used = 0;
! if (objfile->ei.entry_point >= TEXTLOW (save_pst) &&
! objfile->ei.entry_point < TEXTHIGH (save_pst))
{
! objfile->ei.entry_file_lowpc = TEXTLOW (save_pst);
! objfile->ei.entry_file_highpc = TEXTHIGH (save_pst);
}
/* The objfile has its functions reordered if this partial symbol
***************
*** 3040,3054 ****
other cases. */
save_pst = fdr_to_pst[f_idx].pst;
if (save_pst != NULL
! && save_pst->textlow != 0
&& !(objfile->flags & OBJF_REORDERED))
{
ALL_OBJFILE_PSYMTABS (objfile, pst)
{
if (save_pst != pst
! && save_pst->textlow >= pst->textlow
! && save_pst->textlow < pst->texthigh
! && save_pst->texthigh > pst->texthigh)
{
objfile->flags |= OBJF_REORDERED;
break;
--- 3062,3076 ----
other cases. */
save_pst = fdr_to_pst[f_idx].pst;
if (save_pst != NULL
! && TEXTLOW (save_pst) != 0
&& !(objfile->flags & OBJF_REORDERED))
{
ALL_OBJFILE_PSYMTABS (objfile, pst)
{
if (save_pst != pst
! && TEXTLOW (save_pst) >= TEXTLOW (pst)
! && TEXTLOW (save_pst) < TEXTHIGH (pst)
! && TEXTHIGH (save_pst) > TEXTHIGH (pst))
{
objfile->flags |= OBJF_REORDERED;
break;
***************
*** 3252,3258 ****
/* Do nothing if this is a dummy psymtab. */
if (pst->n_global_syms == 0 && pst->n_static_syms == 0
! && pst->textlow == 0 && pst->texthigh == 0)
return;
/* Now read the symbols for this symtab */
--- 3274,3280 ----
/* Do nothing if this is a dummy psymtab. */
if (pst->n_global_syms == 0 && pst->n_static_syms == 0
! && TEXTLOW (pst) == 0 && TEXTHIGH (pst) == 0)
return;
/* Now read the symbols for this symtab */
***************
*** 3400,3406 ****
if (! last_symtab_ended)
{
! st = end_symtab (pst->texthigh, pst->objfile, SECT_OFF_TEXT (pst->objfile));
end_stabs ();
}
--- 3422,3428 ----
if (! last_symtab_ended)
{
! st = end_symtab (TEXTHIGH (pst), pst->objfile, SECT_OFF_TEXT (pst->objfile));
end_stabs ();
}
***************
*** 3490,3496 ****
top_stack->cur_st = st;
top_stack->cur_block = BLOCKVECTOR_BLOCK (BLOCKVECTOR (st),
STATIC_BLOCK);
! BLOCK_START (top_stack->cur_block) = pst->textlow;
BLOCK_END (top_stack->cur_block) = 0;
top_stack->blocktype = stFile;
top_stack->maxsyms = 2 * f_max;
--- 3512,3518 ----
top_stack->cur_st = st;
top_stack->cur_block = BLOCKVECTOR_BLOCK (BLOCKVECTOR (st),
STATIC_BLOCK);
! BLOCK_START (top_stack->cur_block) = TEXTLOW (pst);
BLOCK_END (top_stack->cur_block) = 0;
top_stack->blocktype = stFile;
top_stack->maxsyms = 2 * f_max;
Index: gdb/partial-stab.h
===================================================================
RCS file: /cvs/src/src/gdb/partial-stab.h,v
retrieving revision 1.13
diff -c -r1.13 partial-stab.h
*** gdb/partial-stab.h 2001/09/06 20:50:48 1.13
--- gdb/partial-stab.h 2001/10/23 23:14:42
***************
*** 104,115 ****
if (past_first_source_file && pst
/* The gould NP1 uses low values for .o and -l symbols
which are not the address. */
! && CUR_SYMBOL_VALUE >= pst->textlow)
{
END_PSYMTAB (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! CUR_SYMBOL_VALUE > pst->texthigh
! ? CUR_SYMBOL_VALUE : pst->texthigh,
dependency_list, dependencies_used, textlow_not_set);
pst = (struct partial_symtab *) 0;
includes_used = 0;
--- 104,115 ----
if (past_first_source_file && pst
/* The gould NP1 uses low values for .o and -l symbols
which are not the address. */
! && CUR_SYMBOL_VALUE >= TEXTLOW (pst))
{
END_PSYMTAB (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! CUR_SYMBOL_VALUE > TEXTHIGH (pst)
! ? CUR_SYMBOL_VALUE : TEXTHIGH (pst),
dependency_list, dependencies_used, textlow_not_set);
pst = (struct partial_symtab *) 0;
includes_used = 0;
***************
*** 236,242 ****
{
END_PSYMTAB (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! valu > pst->texthigh ? valu : pst->texthigh,
dependency_list, dependencies_used,
prev_textlow_not_set);
pst = (struct partial_symtab *) 0;
--- 236,242 ----
{
END_PSYMTAB (pst, psymtab_include_list, includes_used,
symnum * symbol_size,
! valu > TEXTHIGH (pst) ? valu : TEXTHIGH (pst),
dependency_list, dependencies_used,
prev_textlow_not_set);
pst = (struct partial_symtab *) 0;
***************
*** 405,412 ****
function relative stabs, or the address of the function's
end for old style stabs. */
valu = CUR_SYMBOL_VALUE + last_function_start;
! if (pst->texthigh == 0 || valu > pst->texthigh)
! pst->texthigh = valu;
break;
}
#endif
--- 405,412 ----
function relative stabs, or the address of the function's
end for old style stabs. */
valu = CUR_SYMBOL_VALUE + last_function_start;
! if (TEXTHIGH (pst) == 0 || valu > TEXTHIGH (pst))
! TEXTHIGH (pst) = valu;
break;
}
#endif
***************
*** 610,616 ****
}
if (pst && textlow_not_set)
{
! pst->textlow = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif
--- 610,616 ----
}
if (pst && textlow_not_set)
{
! TEXTLOW (pst) = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif
***************
*** 626,637 ****
the partial symbol table. */
if (pst
&& (textlow_not_set
! || (CUR_SYMBOL_VALUE < pst->textlow
&& (CUR_SYMBOL_VALUE
!= ANOFFSET (objfile->section_offsets,
SECT_OFF_TEXT (objfile))))))
{
! pst->textlow = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif /* DBXREAD_ONLY */
--- 626,637 ----
the partial symbol table. */
if (pst
&& (textlow_not_set
! || (CUR_SYMBOL_VALUE < TEXTLOW (pst)
&& (CUR_SYMBOL_VALUE
!= ANOFFSET (objfile->section_offsets,
SECT_OFF_TEXT (objfile))))))
{
! TEXTLOW (pst) = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif /* DBXREAD_ONLY */
***************
*** 677,683 ****
}
if (pst && textlow_not_set)
{
! pst->textlow = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif
--- 677,683 ----
}
if (pst && textlow_not_set)
{
! TEXTLOW (pst) = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif
***************
*** 693,704 ****
the partial symbol table. */
if (pst
&& (textlow_not_set
! || (CUR_SYMBOL_VALUE < pst->textlow
&& (CUR_SYMBOL_VALUE
!= ANOFFSET (objfile->section_offsets,
SECT_OFF_TEXT (objfile))))))
{
! pst->textlow = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif /* DBXREAD_ONLY */
--- 693,704 ----
the partial symbol table. */
if (pst
&& (textlow_not_set
! || (CUR_SYMBOL_VALUE < TEXTLOW (pst)
&& (CUR_SYMBOL_VALUE
!= ANOFFSET (objfile->section_offsets,
SECT_OFF_TEXT (objfile))))))
{
! TEXTLOW (pst) = CUR_SYMBOL_VALUE;
textlow_not_set = 0;
}
#endif /* DBXREAD_ONLY */
***************
*** 813,819 ****
case N_ENDM:
#ifdef SOFUN_ADDRESS_MAYBE_MISSING
/* Solaris 2 end of module, finish current partial symbol table.
! END_PSYMTAB will set pst->texthigh to the proper value, which
is necessary if a module compiled without debugging info
follows this module. */
if (pst)
--- 813,819 ----
case N_ENDM:
#ifdef SOFUN_ADDRESS_MAYBE_MISSING
/* Solaris 2 end of module, finish current partial symbol table.
! END_PSYMTAB will set TEXTHIGH (pst) to the proper value, which
is necessary if a module compiled without debugging info
follows this module. */
if (pst)
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: RFC: partial symbol table address range generalization 2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy @ 2001-10-23 16:54 ` Daniel Jacobowitz [not found] ` <npsnc965cl.fsf@zwingli.cygnus.com> 2001-10-24 1:52 ` Eli Zaretskii 1 sibling, 1 reply; 6+ messages in thread From: Daniel Jacobowitz @ 2001-10-23 16:54 UTC (permalink / raw) To: gdb-patches On Tue, Oct 23, 2001 at 06:34:50PM -0500, Jim Blandy wrote: > At the moment, GDB's partial symbol table structures assume that each > compilation unit's text occupies a single contiguous range of > addresses. If a given address falls between a `struct partial_symtab' > structure's `texthigh' and `textlow' addresses, then GDB assumes that > reading that partial symbol table's debugging info will provide > complete information about that address. > > However, this assumption isn't true. A given .o file's code can > actually appear in any number of distinct regions, separated by code > from other .o files. This can happen when we compile and link a C++ > program, like the below (from GCC's test suite): I'd like to point you at my thoughts from the first time I noticed this behavior: < http://sources.redhat.com/ml/gdb/2001-08/msg00161.html > When I looked at this problem in August, I wasn't brave enough to suggest the solution you're proposing (which I completely agree with). However... > There is some logic in GDB's lookup functions to cope with overlapping > partial symtabs, and they've been working pretty well on our behalf. > However, they're fragile, and do break in everyday use. For example, > in the executable produced from the source file above, if you try to > set a breakpoint on a library routine compiled without debug > information, GDB will set the breakpoint in `main' instead. (On some > platforms, `_exit' is such a function.) This particular problem should be avoidable anyway. I would appreciate it if you would look at: < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html > As far as getting this information from the stabs reader, we should be able to do it if we know separate "possible" and "definite" ranges, I think. I'm not sure if we have enough information to do this. We don't for the general case, but we should generally have one "definite" range per file (corresponding to the main .text segment). It would be nice to at least take this region out of any symtabs that seem to encompass it, so that if the main program is built without -ffunction-sections, we will behave sanely in the presence of (say) libstdc++.a. -- Daniel Jacobowitz Carnegie Mellon University MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <npsnc965cl.fsf@zwingli.cygnus.com>]
* Re: RFC: partial symbol table address range generalization [not found] ` <npsnc965cl.fsf@zwingli.cygnus.com> @ 2001-10-23 21:42 ` Daniel Jacobowitz 2001-10-24 17:08 ` Jim Blandy 0 siblings, 1 reply; 6+ messages in thread From: Daniel Jacobowitz @ 2001-10-23 21:42 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb-patches On Tue, Oct 23, 2001 at 11:15:38PM -0500, Jim Blandy wrote: > > Daniel Jacobowitz <drow@mvista.com> writes: > > I'd like to point you at my thoughts from the first time I noticed this > > behavior: > > < http://sources.redhat.com/ml/gdb/2001-08/msg00161.html > > > That's messed up. The compiler produces SO stabs that describe only > the .text section, while happily placing code in other sections. So > the SO stabs' addresses are useless. > > If you want to hack on the stabs reader to handle these cases, that > would be great. I'm not sure how to, at present. But it may be possible when you're done. Of course, I'd rather transition all the targets I care about to DWARF2 instead of fixing stabs; but if I have time, I'll take a whack at it. > > > There is some logic in GDB's lookup functions to cope with overlapping > > > partial symtabs, and they've been working pretty well on our behalf. > > > However, they're fragile, and do break in everyday use. For example, > > > in the executable produced from the source file above, if you try to > > > set a breakpoint on a library routine compiled without debug > > > information, GDB will set the breakpoint in `main' instead. (On some > > > platforms, `_exit' is such a function.) > > > > This particular problem should be avoidable anyway. I would appreciate > > it if you would look at: > > < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html > > > Well, okay. But I'd much prefer to see the problem fixed by making > the symbol table contents more accurate than by adding another > heuristic for recognizing insane data. When checks like that get into > the code, they never go away, because nobody's ever really sure > whether the circumstances it was meant to cope with happen any more. Well, I wouldn't call it a heuristic in this case. If we don't have debugging info, isn't it a little bit insane to try to find a line number? > If I can finish up the addrset patch for Dwarf 2, would you be > interested in taking a shot at spiffing up stabs? As said, it isn't a priority for me, but it is something that I would like to see cleaned up. Especially when Elena is finished cleaning up the partial-stab.h mess (or has it been finished now? I don't recall). > I encourage you to take a shot at it. The idea is to have `struct > addrset' support a bunch of groovy operations (like set subtraction, > testing for intersections between two sets, etc.), carefully coded and > gotten right once and for all, that make it easy to do this kind of > sanity checking and refinement. Here's the header file for the code > I've got now; imagine adding the setwise ops you want here, and then > using them in the stabs reader. Looks good! The only comment I have is trivial: > /* Set *START and *END to the first and last addresses (inclusive) of > the first contiguous range of addresses in ADDRSET. If ADDRSET is > empty, set *START to 1 and *END to zero. */ > void addrset_first_range (struct addrset *addrset, > CORE_ADDR *start, > CORE_ADDR *end); > > /* Set *START and *END to the first and last addresses (inclusive) of > the first contiguous range of addresses in ADDRSET after AFTER. > (That is, AFTER will not be included in the returned range.) > If there are no addresses in ADDRSET that are > AFTER, then > set *START to 1 and *END to zero. */ > void addrset_next_range (struct addrset *addrset, > CORE_ADDR after, > CORE_ADDR *start, > CORE_ADDR *end); For performance reasons, it might be better to steal the iterator concept and have an opaque cookie separate from the start/end addresses. Otherwise, addrset_next_range is not going to be constant time, and that could get annoying in an objfile with a large number of sections... and C++ is notorious for absurd numbers of sections. -- Daniel Jacobowitz Carnegie Mellon University MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: RFC: partial symbol table address range generalization 2001-10-23 21:42 ` Daniel Jacobowitz @ 2001-10-24 17:08 ` Jim Blandy 2001-10-24 17:39 ` Daniel Jacobowitz 0 siblings, 1 reply; 6+ messages in thread From: Jim Blandy @ 2001-10-24 17:08 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz <drow@mvista.com> writes: > > > This particular problem should be avoidable anyway. I would appreciate > > > it if you would look at: > > > < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html > > > > > Well, okay. But I'd much prefer to see the problem fixed by making > > the symbol table contents more accurate than by adding another > > heuristic for recognizing insane data. When checks like that get into > > the code, they never go away, because nobody's ever really sure > > whether the circumstances it was meant to cope with happen any more. > > Well, I wouldn't call it a heuristic in this case. If we don't have > debugging info, isn't it a little bit insane to try to find a line > number? Yes, certainly. But the reasoning I see going on is this: we can't trust our address range data, and our source line search screws up if the address range data isn't accurate, so we'll prop up the process by pulling a name search into the process, too. The more checks we add, the more likely we are to detect that something's screwey. This works, but I'd really rather just make the address range data accurate in the first place. The part that strikes me as a `heuristic' is the assumption that the address ranges are not quite trustworthy, and that therefore we should cross-check them against the function table before we proceed. If you think it's going to be too hard to get reliable address ranges for symtabs, then it'd certainly be better to put Peter's change in; at least something will get fixed. But if it's practical, I think it's better policy to work hard at filling your data structures with reliable data than to sprinkle your query code with sanity checks. > For performance reasons, it might be better to steal the iterator > concept and have an opaque cookie separate from the start/end > addresses. Otherwise, addrset_next_range is not going to be constant > time, and that could get annoying in an objfile with a large number of > sections... and C++ is notorious for absurd numbers of sections. It is (usually) constant time. :) The representation maintains a `current position' in the sorted linked list of ranges, and starts its searches from that point. So in a series of operations where each address is near the last one (like a sequential scan), each operation takes place in constant time. This is somewhat like having a single iterator object hidden inside the addrset object; that is, at any given time, there's only one range of addresses which can be operated on quickly. If instead we had an explicit iterator type, one could do multiple traversals of different areas efficiently. However, mutations can delete nodes from the list, and change their values; it seems to me that keeping iterators valid across mutation operations would add complexity. Anyway, it's an opaque type. If performance is a problem, or if we want to add iterators, we can revamp the whole thing without breaking its users. Here's the implementation: /* addrset.c --- implementation of the address set datatype Copyright 2001 Free Software Foundation, Inc. This file is part of GDB. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ #include "defs.h" #include "obstack.h" #include "gdb_assert.h" #include "addrset.h" struct addrset { /* Two singly-linked lists of nodes; each node represents a range of addresses in the addrset. None of these nodes' address ranges overlap or abut each other. `before' is sorted in order of decreasing addresses, `after' in order of increasing addresses. All nodes in `before' have lower addresses than all the nodes in `after'. Generally, whenever we operate on the list, we try split it between `before' and `after' so that the division comes directly after the last node we operated on. That way, sequential access happens in constant time. Since we split it *after* the last node we operate on, we're slightly biased towards traversal towards increasing address, but it's not too bad if we want to go the other way. */ struct node *before, *after; /* If this is non-zero, allocate new nodes from this obstack. Otherwise, allocate nodes using xmalloc. */ struct obstack *obstack; /* We can't free nodes allocated from an obstack, so we might as well keep them in a free list, and reuse them. This list can't be global, as different obstacks have different lifetimes. Ideally, it would be per-obstack, so a node could be freed from one addrset and then re-used by another addrset in the same obstack. But we don't have any convenient way to store per-obstack info. So the free list is per-addrset. For addrsets allocated using xmalloc, we simply xfree nodes, and hope malloc manages small blocks well; this field is always zero. */ struct node *free; }; /* One contiguous range of addresses in an addrset. */ struct node { struct node *next; /* The start and end addresses, inclusive, of the address range. Obviously, start <= end. If start == end, that's a one-byte range. */ CORE_ADDR start, end; }; struct addrset * new_addrset (struct obstack *obstack) { struct addrset *new; if (obstack) new = obstack_alloc (obstack, sizeof (*new)); else new = xmalloc (sizeof (*new)); memset (new, 0, sizeof (*new)); new->obstack = obstack; return new; } static void xfree_node_list (struct node *n) { while (n) { struct node *next = n->next; xfree (n); n = next; } } void addrset_free (struct addrset *addrset) { /* You can't free an addrset allocated in an obstack. */ gdb_assert (! addrset->obstack); /* If it wasn't allocated on an obstack, it had better not have a free list. */ gdb_assert (! addrset->free); xfree_node_list (addrset->before); xfree_node_list (addrset->after); } static struct node * new_node (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) { struct node *new; if (addrset->obstack) { /* If we have any nodes on the free list, use them. */ if (addrset->free) { new = addrset->free; addrset->free = new->next; } else new = obstack_alloc (addrset->obstack, sizeof (*new)); } else new = xmalloc (sizeof (*new)); new->next = 0; new->start = start; new->end = end; return new; } static void free_node (struct addrset *addrset, struct node *node) { if (addrset->obstack) { node->next = addrset->free; addrset->free = node; } else xfree (node); } /* Return true if A is less than B, and there is at least one address between them. */ static int less_and_separate (CORE_ADDR a, CORE_ADDR b) { /* We need both tests; think about what happens, for example, when B is zero and A is (CORE_ADDR) -1. */ return (a < b && b - a >= 2); } /* Shift nodes from `before' to `after', or in the other direction, until the split falls at the right place. By `the right place', we mean: - any nodes on `after' are after, and do not abut, the range from START to END, and - any nodes on `before' are either completely before, abut, or overlap the range from START to END. Once this is done, any nodes on `after' are irrelevant to our operation, and any nodes that abut or overlap our range are at the head of `before'. */ static void resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) { struct node *node; /* We know the `before' and `after' lists are each sorted in the proper direction, and contain non-overlapping, non-adjacent ranges. The first step is to move all ranges from `after' to `before' that are before, overlap, or abut the range we're operating on. Be careful of nodes abutting either end of the address space. */ while ((node = addrset->after) != 0 && ! less_and_separate (end, node->start)) { addrset->after = node->next; node->next = addrset->before; addrset->before = node; } /* Now, any ranges in `after' are after and do not abut the range we're operating on. This means that all those nodes are irrelevant to this operation, and we can ignore them. Thus, any nodes we care about are somewhere in `before'. Bring them to the head of that list by shifting any nodes from `before' to `after' that are completely after the range we're operating on. The astute reader will note that the condition for moving a node in this loop is the complement of that for the previous loop. That is, the condition for moving a node in one direction is the opposite of moving it in the other direction. This means that only one of these two loops' bodies will ever be executed in a given call. */ while ((node = addrset->before) != 0 && less_and_separate (end, node->start)) { addrset->before = node->next; node->next = addrset->after; addrset->after = node; } } void addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) { struct node *node, *save; gdb_assert (start <= end); resplit (addrset, start, end); /* Now we know that any ranges in `after' are irrelevant to this operation, and that zero or more nodes at the head of `before' abut or overlap the range we're adding. Remove any such nodes from `before', and absorb their ranges into ours. */ save = 0; while ((node = addrset->before) != 0 && ! less_and_separate (node->end, start)) { if (node->start < start) start = node->start; if (node->end > end) end = node->end; addrset->before = node->next; /* We want to remove this node from the list, but we know we're going to insert a new one in a bit, so hold onto the first node we delete, just to save a call to xfree and xmalloc. */ if (save) free_node (addrset, node); else save = node; } /* Create a node for the new range. */ if (save) { save->start = start; save->end = end; } else save = new_node (addrset, start, end); /* Stick it on the head of the `before' list. */ save->next = addrset->before; addrset->before = save; } void addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) { struct node *node; gdb_assert (start <= end); /* This could end up moving a bunch of nodes to `before' that we're just going to delete anyway. But our running time is linear in the number of ranges we're going to delete anyway, and this lets us deal with the `before' list only; otherwise, we'd have to write out the code twice, once for each list. */ resplit (addrset, start, end); /* Now we know that any ranges in `after' are irrelevant to this operation, and that zero or more nodes at the head of `before' abut or overlap the range we're adding. If the node at the head of the `before' list extends after our range, either truncate it and shift it to `after', or split it in two. */ if ((node = addrset->before) != 0 && end < node->end) { /* If it also extends before the range we're deleting, we need to split it into two nodes: one for the part before and one for the part after. And we know we don't need to do anything else to the `before' list. */ if (node->start < start) { struct node *right_part = new_node (addrset, end + 1, node->end); right_part->next = addrset->after; addrset->after = right_part; node->end = start - 1; return; } /* Just shift the node onto `after', and adjust its size. */ addrset->before = node->next; node->next = addrset->after; addrset->after = node; node->start = end + 1; /* There may be more nodes on `before' that we need to adjust. */ } /* Now we know that the head of `before' doesn't extend beyond the end of the range we're deleting. Nothing will need to be added to `after'. Remove any nodes completely contained in the region we're deleting. */ while ((node = addrset->before) != 0 && start <= node->start) { addrset->before = node->next; free_node (addrset, node); } /* The head of `before' (if any) isn't completely contained in the range we're deleting, but it may partially overlap it at the low end. If that's the case, adjust its endpoint. */ if ((node = addrset->before) != 0 && start <= node->end) node->end = start - 1; } /* Shift *ADDR by OFFSET, and check that the direction of change from OLD to NEW is consistent with *DIR. If it isn't, raise an internal error. OFFSET must be non-zero. See offset_node_list for the meaning of *DIR. */ static void shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir) { CORE_ADDR old = *addr; CORE_ADDR new = (*addr += offset); if (new < old) { if (! *dir) *dir = -1; else gdb_assert (*dir == -1); } else { if (! *dir) *dir = 1; else gdb_assert (*dir == 1); } } /* Offset addresses in NODE and its successors by OFFSET. OFFSET must be non-zero. *DIR is the direction of the shift, used for error checking. - If *DIR is zero, then we don't know which direction things will be shifted yet. - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes downwards or upwards, and we should raise an internal_error if any of our addresses go in the other direction. */ static void offset_node_list (struct node *node, CORE_ADDR offset, int *dir) { while (node) { shift_addr (&node->start, offset, dir); shift_addr (&node->end, offset, dir); node = node->next; } } void addrset_offset (struct addrset *addrset, CORE_ADDR offset) { int dir = 0; if (offset == 0) return; offset_node_list (addrset->before, offset, &dir); offset_node_list (addrset->after, offset, &dir); } int addrset_in (struct addrset *addrset, CORE_ADDR addr) { struct node *node; resplit (addrset, addr, addr); /* Now any node containing addr would have to be on `before'. */ return ((node = addrset->before) != 0 && node->start <= addr && addr <= node->end); } int addrset_span (struct addrset *addrset, CORE_ADDR addr, CORE_ADDR *start, CORE_ADDR *end) { struct node *node; resplit (addrset, addr, addr); /* Now any node containing addr would have to be on `before'. */ if ((node = addrset->before) != 0 && node->start <= addr && addr <= node->end) { *start = node->start; *end = node->end; return 1; } else return 0; } void addrset_first_range (struct addrset *addrset, CORE_ADDR *start, CORE_ADDR *end) { struct node *node; resplit (addrset, 0, 0); /* If there's anything on `before', that's our first range. Otherwise, if there's anything on after, that's it. */ if ((node = addrset->before) != 0 || (node = addrset->after) != 0) { *start = node->start; *end = node->end; } else { /* ADDRSET is empty. */ *start = 1; *end = 0; } } void addrset_next_range (struct addrset *addrset, CORE_ADDR after, CORE_ADDR *start, CORE_ADDR *end) { struct node *node; resplit (addrset, after, after); /* The first node on `before' might extend beyond our range. */ if ((node = addrset->before) != 0 && after < node->end) { *start = after + 1; *end = node->end; } /* Otherwise, the first node on `after' is the one we want. */ else if ((node = addrset->after) != 0) { *start = node->start; *end = node->end; } /* Otherwise, there are no more addresses in ADDRSET after AFTER. */ else { *start = 1; *end = 0; } } void addrset_print (struct addrset *addrset, struct ui_file *outfile) { CORE_ADDR start, end; int first = 1; for (addrset_first_range (addrset, &start, &end); start <= end; addrset_next_range (addrset, end, &start, &end)) { if (! first) fprintf_filtered (outfile, ", "); print_address_numeric (start, 1, outfile); fprintf_filtered (outfile, "-"); print_address_numeric (end, 1, outfile); first = 0; } } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: RFC: partial symbol table address range generalization 2001-10-24 17:08 ` Jim Blandy @ 2001-10-24 17:39 ` Daniel Jacobowitz 0 siblings, 0 replies; 6+ messages in thread From: Daniel Jacobowitz @ 2001-10-24 17:39 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb-patches On Wed, Oct 24, 2001 at 07:09:25PM -0500, Jim Blandy wrote: > > Daniel Jacobowitz <drow@mvista.com> writes: > > > > This particular problem should be avoidable anyway. I would appreciate > > > > it if you would look at: > > > > < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html > > > > > > > Well, okay. But I'd much prefer to see the problem fixed by making > > > the symbol table contents more accurate than by adding another > > > heuristic for recognizing insane data. When checks like that get into > > > the code, they never go away, because nobody's ever really sure > > > whether the circumstances it was meant to cope with happen any more. > > > > Well, I wouldn't call it a heuristic in this case. If we don't have > > debugging info, isn't it a little bit insane to try to find a line > > number? > > Yes, certainly. But the reasoning I see going on is this: we can't > trust our address range data, and our source line search screws up if > the address range data isn't accurate, so we'll prop up the process by > pulling a name search into the process, too. The more checks we add, > the more likely we are to detect that something's screwey. > > This works, but I'd really rather just make the address range data > accurate in the first place. The part that strikes me as a > `heuristic' is the assumption that the address ranges are not quite > trustworthy, and that therefore we should cross-check them against the > function table before we proceed. > > If you think it's going to be too hard to get reliable address ranges > for symtabs, then it'd certainly be better to put Peter's change in; > at least something will get fixed. But if it's practical, I think > it's better policy to work hard at filling your data structures with > reliable data than to sprinkle your query code with sanity checks. Well, for at least STABS we're never going to get reliable address ranges. The data we need is not there. Any inference will be guesswork at best. But that isn't really my point. What the code in question (find_pc_sect_line) does is: /* Find the source file and line number for a given PC value and SECTION. Return a structure containing a symtab pointer, a line number, and a pc range for the entire source line. If we don't know what function we're in, then it isn't just that we can not trust the address range data; we know absolutely that we will not find a valid line number. I'm pretty sure that find_pc_function can fail in reasonable ways; in trampolines, for instance. It may well be that we won't reach find_pc_sect_line in any of those cases, ideally, but I'm not convinced of that. I'd prefer to both work hard to fill the symtabs with reliable data and also sprinkle sanity checks where they clearly seem to be needed. > > For performance reasons, it might be better to steal the iterator > > concept and have an opaque cookie separate from the start/end > > addresses. Otherwise, addrset_next_range is not going to be constant > > time, and that could get annoying in an objfile with a large number of > > sections... and C++ is notorious for absurd numbers of sections. > > It is (usually) constant time. :) The representation maintains a > `current position' in the sorted linked list of ranges, and starts its > searches from that point. So in a series of operations where each > address is near the last one (like a sequential scan), each operation > takes place in constant time. > > This is somewhat like having a single iterator object hidden inside > the addrset object; that is, at any given time, there's only one range > of addresses which can be operated on quickly. If instead we had an > explicit iterator type, one could do multiple traversals of different > areas efficiently. However, mutations can delete nodes from the list, > and change their values; it seems to me that keeping iterators valid > across mutation operations would add complexity. > > Anyway, it's an opaque type. If performance is a problem, or if we > want to add iterators, we can revamp the whole thing without breaking > its users. > > Here's the implementation: > > > /* addrset.c --- implementation of the address set datatype > Copyright 2001 Free Software Foundation, Inc. > > This file is part of GDB. > > This program is free software; you can redistribute it and/or modify > it under the terms of the GNU General Public License as published by > the Free Software Foundation; either version 2 of the License, or > (at your option) any later version. > > This program is distributed in the hope that it will be useful, > but WITHOUT ANY WARRANTY; without even the implied warranty of > MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > GNU General Public License for more details. > > You should have received a copy of the GNU General Public License > along with this program; if not, write to the Free Software > Foundation, Inc., 59 Temple Place - Suite 330, > Boston, MA 02111-1307, USA. */ > > #include "defs.h" > #include "obstack.h" > #include "gdb_assert.h" > #include "addrset.h" > > struct addrset { > > /* Two singly-linked lists of nodes; each node represents a range of > addresses in the addrset. None of these nodes' address ranges > overlap or abut each other. `before' is sorted in order of > decreasing addresses, `after' in order of increasing addresses. > All nodes in `before' have lower addresses than all the nodes in > `after'. > > Generally, whenever we operate on the list, we try split it > between `before' and `after' so that the division comes directly > after the last node we operated on. That way, sequential access > happens in constant time. Since we split it *after* the last > node we operate on, we're slightly biased towards traversal > towards increasing address, but it's not too bad if we want to go > the other way. */ > struct node *before, *after; > > /* If this is non-zero, allocate new nodes from this obstack. > Otherwise, allocate nodes using xmalloc. */ > struct obstack *obstack; > > /* We can't free nodes allocated from an obstack, so we might as > well keep them in a free list, and reuse them. This list can't > be global, as different obstacks have different lifetimes. > Ideally, it would be per-obstack, so a node could be freed from > one addrset and then re-used by another addrset in the same > obstack. But we don't have any convenient way to store > per-obstack info. So the free list is per-addrset. > > For addrsets allocated using xmalloc, we simply xfree nodes, and > hope malloc manages small blocks well; this field is always zero. */ > struct node *free; > > }; > > /* One contiguous range of addresses in an addrset. */ > struct node { > struct node *next; > > /* The start and end addresses, inclusive, of the address range. > Obviously, start <= end. If start == end, that's a one-byte > range. */ > CORE_ADDR start, end; > }; > > > struct addrset * > new_addrset (struct obstack *obstack) > { > struct addrset *new; > > if (obstack) > new = obstack_alloc (obstack, sizeof (*new)); > else > new = xmalloc (sizeof (*new)); > > memset (new, 0, sizeof (*new)); > > new->obstack = obstack; > > return new; > } > > > static void > xfree_node_list (struct node *n) > { > while (n) > { > struct node *next = n->next; > > xfree (n); > n = next; > } > } > > > void > addrset_free (struct addrset *addrset) > { > /* You can't free an addrset allocated in an obstack. */ > gdb_assert (! addrset->obstack); > > /* If it wasn't allocated on an obstack, it had better not have a > free list. */ > gdb_assert (! addrset->free); > > xfree_node_list (addrset->before); > xfree_node_list (addrset->after); > } > > > static struct node * > new_node (struct addrset *addrset, > CORE_ADDR start, CORE_ADDR end) > { > struct node *new; > > if (addrset->obstack) > { > /* If we have any nodes on the free list, use them. */ > if (addrset->free) > { > new = addrset->free; > addrset->free = new->next; > } > else > new = obstack_alloc (addrset->obstack, sizeof (*new)); > } > else > new = xmalloc (sizeof (*new)); > > new->next = 0; > new->start = start; > new->end = end; > > return new; > } > > > static void > free_node (struct addrset *addrset, struct node *node) > { > if (addrset->obstack) > { > node->next = addrset->free; > addrset->free = node; > } > else > xfree (node); > } > > > /* Return true if A is less than B, and there is at least one > address between them. */ > static int > less_and_separate (CORE_ADDR a, CORE_ADDR b) > { > /* We need both tests; think about what happens, for example, when > B is zero and A is (CORE_ADDR) -1. */ > return (a < b && b - a >= 2); > } > > > /* Shift nodes from `before' to `after', or in the other direction, > until the split falls at the right place. By `the right place', we > mean: > - any nodes on `after' are after, and do not abut, the range from > START to END, and > - any nodes on `before' are either completely before, abut, or overlap > the range from START to END. > Once this is done, any nodes on `after' are irrelevant to our > operation, and any nodes that abut or overlap our range are at the > head of `before'. */ > static void > resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) > { > struct node *node; > > /* We know the `before' and `after' lists are each sorted in the > proper direction, and contain non-overlapping, non-adjacent > ranges. > > The first step is to move all ranges from `after' to `before' > that are before, overlap, or abut the range we're operating on. > Be careful of nodes abutting either end of the address space. */ > while ((node = addrset->after) != 0 > && ! less_and_separate (end, node->start)) > { > addrset->after = node->next; > node->next = addrset->before; > addrset->before = node; > } > > /* Now, any ranges in `after' are after and do not abut the range > we're operating on. This means that all those nodes are > irrelevant to this operation, and we can ignore them. > > Thus, any nodes we care about are somewhere in `before'. Bring > them to the head of that list by shifting any nodes from `before' > to `after' that are completely after the range we're operating > on. > > The astute reader will note that the condition for moving a node > in this loop is the complement of that for the previous loop. > That is, the condition for moving a node in one direction is the > opposite of moving it in the other direction. This means that > only one of these two loops' bodies will ever be executed in a > given call. */ > while ((node = addrset->before) != 0 > && less_and_separate (end, node->start)) > { > addrset->before = node->next; > node->next = addrset->after; > addrset->after = node; > } > } > > > void > addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) > { > struct node *node, *save; > > gdb_assert (start <= end); > > resplit (addrset, start, end); > > /* Now we know that any ranges in `after' are irrelevant to this > operation, and that zero or more nodes at the head of `before' > abut or overlap the range we're adding. Remove any such nodes > from `before', and absorb their ranges into ours. */ > save = 0; > while ((node = addrset->before) != 0 > && ! less_and_separate (node->end, start)) > { > if (node->start < start) > start = node->start; > if (node->end > end) > end = node->end; > addrset->before = node->next; > > /* We want to remove this node from the list, but we know we're > going to insert a new one in a bit, so hold onto the first > node we delete, just to save a call to xfree and xmalloc. */ > if (save) > free_node (addrset, node); > else > save = node; > } > > /* Create a node for the new range. */ > if (save) > { > save->start = start; > save->end = end; > } > else > save = new_node (addrset, start, end); > > /* Stick it on the head of the `before' list. */ > save->next = addrset->before; > addrset->before = save; > } > > > void > addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end) > { > struct node *node; > > gdb_assert (start <= end); > > /* This could end up moving a bunch of nodes to `before' that we're > just going to delete anyway. But our running time is linear in > the number of ranges we're going to delete anyway, and this lets > us deal with the `before' list only; otherwise, we'd have to > write out the code twice, once for each list. */ > resplit (addrset, start, end); > > /* Now we know that any ranges in `after' are irrelevant to this > operation, and that zero or more nodes at the head of `before' > abut or overlap the range we're adding. > > If the node at the head of the `before' list extends after our > range, either truncate it and shift it to `after', or split it in > two. */ > if ((node = addrset->before) != 0 > && end < node->end) > { > /* If it also extends before the range we're deleting, we need > to split it into two nodes: one for the part before and one > for the part after. And we know we don't need to do anything > else to the `before' list. */ > if (node->start < start) > { > struct node *right_part = new_node (addrset, end + 1, node->end); > right_part->next = addrset->after; > addrset->after = right_part; > node->end = start - 1; > return; > } > > /* Just shift the node onto `after', and adjust its size. */ > addrset->before = node->next; > node->next = addrset->after; > addrset->after = node; > node->start = end + 1; > > /* There may be more nodes on `before' that we need to adjust. */ > } > > /* Now we know that the head of `before' doesn't extend beyond the > end of the range we're deleting. Nothing will need to be added > to `after'. > > Remove any nodes completely contained in the region we're > deleting. */ > while ((node = addrset->before) != 0 > && start <= node->start) > { > addrset->before = node->next; > free_node (addrset, node); > } > > /* The head of `before' (if any) isn't completely contained in the > range we're deleting, but it may partially overlap it at the low > end. If that's the case, adjust its endpoint. */ > if ((node = addrset->before) != 0 > && start <= node->end) > node->end = start - 1; > } > > > /* Shift *ADDR by OFFSET, and check that the direction of change from > OLD to NEW is consistent with *DIR. If it isn't, raise an internal > error. OFFSET must be non-zero. See offset_node_list for the > meaning of *DIR. */ > static void > shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir) > { > CORE_ADDR old = *addr; > CORE_ADDR new = (*addr += offset); > > if (new < old) > { > if (! *dir) > *dir = -1; > else > gdb_assert (*dir == -1); > } > else > { > if (! *dir) > *dir = 1; > else > gdb_assert (*dir == 1); > } > } > > > /* Offset addresses in NODE and its successors by OFFSET. > OFFSET must be non-zero. > *DIR is the direction of the shift, used for error checking. > - If *DIR is zero, then we don't know which direction things will be > shifted yet. > - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes > downwards or upwards, and we should raise an internal_error if any > of our addresses go in the other direction. */ > static void > offset_node_list (struct node *node, CORE_ADDR offset, int *dir) > { > while (node) > { > shift_addr (&node->start, offset, dir); > shift_addr (&node->end, offset, dir); > > node = node->next; > } > } > > > void > addrset_offset (struct addrset *addrset, CORE_ADDR offset) > { > int dir = 0; > > if (offset == 0) > return; > > offset_node_list (addrset->before, offset, &dir); > offset_node_list (addrset->after, offset, &dir); > } > > > int > addrset_in (struct addrset *addrset, CORE_ADDR addr) > { > struct node *node; > > resplit (addrset, addr, addr); > > /* Now any node containing addr would have to be on `before'. */ > return ((node = addrset->before) != 0 > && node->start <= addr && addr <= node->end); > } > > > int > addrset_span (struct addrset *addrset, > CORE_ADDR addr, > CORE_ADDR *start, > CORE_ADDR *end) > { > struct node *node; > > resplit (addrset, addr, addr); > > /* Now any node containing addr would have to be on `before'. */ > if ((node = addrset->before) != 0 > && node->start <= addr && addr <= node->end) > { > *start = node->start; > *end = node->end; > return 1; > } > else > return 0; > } > > > void > addrset_first_range (struct addrset *addrset, > CORE_ADDR *start, > CORE_ADDR *end) > { > struct node *node; > > resplit (addrset, 0, 0); > > /* If there's anything on `before', that's our first range. > Otherwise, if there's anything on after, that's it. */ > if ((node = addrset->before) != 0 > || (node = addrset->after) != 0) > { > *start = node->start; > *end = node->end; > } > else > { > /* ADDRSET is empty. */ > *start = 1; > *end = 0; > } > } > > > void > addrset_next_range (struct addrset *addrset, > CORE_ADDR after, > CORE_ADDR *start, > CORE_ADDR *end) > { > struct node *node; > > resplit (addrset, after, after); > > /* The first node on `before' might extend beyond our range. */ > if ((node = addrset->before) != 0 > && after < node->end) > { > *start = after + 1; > *end = node->end; > } > > /* Otherwise, the first node on `after' is the one we want. */ > else if ((node = addrset->after) != 0) > { > *start = node->start; > *end = node->end; > } > > /* Otherwise, there are no more addresses in ADDRSET after AFTER. */ > else > { > *start = 1; > *end = 0; > } > } > > > void > addrset_print (struct addrset *addrset, struct ui_file *outfile) > { > CORE_ADDR start, end; > int first = 1; > > for (addrset_first_range (addrset, &start, &end); > start <= end; > addrset_next_range (addrset, end, &start, &end)) > { > if (! first) > fprintf_filtered (outfile, ", "); > print_address_numeric (start, 1, outfile); > fprintf_filtered (outfile, "-"); > print_address_numeric (end, 1, outfile); > first = 0; > } > } > -- Daniel Jacobowitz Carnegie Mellon University MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: RFC: partial symbol table address range generalization 2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy 2001-10-23 16:54 ` Daniel Jacobowitz @ 2001-10-24 1:52 ` Eli Zaretskii 1 sibling, 0 replies; 6+ messages in thread From: Eli Zaretskii @ 2001-10-24 1:52 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb-patches On Tue, 23 Oct 2001, Jim Blandy wrote: > I'd like to change GDB's `struct partial_symtab' structure to cover an > arbitrary set of addresses, not just a single contiguous span of > addresses. I've written code for a `struct addrset' datatype > (essentially a linked list of ranges) and provided some functions to > make it easy and robust to work with. I'd like to gradually wean the > partial symbol table code away from the textlow/texthigh > representation, and have it use an addrset to represent a partial > symbol table's coverage. Please consider writing something about this for gdbint.texinfo, eventually. The GDB symbol table facilities and algorithms are notoriously underdocumented at the moment, IMHO. TIA ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2001-10-24 17:39 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy
2001-10-23 16:54 ` Daniel Jacobowitz
[not found] ` <npsnc965cl.fsf@zwingli.cygnus.com>
2001-10-23 21:42 ` Daniel Jacobowitz
2001-10-24 17:08 ` Jim Blandy
2001-10-24 17:39 ` Daniel Jacobowitz
2001-10-24 1:52 ` Eli Zaretskii
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox