RFC: partial symbol table address range generalization

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* RFC: partial symbol table address range generalization
@ 2001-10-23 16:33 Jim Blandy
  2001-10-23 16:54 ` Daniel Jacobowitz
  2001-10-24  1:52 ` Eli Zaretskii
  0 siblings, 2 replies; 6+ messages in thread
From: Jim Blandy @ 2001-10-23 16:33 UTC (permalink / raw)
  To: gdb-patches

This is the first patch in a series.  If you are interested in GDB's
symbol table, I'd appreciate your comments and review.

The Problem:

At the moment, GDB's partial symbol table structures assume that each
compilation unit's text occupies a single contiguous range of
addresses.  If a given address falls between a `struct partial_symtab'
structure's `texthigh' and `textlow' addresses, then GDB assumes that
reading that partial symbol table's debugging info will provide
complete information about that address.

However, this assumption isn't true.  A given .o file's code can
actually appear in any number of distinct regions, separated by code
from other .o files.  This can happen when we compile and link a C++
program, like the below (from GCC's test suite):

    // Special g++ Options: -fexceptions -g
    // excess errors test - XFAIL a29k-*-* sparc64-*-elf arm-*-pe

    class zeroset {
    public:
      ~zeroset () { }
    };

    int main () {
      zeroset a;
      try {
        ;
      } catch( zeroset ) {
      }
    }

G++ puts the code for `main' in the .o file's .text section, and the
code for the `zeroset' destructor in a section named
.gnu.linkonce.t._ZN7zerosetD1Ev.  The linker script gathers all the .o
files' .text sections together, followed by all the .gnu.linkonce.t.*
sections.  As a result, library functions' code appears between that
for `main' and the `zeroset' destructor in the final executable.  As a
result, GDB builds a `struct partial_symtab' for the above source file
whose `textlow' and `texthigh' range encloses not only the code for
`main' and the destructor, but all the library functions' code as
well.

There is some logic in GDB's lookup functions to cope with overlapping
partial symtabs, and they've been working pretty well on our behalf.
However, they're fragile, and do break in everyday use.  For example,
in the executable produced from the source file above, if you try to
set a breakpoint on a library routine compiled without debug
information, GDB will set the breakpoint in `main' instead.  (On some
platforms, `_exit' is such a function.)

It is possible to change the linker script to link all a .o file's
sections together, but the linker scripts that are currently in
widespread use do not, and the same situation can arise in other
circumstances (reordered executables, for example).


The Solution:

I'd like to change GDB's `struct partial_symtab' structure to cover an
arbitrary set of addresses, not just a single contiguous span of
addresses.  I've written code for a `struct addrset' datatype
(essentially a linked list of ranges) and provided some functions to
make it easy and robust to work with.  I'd like to gradually wean the
partial symbol table code away from the textlow/texthigh
representation, and have it use an addrset to represent a partial
symbol table's coverage.

This isn't a complete solution:

- Code that works with the full symbol table still uses each `struct
  symtab's global block's `startaddr' and `endaddr' members to find
  the `struct symtab' that covers a given address; this suffers from
  the same problem as the current partial symtabs.

- Changing the STABS partial symbol table scanner to recognize
  non-contiguous .o files seems like it should be possible, but I
  wasn't able to do it in a way that I was sure wouldn't break any
  targets.  So for now, I'm going to leave the STABS reader alone; the
  data structures will be capable of describing non-contiguous .o
  files on STABS platforms, but the reader will never recognize them.

However, Dwarf 2 does provide the information in a reliable way, and
this change will be useful in that case.

Below is the first patch in the series.  It's meant to have no effect
on GDB's behavior.


2001-10-23  Jim Blandy  <jimb@redhat.com>

	Isolate STABS readers' use of the `textlow' and `texthigh' fields
	of `struct partial_symtab' to only a few locations.  This change
	is not supposed to affect the way the values are computed, only
	where they live.

	* dbxread.c (struct symloc): Add `textlow' and `texthigh' fields
	to the reader-specific structure.
	* mdebugread.c (struct symloc): Same.
	* dbxread.c (TEXTLOW, TEXTHIGH): New accessor macros.
	* mdebugread.c (TEXTLOW, TEXTHIGH): Same.
	* dbxread.c (dbx_symfile_read): After we've built all our partial
	symbol tables, set each partial symtab's `textlow' and `texthigh'
	fields from our reader-specific structure.
	* mdebugread.c (mdebug_build_psymtabs): Same.
	* dbxread.c (start_psymtab): Initialize the reader-specific
	structure's `textlow' and `texthigh' from the new psymtab's.
	* mdebugread.c (parse_partial_symbols, new_psymtab): Same.
	* dbxread.c (read_dbx_symtab, end_psymtab, read_ofile_symtab): Use
	the reader-specific `textlow' and `texthigh', not the generic
	psymtab fields.
	* mdebugread.c (parse_lines, parse_partial_symbols,
	psymtab_to_symtab_1): Same.
	* partial-stab.h: Same.

Index: gdb/dbxread.c
===================================================================
RCS file: /cvs/src/src/gdb/dbxread.c,v
retrieving revision 1.24
diff -c -r1.24 dbxread.c
*** gdb/dbxread.c	2001/09/20 03:03:39	1.24
--- gdb/dbxread.c	2001/10/23 23:14:38
***************
*** 76,81 ****
--- 76,87 ----
  
  struct symloc
    {
+     /* STABS doesn't reliably give us nice start and end addresses for
+        each function.  Instead, we are told the addresses of various
+        boundary points, and we have to gather those together to build
+        ranges.  These are our running best guess as to the range of
+        text addresses for this psymtab.  */
+     CORE_ADDR textlow, texthigh;
  
      /* Offset within the file symbol table of first local symbol for this
         file.  */
***************
*** 105,110 ****
--- 111,118 ----
  #define LDSYMOFF(p) (((struct symloc *)((p)->read_symtab_private))->ldsymoff)
  #define LDSYMLEN(p) (((struct symloc *)((p)->read_symtab_private))->ldsymlen)
  #define SYMLOC(p) ((struct symloc *)((p)->read_symtab_private))
+ #define TEXTLOW(p) (SYMLOC(p)->textlow)
+ #define TEXTHIGH(p) (SYMLOC(p)->texthigh)
  #define SYMBOL_SIZE(p) (SYMLOC(p)->symbol_size)
  #define SYMBOL_OFFSET(p) (SYMLOC(p)->symbol_offset)
  #define STRING_OFFSET(p) (SYMLOC(p)->string_offset)
***************
*** 599,604 ****
--- 607,625 ----
  
    read_dbx_dynamic_symtab (objfile);
  
+   /* Take the text ranges the STABS partial symbol scanner computed
+      for each of the psymtabs and convert it into the canonical form
+      for psymtabs.  */
+   {
+     struct partial_symtab *p;
+ 
+     ALL_OBJFILE_PSYMTABS (objfile, p)
+       {
+         p->textlow = TEXTLOW (p);
+         p->texthigh = TEXTHIGH (p);
+       }
+   }
+ 
    /* Install any minimal symbols that have been collected as the current
       minimal symbols for this objfile. */
  
***************
*** 1341,1347 ****
  
        end_psymtab (pst, psymtab_include_list, includes_used,
  		   symnum * symbol_size,
! 		   text_end > pst->texthigh ? text_end : pst->texthigh,
  		   dependency_list, dependencies_used, textlow_not_set);
      }
  
--- 1362,1368 ----
  
        end_psymtab (pst, psymtab_include_list, includes_used,
  		   symnum * symbol_size,
! 		   text_end > TEXTHIGH (pst) ? text_end : TEXTHIGH (pst),
  		   dependency_list, dependencies_used, textlow_not_set);
      }
  
***************
*** 1367,1372 ****
--- 1388,1395 ----
  
    result->read_symtab_private = (char *)
      obstack_alloc (&objfile->psymbol_obstack, sizeof (struct symloc));
+   TEXTLOW (result) = result->textlow;
+   TEXTHIGH (result) = result->texthigh;
    LDSYMOFF (result) = ldsymoff;
    result->read_symtab = dbx_psymtab_to_symtab;
    SYMBOL_SIZE (result) = symbol_size;
***************
*** 1402,1408 ****
  
    if (capping_symbol_offset != -1)
      LDSYMLEN (pst) = capping_symbol_offset - LDSYMOFF (pst);
!   pst->texthigh = capping_text;
  
  #ifdef SOFUN_ADDRESS_MAYBE_MISSING
    /* Under Solaris, the N_SO symbols always have a value of 0,
--- 1425,1431 ----
  
    if (capping_symbol_offset != -1)
      LDSYMLEN (pst) = capping_symbol_offset - LDSYMOFF (pst);
!   TEXTHIGH (pst) = capping_text;
  
  #ifdef SOFUN_ADDRESS_MAYBE_MISSING
    /* Under Solaris, the N_SO symbols always have a value of 0,
***************
*** 1420,1426 ****
       a reliable texthigh by taking the address plus size of the
       last function in the file.  */
  
!   if (pst->texthigh == 0 && last_function_name)
      {
        char *p;
        int n;
--- 1443,1449 ----
       a reliable texthigh by taking the address plus size of the
       last function in the file.  */
  
!   if (TEXTHIGH (pst) == 0 && last_function_name)
      {
        char *p;
        int n;
***************
*** 1446,1459 ****
  	}
  
        if (minsym)
! 	pst->texthigh = SYMBOL_VALUE_ADDRESS (minsym) + MSYMBOL_SIZE (minsym);
  
        last_function_name = NULL;
      }
  
    /* this test will be true if the last .o file is only data */
    if (textlow_not_set)
!     pst->textlow = pst->texthigh;
    else
      {
        struct partial_symtab *p1;
--- 1469,1482 ----
  	}
  
        if (minsym)
! 	TEXTHIGH (pst) = SYMBOL_VALUE_ADDRESS (minsym) + MSYMBOL_SIZE (minsym);
  
        last_function_name = NULL;
      }
  
    /* this test will be true if the last .o file is only data */
    if (textlow_not_set)
!     TEXTLOW (pst) = TEXTHIGH (pst);
    else
      {
        struct partial_symtab *p1;
***************
*** 1466,1477 ****
  
        ALL_OBJFILE_PSYMTABS (objfile, p1)
        {
! 	if (p1->texthigh == 0 && p1->textlow != 0 && p1 != pst)
  	  {
! 	    p1->texthigh = pst->textlow;
  	    /* if this file has only data, then make textlow match texthigh */
! 	    if (p1->textlow == 0)
! 	      p1->textlow = p1->texthigh;
  	  }
        }
      }
--- 1489,1500 ----
  
        ALL_OBJFILE_PSYMTABS (objfile, p1)
        {
! 	if (TEXTHIGH (p1) == 0 && TEXTLOW (p1) != 0 && p1 != pst)
  	  {
! 	    TEXTHIGH (p1) = TEXTLOW (pst);
  	    /* if this file has only data, then make textlow match texthigh */
! 	    if (TEXTLOW (p1) == 0)
! 	      TEXTLOW (p1) = TEXTHIGH (p1);
  	  }
        }
      }
***************
*** 1508,1515 ****
  				sizeof (struct symloc));
        LDSYMOFF (subpst) =
  	LDSYMLEN (subpst) =
! 	subpst->textlow =
! 	subpst->texthigh = 0;
  
        /* We could save slight bits of space by only making one of these,
           shared by the entire set of include files.  FIXME-someday.  */
--- 1531,1538 ----
  				sizeof (struct symloc));
        LDSYMOFF (subpst) =
  	LDSYMLEN (subpst) =
! 	TEXTLOW (subpst) =
! 	TEXTHIGH (subpst) = 0;
  
        /* We could save slight bits of space by only making one of these,
           shared by the entire set of include files.  FIXME-someday.  */
***************
*** 1677,1684 ****
    objfile = pst->objfile;
    sym_offset = LDSYMOFF (pst);
    sym_size = LDSYMLEN (pst);
!   text_offset = pst->textlow;
!   text_size = pst->texthigh - pst->textlow;
    /* This cannot be simply objfile->section_offsets because of
       elfstab_offset_sections() which initializes the psymtab section
       offsets information in a special way, and that is different from
--- 1700,1707 ----
    objfile = pst->objfile;
    sym_offset = LDSYMOFF (pst);
    sym_size = LDSYMLEN (pst);
!   text_offset = TEXTLOW (pst);
!   text_size = TEXTHIGH (pst) - TEXTLOW (pst);
    /* This cannot be simply objfile->section_offsets because of
       elfstab_offset_sections() which initializes the psymtab section
       offsets information in a special way, and that is different from
***************
*** 1823,1835 ****
  
    /* In a Solaris elf file, this variable, which comes from the
       value of the N_SO symbol, will still be 0.  Luckily, text_offset,
!      which comes from pst->textlow is correct. */
    if (last_source_start_addr == 0)
      last_source_start_addr = text_offset;
  
    /* In reordered executables last_source_start_addr may not be the
       lower bound for this symtab, instead use text_offset which comes
!      from pst->textlow which is correct.  */
    if (last_source_start_addr > text_offset)
      last_source_start_addr = text_offset;
  
--- 1846,1858 ----
  
    /* In a Solaris elf file, this variable, which comes from the
       value of the N_SO symbol, will still be 0.  Luckily, text_offset,
!      which comes from TEXTLOW (pst) is correct. */
    if (last_source_start_addr == 0)
      last_source_start_addr = text_offset;
  
    /* In reordered executables last_source_start_addr may not be the
       lower bound for this symtab, instead use text_offset which comes
!      from TEXTLOW (pst) which is correct.  */
    if (last_source_start_addr > text_offset)
      last_source_start_addr = text_offset;
  
Index: gdb/mdebugread.c
===================================================================
RCS file: /cvs/src/src/gdb/mdebugread.c,v
retrieving revision 1.16
diff -c -r1.16 mdebugread.c
*** gdb/mdebugread.c	2001/10/12 23:51:28	1.16
--- gdb/mdebugread.c	2001/10/23 23:14:41
***************
*** 105,110 ****
--- 105,115 ----
  
  struct symloc
    {
+     /* Our running best guess as to the range of text addresses for
+        this psymtab.  After we've read everything in, we use this to
+        build pst->text_addrs.  */
+     CORE_ADDR textlow, texthigh;
+ 
      /* Index of the FDR that this psymtab represents.  */
      int fdr_idx;
      /* The BFD that the psymtab was created from.  */
***************
*** 120,125 ****
--- 125,132 ----
    };
  
  #define PST_PRIVATE(p) ((struct symloc *)(p)->read_symtab_private)
+ #define TEXTLOW(p) (PST_PRIVATE(p)->textlow)
+ #define TEXTHIGH(p) (PST_PRIVATE(p)->texthigh)
  #define FDR_IDX(p) (PST_PRIVATE(p)->fdr_idx)
  #define CUR_BFD(p) (PST_PRIVATE(p)->cur_bfd)
  #define DEBUG_SWAP(p) (PST_PRIVATE(p)->debug_swap)
***************
*** 516,521 ****
--- 523,541 ----
  
    parse_partial_symbols (objfile);
  
+   /* Take the text ranges the partial symbol scanner computed for each
+      of the psymtabs and convert it into the canonical form for
+      psymtabs.  */
+   {
+     struct partial_symtab *p;
+ 
+     ALL_OBJFILE_PSYMTABS (objfile, p)
+       {
+         p->textlow = TEXTLOW (p);
+         p->texthigh = TEXTHIGH (p);
+       }
+   }
+ 
  #if 0
    /* Check to make sure file was compiled with -g.  If not, warn the
       user of this limitation.  */
***************
*** 2174,2180 ****
  	halt = base + fh->cbLine;
        base += pr->cbLineOffset;
  
!       adr = pst->textlow + pr->adr - lowest_pdr_addr;
  
        l = adr >> 2;		/* in words */
        for (lineno = pr->lnLow; base < halt;)
--- 2194,2200 ----
  	halt = base + fh->cbLine;
        base += pr->cbLineOffset;
  
!       adr = TEXTLOW (pst) + pr->adr - lowest_pdr_addr;
  
        l = adr >> 2;		/* in words */
        for (lineno = pr->lnLow; base < halt;)
***************
*** 2509,2514 ****
--- 2529,2536 ----
        memset ((PTR) pst->read_symtab_private, 0, sizeof (struct symloc));
  
        save_pst = pst;
+       TEXTLOW (pst) = pst->textlow;
+       TEXTHIGH (pst) = pst->texthigh;
        FDR_IDX (pst) = f_idx;
        CUR_BFD (pst) = cur_bfd;
        DEBUG_SWAP (pst) = debug_swap;
***************
*** 2544,2550 ****
  	psymtab_language = prev_language;
        PST_PRIVATE (pst)->pst_language = psymtab_language;
  
!       pst->texthigh = pst->textlow;
  
        /* For stabs-in-ecoff files, the second symbol must be @stab.
           This symbol is emitted by mips-tfile to signal that the
--- 2566,2572 ----
  	psymtab_language = prev_language;
        PST_PRIVATE (pst)->pst_language = psymtab_language;
  
!       TEXTHIGH (pst) = TEXTLOW (pst);
  
        /* For stabs-in-ecoff files, the second symbol must be @stab.
           This symbol is emitted by mips-tfile to signal that the
***************
*** 2611,2620 ****
  
  			  /* Kludge for Irix 5.2 zero fh->adr.  */
  			  if (!relocatable
! 			  && (pst->textlow == 0 || procaddr < pst->textlow))
! 			    pst->textlow = procaddr;
! 			  if (high > pst->texthigh)
! 			    pst->texthigh = high;
  			}
  		    }
  		  else if (sh.st == stStatic)
--- 2633,2642 ----
  
  			  /* Kludge for Irix 5.2 zero fh->adr.  */
  			  if (!relocatable
! 			  && (TEXTLOW (pst) == 0 || procaddr < TEXTLOW (pst)))
! 			    TEXTLOW (pst) = procaddr;
! 			  if (high > TEXTHIGH (pst))
! 			    TEXTHIGH (pst) = high;
  			}
  		    }
  		  else if (sh.st == stStatic)
***************
*** 2703,2709 ****
    (pst = save_pst)
  #define END_PSYMTAB(pst,ilist,ninc,c_off,c_text,dep_list,n_deps,textlow_not_set) (void)0
  #define HANDLE_RBRAC(val) \
!   if ((val) > save_pst->texthigh) save_pst->texthigh = (val);
  #include "partial-stab.h"
  
  		if (stabstring
--- 2725,2731 ----
    (pst = save_pst)
  #define END_PSYMTAB(pst,ilist,ninc,c_off,c_text,dep_list,n_deps,textlow_not_set) (void)0
  #define HANDLE_RBRAC(val) \
!   if ((val) > TEXTHIGH (save_pst)) TEXTHIGH (save_pst) = (val);
  #include "partial-stab.h"
  
  		if (stabstring
***************
*** 2836,2847 ****
  
  		  /* Kludge for Irix 5.2 zero fh->adr.  */
  		  if (!relocatable
! 		      && (pst->textlow == 0 || procaddr < pst->textlow))
! 		    pst->textlow = procaddr;
  
  		  high = procaddr + sh.value;
! 		  if (high > pst->texthigh)
! 		    pst->texthigh = high;
  		  continue;
  
  		case stStatic:	/* Variable */
--- 2858,2869 ----
  
  		  /* Kludge for Irix 5.2 zero fh->adr.  */
  		  if (!relocatable
! 		      && (TEXTLOW (pst) == 0 || procaddr < TEXTLOW (pst)))
! 		    TEXTLOW (pst) = procaddr;
  
  		  high = procaddr + sh.value;
! 		  if (high > TEXTHIGH (pst))
! 		    TEXTHIGH (pst) = high;
  		  continue;
  
  		case stStatic:	/* Variable */
***************
*** 3015,3030 ****
           empty and put on the free list.  */
        fdr_to_pst[f_idx].pst = end_psymtab (save_pst,
  					psymtab_include_list, includes_used,
! 					   -1, save_pst->texthigh,
  		       dependency_list, dependencies_used, textlow_not_set);
        includes_used = 0;
        dependencies_used = 0;
  
!       if (objfile->ei.entry_point >= save_pst->textlow &&
! 	  objfile->ei.entry_point < save_pst->texthigh)
  	{
! 	  objfile->ei.entry_file_lowpc = save_pst->textlow;
! 	  objfile->ei.entry_file_highpc = save_pst->texthigh;
  	}
  
        /* The objfile has its functions reordered if this partial symbol
--- 3037,3052 ----
           empty and put on the free list.  */
        fdr_to_pst[f_idx].pst = end_psymtab (save_pst,
  					psymtab_include_list, includes_used,
! 					   -1, TEXTHIGH (save_pst),
  		       dependency_list, dependencies_used, textlow_not_set);
        includes_used = 0;
        dependencies_used = 0;
  
!       if (objfile->ei.entry_point >= TEXTLOW (save_pst) &&
! 	  objfile->ei.entry_point < TEXTHIGH (save_pst))
  	{
! 	  objfile->ei.entry_file_lowpc = TEXTLOW (save_pst);
! 	  objfile->ei.entry_file_highpc = TEXTHIGH (save_pst);
  	}
  
        /* The objfile has its functions reordered if this partial symbol
***************
*** 3040,3054 ****
           other cases.  */
        save_pst = fdr_to_pst[f_idx].pst;
        if (save_pst != NULL
! 	  && save_pst->textlow != 0
  	  && !(objfile->flags & OBJF_REORDERED))
  	{
  	  ALL_OBJFILE_PSYMTABS (objfile, pst)
  	  {
  	    if (save_pst != pst
! 		&& save_pst->textlow >= pst->textlow
! 		&& save_pst->textlow < pst->texthigh
! 		&& save_pst->texthigh > pst->texthigh)
  	      {
  		objfile->flags |= OBJF_REORDERED;
  		break;
--- 3062,3076 ----
           other cases.  */
        save_pst = fdr_to_pst[f_idx].pst;
        if (save_pst != NULL
! 	  && TEXTLOW (save_pst) != 0
  	  && !(objfile->flags & OBJF_REORDERED))
  	{
  	  ALL_OBJFILE_PSYMTABS (objfile, pst)
  	  {
  	    if (save_pst != pst
! 		&& TEXTLOW (save_pst) >= TEXTLOW (pst)
! 		&& TEXTLOW (save_pst) < TEXTHIGH (pst)
! 		&& TEXTHIGH (save_pst) > TEXTHIGH (pst))
  	      {
  		objfile->flags |= OBJF_REORDERED;
  		break;
***************
*** 3252,3258 ****
    /* Do nothing if this is a dummy psymtab.  */
  
    if (pst->n_global_syms == 0 && pst->n_static_syms == 0
!       && pst->textlow == 0 && pst->texthigh == 0)
      return;
  
    /* Now read the symbols for this symtab */
--- 3274,3280 ----
    /* Do nothing if this is a dummy psymtab.  */
  
    if (pst->n_global_syms == 0 && pst->n_static_syms == 0
!       && TEXTLOW (pst) == 0 && TEXTHIGH (pst) == 0)
      return;
  
    /* Now read the symbols for this symtab */
***************
*** 3400,3406 ****
  
        if (! last_symtab_ended)
  	{
! 	  st = end_symtab (pst->texthigh, pst->objfile, SECT_OFF_TEXT (pst->objfile));
  	  end_stabs ();
  	}
  
--- 3422,3428 ----
  
        if (! last_symtab_ended)
  	{
! 	  st = end_symtab (TEXTHIGH (pst), pst->objfile, SECT_OFF_TEXT (pst->objfile));
  	  end_stabs ();
  	}
  
***************
*** 3490,3496 ****
        top_stack->cur_st = st;
        top_stack->cur_block = BLOCKVECTOR_BLOCK (BLOCKVECTOR (st),
  						STATIC_BLOCK);
!       BLOCK_START (top_stack->cur_block) = pst->textlow;
        BLOCK_END (top_stack->cur_block) = 0;
        top_stack->blocktype = stFile;
        top_stack->maxsyms = 2 * f_max;
--- 3512,3518 ----
        top_stack->cur_st = st;
        top_stack->cur_block = BLOCKVECTOR_BLOCK (BLOCKVECTOR (st),
  						STATIC_BLOCK);
!       BLOCK_START (top_stack->cur_block) = TEXTLOW (pst);
        BLOCK_END (top_stack->cur_block) = 0;
        top_stack->blocktype = stFile;
        top_stack->maxsyms = 2 * f_max;
Index: gdb/partial-stab.h
===================================================================
RCS file: /cvs/src/src/gdb/partial-stab.h,v
retrieving revision 1.13
diff -c -r1.13 partial-stab.h
*** gdb/partial-stab.h	2001/09/06 20:50:48	1.13
--- gdb/partial-stab.h	2001/10/23 23:14:42
***************
*** 104,115 ****
  	if (past_first_source_file && pst
  	/* The gould NP1 uses low values for .o and -l symbols
  	   which are not the address.  */
! 	    && CUR_SYMBOL_VALUE >= pst->textlow)
  	  {
  	    END_PSYMTAB (pst, psymtab_include_list, includes_used,
  			 symnum * symbol_size,
! 			 CUR_SYMBOL_VALUE > pst->texthigh
! 			 ? CUR_SYMBOL_VALUE : pst->texthigh,
  		       dependency_list, dependencies_used, textlow_not_set);
  	    pst = (struct partial_symtab *) 0;
  	    includes_used = 0;
--- 104,115 ----
  	if (past_first_source_file && pst
  	/* The gould NP1 uses low values for .o and -l symbols
  	   which are not the address.  */
! 	    && CUR_SYMBOL_VALUE >= TEXTLOW (pst))
  	  {
  	    END_PSYMTAB (pst, psymtab_include_list, includes_used,
  			 symnum * symbol_size,
! 			 CUR_SYMBOL_VALUE > TEXTHIGH (pst)
! 			 ? CUR_SYMBOL_VALUE : TEXTHIGH (pst),
  		       dependency_list, dependencies_used, textlow_not_set);
  	    pst = (struct partial_symtab *) 0;
  	    includes_used = 0;
***************
*** 236,242 ****
  	    {
  	      END_PSYMTAB (pst, psymtab_include_list, includes_used,
  			   symnum * symbol_size,
! 			   valu > pst->texthigh ? valu : pst->texthigh,
  			   dependency_list, dependencies_used,
  			   prev_textlow_not_set);
  	      pst = (struct partial_symtab *) 0;
--- 236,242 ----
  	    {
  	      END_PSYMTAB (pst, psymtab_include_list, includes_used,
  			   symnum * symbol_size,
! 			   valu > TEXTHIGH (pst) ? valu : TEXTHIGH (pst),
  			   dependency_list, dependencies_used,
  			   prev_textlow_not_set);
  	      pst = (struct partial_symtab *) 0;
***************
*** 405,412 ****
  	   function relative stabs, or the address of the function's
  	   end for old style stabs.  */
  	valu = CUR_SYMBOL_VALUE + last_function_start;
! 	if (pst->texthigh == 0 || valu > pst->texthigh)
! 	  pst->texthigh = valu;
  	break;
        }
  #endif
--- 405,412 ----
  	   function relative stabs, or the address of the function's
  	   end for old style stabs.  */
  	valu = CUR_SYMBOL_VALUE + last_function_start;
! 	if (TEXTHIGH (pst) == 0 || valu > TEXTHIGH (pst))
! 	  TEXTHIGH (pst) = valu;
  	break;
        }
  #endif
***************
*** 610,616 ****
  	  }
  	if (pst && textlow_not_set)
  	  {
! 	    pst->textlow = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif
--- 610,616 ----
  	  }
  	if (pst && textlow_not_set)
  	  {
! 	    TEXTLOW (pst) = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif
***************
*** 626,637 ****
  	   the partial symbol table.  */
  	if (pst
              && (textlow_not_set
!                 || (CUR_SYMBOL_VALUE < pst->textlow
                      && (CUR_SYMBOL_VALUE
                          != ANOFFSET (objfile->section_offsets,
                                       SECT_OFF_TEXT (objfile))))))
  	  {
! 	    pst->textlow = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif /* DBXREAD_ONLY */
--- 626,637 ----
  	   the partial symbol table.  */
  	if (pst
              && (textlow_not_set
!                 || (CUR_SYMBOL_VALUE < TEXTLOW (pst)
                      && (CUR_SYMBOL_VALUE
                          != ANOFFSET (objfile->section_offsets,
                                       SECT_OFF_TEXT (objfile))))))
  	  {
! 	    TEXTLOW (pst) = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif /* DBXREAD_ONLY */
***************
*** 677,683 ****
  	  }
  	if (pst && textlow_not_set)
  	  {
! 	    pst->textlow = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif
--- 677,683 ----
  	  }
  	if (pst && textlow_not_set)
  	  {
! 	    TEXTLOW (pst) = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif
***************
*** 693,704 ****
  	   the partial symbol table.  */
  	if (pst
              && (textlow_not_set
!                 || (CUR_SYMBOL_VALUE < pst->textlow
                      && (CUR_SYMBOL_VALUE
                          != ANOFFSET (objfile->section_offsets,
                                       SECT_OFF_TEXT (objfile))))))
  	  {
! 	    pst->textlow = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif /* DBXREAD_ONLY */
--- 693,704 ----
  	   the partial symbol table.  */
  	if (pst
              && (textlow_not_set
!                 || (CUR_SYMBOL_VALUE < TEXTLOW (pst)
                      && (CUR_SYMBOL_VALUE
                          != ANOFFSET (objfile->section_offsets,
                                       SECT_OFF_TEXT (objfile))))))
  	  {
! 	    TEXTLOW (pst) = CUR_SYMBOL_VALUE;
  	    textlow_not_set = 0;
  	  }
  #endif /* DBXREAD_ONLY */
***************
*** 813,819 ****
    case N_ENDM:
  #ifdef SOFUN_ADDRESS_MAYBE_MISSING
      /* Solaris 2 end of module, finish current partial symbol table.
!        END_PSYMTAB will set pst->texthigh to the proper value, which
         is necessary if a module compiled without debugging info
         follows this module.  */
      if (pst)
--- 813,819 ----
    case N_ENDM:
  #ifdef SOFUN_ADDRESS_MAYBE_MISSING
      /* Solaris 2 end of module, finish current partial symbol table.
!        END_PSYMTAB will set TEXTHIGH (pst) to the proper value, which
         is necessary if a module compiled without debugging info
         follows this module.  */
      if (pst)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: partial symbol table address range generalization
  2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy
@ 2001-10-23 16:54 ` Daniel Jacobowitz
       [not found]   ` <npsnc965cl.fsf@zwingli.cygnus.com>
  2001-10-24  1:52 ` Eli Zaretskii
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Jacobowitz @ 2001-10-23 16:54 UTC (permalink / raw)
  To: gdb-patches

On Tue, Oct 23, 2001 at 06:34:50PM -0500, Jim Blandy wrote:
> At the moment, GDB's partial symbol table structures assume that each
> compilation unit's text occupies a single contiguous range of
> addresses.  If a given address falls between a `struct partial_symtab'
> structure's `texthigh' and `textlow' addresses, then GDB assumes that
> reading that partial symbol table's debugging info will provide
> complete information about that address.
> 
> However, this assumption isn't true.  A given .o file's code can
> actually appear in any number of distinct regions, separated by code
> from other .o files.  This can happen when we compile and link a C++
> program, like the below (from GCC's test suite):

I'd like to point you at my thoughts from the first time I noticed this
behavior:
  < http://sources.redhat.com/ml/gdb/2001-08/msg00161.html >

When I looked at this problem in August, I wasn't brave enough to
suggest the solution you're proposing (which I completely agree with). 
However...

> There is some logic in GDB's lookup functions to cope with overlapping
> partial symtabs, and they've been working pretty well on our behalf.
> However, they're fragile, and do break in everyday use.  For example,
> in the executable produced from the source file above, if you try to
> set a breakpoint on a library routine compiled without debug
> information, GDB will set the breakpoint in `main' instead.  (On some
> platforms, `_exit' is such a function.)

This particular problem should be avoidable anyway.  I would appreciate
it if you would look at:
 < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html >

As far as getting this information from the stabs reader, we should be
able to do it if we know separate "possible" and "definite" ranges, I
think.  I'm not sure if we have enough information to do this.  We
don't for the general case, but we should generally have one "definite"
range per file (corresponding to the main .text segment).  It would be
nice to at least take this region out of any symtabs that seem to
encompass it, so that if the main program is built without
-ffunction-sections, we will behave sanely in the presence of (say)
libstdc++.a.


-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: partial symbol table address range generalization
       [not found]   ` <npsnc965cl.fsf@zwingli.cygnus.com>
@ 2001-10-23 21:42     ` Daniel Jacobowitz
  2001-10-24 17:08       ` Jim Blandy
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Jacobowitz @ 2001-10-23 21:42 UTC (permalink / raw)
  To: Jim Blandy; +Cc: gdb-patches

On Tue, Oct 23, 2001 at 11:15:38PM -0500, Jim Blandy wrote:
> 
> Daniel Jacobowitz <drow@mvista.com> writes:
> > I'd like to point you at my thoughts from the first time I noticed this
> > behavior:
> >   < http://sources.redhat.com/ml/gdb/2001-08/msg00161.html >
> 
> That's messed up.  The compiler produces SO stabs that describe only
> the .text section, while happily placing code in other sections.  So
> the SO stabs' addresses are useless.
> 
> If you want to hack on the stabs reader to handle these cases, that
> would be great.

I'm not sure how to, at present.  But it may be possible when you're
done.  Of course, I'd rather transition all the targets I care about to
DWARF2 instead of fixing stabs; but if I have time, I'll take a whack
at it.

> > > There is some logic in GDB's lookup functions to cope with overlapping
> > > partial symtabs, and they've been working pretty well on our behalf.
> > > However, they're fragile, and do break in everyday use.  For example,
> > > in the executable produced from the source file above, if you try to
> > > set a breakpoint on a library routine compiled without debug
> > > information, GDB will set the breakpoint in `main' instead.  (On some
> > > platforms, `_exit' is such a function.)
> > 
> > This particular problem should be avoidable anyway.  I would appreciate
> > it if you would look at:
> >  < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html >
> 
> Well, okay.  But I'd much prefer to see the problem fixed by making
> the symbol table contents more accurate than by adding another
> heuristic for recognizing insane data.  When checks like that get into
> the code, they never go away, because nobody's ever really sure
> whether the circumstances it was meant to cope with happen any more.

Well, I wouldn't call it a heuristic in this case.  If we don't have
debugging info, isn't it a little bit insane to try to find a line
number?

> If I can finish up the addrset patch for Dwarf 2, would you be
> interested in taking a shot at spiffing up stabs?

As said, it isn't a priority for me, but it is something that I would
like to see cleaned up.  Especially when Elena is finished cleaning up
the partial-stab.h mess (or has it been finished now?  I don't recall).

> I encourage you to take a shot at it.  The idea is to have `struct
> addrset' support a bunch of groovy operations (like set subtraction,
> testing for intersections between two sets, etc.), carefully coded and
> gotten right once and for all, that make it easy to do this kind of
> sanity checking and refinement.  Here's the header file for the code
> I've got now; imagine adding the setwise ops you want here, and then
> using them in the stabs reader.

Looks good!  The only comment I have is trivial:

> /* Set *START and *END to the first and last addresses (inclusive) of
>    the first contiguous range of addresses in ADDRSET.  If ADDRSET is
>    empty, set *START to 1 and *END to zero.  */
> void addrset_first_range (struct addrset *addrset,
>                           CORE_ADDR *start,
>                           CORE_ADDR *end);
> 
> /* Set *START and *END to the first and last addresses (inclusive) of
>    the first contiguous range of addresses in ADDRSET after AFTER.
>    (That is, AFTER will not be included in the returned range.)
>    If there are no addresses in ADDRSET that are > AFTER, then
>    set *START to 1 and *END to zero.  */
> void addrset_next_range (struct addrset *addrset,
>                          CORE_ADDR after,
>                          CORE_ADDR *start,
>                          CORE_ADDR *end);

For performance reasons, it might be better to steal the iterator
concept and have an opaque cookie separate from the start/end
addresses.  Otherwise, addrset_next_range is not going to be constant
time, and that could get annoying in an objfile with a large number of
sections... and C++ is notorious for absurd numbers of sections.

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: partial symbol table address range generalization
  2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy
  2001-10-23 16:54 ` Daniel Jacobowitz
@ 2001-10-24  1:52 ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2001-10-24  1:52 UTC (permalink / raw)
  To: Jim Blandy; +Cc: gdb-patches

On Tue, 23 Oct 2001, Jim Blandy wrote:

> I'd like to change GDB's `struct partial_symtab' structure to cover an
> arbitrary set of addresses, not just a single contiguous span of
> addresses.  I've written code for a `struct addrset' datatype
> (essentially a linked list of ranges) and provided some functions to
> make it easy and robust to work with.  I'd like to gradually wean the
> partial symbol table code away from the textlow/texthigh
> representation, and have it use an addrset to represent a partial
> symbol table's coverage.

Please consider writing something about this for gdbint.texinfo,
eventually.  The GDB symbol table facilities and algorithms are
notoriously underdocumented at the moment, IMHO.

TIA


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: partial symbol table address range generalization
  2001-10-23 21:42     ` Daniel Jacobowitz
@ 2001-10-24 17:08       ` Jim Blandy
  2001-10-24 17:39         ` Daniel Jacobowitz
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Blandy @ 2001-10-24 17:08 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz <drow@mvista.com> writes:
> > > This particular problem should be avoidable anyway.  I would appreciate
> > > it if you would look at:
> > >  < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html >
> > 
> > Well, okay.  But I'd much prefer to see the problem fixed by making
> > the symbol table contents more accurate than by adding another
> > heuristic for recognizing insane data.  When checks like that get into
> > the code, they never go away, because nobody's ever really sure
> > whether the circumstances it was meant to cope with happen any more.
> 
> Well, I wouldn't call it a heuristic in this case.  If we don't have
> debugging info, isn't it a little bit insane to try to find a line
> number?

Yes, certainly.  But the reasoning I see going on is this: we can't
trust our address range data, and our source line search screws up if
the address range data isn't accurate, so we'll prop up the process by
pulling a name search into the process, too.  The more checks we add,
the more likely we are to detect that something's screwey.

This works, but I'd really rather just make the address range data
accurate in the first place.  The part that strikes me as a
`heuristic' is the assumption that the address ranges are not quite
trustworthy, and that therefore we should cross-check them against the
function table before we proceed.

If you think it's going to be too hard to get reliable address ranges
for symtabs, then it'd certainly be better to put Peter's change in;
at least something will get fixed.  But if it's practical, I think
it's better policy to work hard at filling your data structures with
reliable data than to sprinkle your query code with sanity checks.

> For performance reasons, it might be better to steal the iterator
> concept and have an opaque cookie separate from the start/end
> addresses.  Otherwise, addrset_next_range is not going to be constant
> time, and that could get annoying in an objfile with a large number of
> sections... and C++ is notorious for absurd numbers of sections.

It is (usually) constant time.  :)  The representation maintains a
`current position' in the sorted linked list of ranges, and starts its
searches from that point.  So in a series of operations where each
address is near the last one (like a sequential scan), each operation
takes place in constant time.

This is somewhat like having a single iterator object hidden inside
the addrset object; that is, at any given time, there's only one range
of addresses which can be operated on quickly.  If instead we had an
explicit iterator type, one could do multiple traversals of different
areas efficiently.  However, mutations can delete nodes from the list,
and change their values; it seems to me that keeping iterators valid
across mutation operations would add complexity.

Anyway, it's an opaque type.  If performance is a problem, or if we
want to add iterators, we can revamp the whole thing without breaking
its users.

Here's the implementation:


/* addrset.c --- implementation of the address set datatype  
   Copyright 2001 Free Software Foundation, Inc.

   This file is part of GDB.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include "defs.h"
#include "obstack.h"
#include "gdb_assert.h"
#include "addrset.h"

struct addrset {

  /* Two singly-linked lists of nodes; each node represents a range of
     addresses in the addrset.  None of these nodes' address ranges
     overlap or abut each other.  `before' is sorted in order of
     decreasing addresses, `after' in order of increasing addresses.
     All nodes in `before' have lower addresses than all the nodes in
     `after'.

     Generally, whenever we operate on the list, we try split it
     between `before' and `after' so that the division comes directly
     after the last node we operated on.  That way, sequential access
     happens in constant time.  Since we split it *after* the last
     node we operate on, we're slightly biased towards traversal
     towards increasing address, but it's not too bad if we want to go
     the other way.  */
  struct node *before, *after;

  /* If this is non-zero, allocate new nodes from this obstack.
     Otherwise, allocate nodes using xmalloc.  */
  struct obstack *obstack;

  /* We can't free nodes allocated from an obstack, so we might as
     well keep them in a free list, and reuse them.  This list can't
     be global, as different obstacks have different lifetimes.
     Ideally, it would be per-obstack, so a node could be freed from
     one addrset and then re-used by another addrset in the same
     obstack.  But we don't have any convenient way to store
     per-obstack info.  So the free list is per-addrset.

     For addrsets allocated using xmalloc, we simply xfree nodes, and
     hope malloc manages small blocks well; this field is always zero.  */
  struct node *free;

};

/* One contiguous range of addresses in an addrset.  */
struct node {
  struct node *next;

  /* The start and end addresses, inclusive, of the address range.
     Obviously, start <= end.  If start == end, that's a one-byte
     range.  */
  CORE_ADDR start, end;       
};


struct addrset *
new_addrset (struct obstack *obstack)
{
  struct addrset *new;

  if (obstack)
    new = obstack_alloc (obstack, sizeof (*new));
  else
    new = xmalloc (sizeof (*new));

  memset (new, 0, sizeof (*new));

  new->obstack = obstack;

  return new;
}


static void
xfree_node_list (struct node *n)
{
  while (n)
    {
      struct node *next = n->next;

      xfree (n);
      n = next;
    }
}


void
addrset_free (struct addrset *addrset)
{
  /* You can't free an addrset allocated in an obstack.  */
  gdb_assert (! addrset->obstack);

  /* If it wasn't allocated on an obstack, it had better not have a
     free list.  */
  gdb_assert (! addrset->free);

  xfree_node_list (addrset->before);
  xfree_node_list (addrset->after);
}


static struct node *
new_node (struct addrset *addrset,
          CORE_ADDR start, CORE_ADDR end)
{
  struct node *new;

  if (addrset->obstack)
    {
      /* If we have any nodes on the free list, use them.  */
      if (addrset->free)
        {
          new = addrset->free;
          addrset->free = new->next;
        }
      else
        new = obstack_alloc (addrset->obstack, sizeof (*new));
    }
  else
    new = xmalloc (sizeof (*new));

  new->next = 0;
  new->start = start;
  new->end = end;

  return new;
}


static void
free_node (struct addrset *addrset, struct node *node)
{
  if (addrset->obstack)
    {
      node->next = addrset->free;
      addrset->free = node;
    }
  else
    xfree (node);
}


/* Return true if A is less than B, and there is at least one
   address between them.  */
static int
less_and_separate (CORE_ADDR a, CORE_ADDR b)
{
  /* We need both tests; think about what happens, for example, when
     B is zero and A is (CORE_ADDR) -1.  */ 
  return (a < b && b - a >= 2);
}


/* Shift nodes from `before' to `after', or in the other direction,
   until the split falls at the right place.  By `the right place', we
   mean:
   - any nodes on `after' are after, and do not abut, the range from
     START to END, and
   - any nodes on `before' are either completely before, abut, or overlap
     the range from START to END.
   Once this is done, any nodes on `after' are irrelevant to our
   operation, and any nodes that abut or overlap our range are at the
   head of `before'.  */
static void
resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node;

  /* We know the `before' and `after' lists are each sorted in the
     proper direction, and contain non-overlapping, non-adjacent
     ranges.

     The first step is to move all ranges from `after' to `before'
     that are before, overlap, or abut the range we're operating on.
     Be careful of nodes abutting either end of the address space.  */
  while ((node = addrset->after) != 0
         && ! less_and_separate (end, node->start))
    {
      addrset->after = node->next;
      node->next = addrset->before;
      addrset->before = node;
    }

  /* Now, any ranges in `after' are after and do not abut the range
     we're operating on.  This means that all those nodes are
     irrelevant to this operation, and we can ignore them.

     Thus, any nodes we care about are somewhere in `before'.  Bring
     them to the head of that list by shifting any nodes from `before'
     to `after' that are completely after the range we're operating
     on.

     The astute reader will note that the condition for moving a node
     in this loop is the complement of that for the previous loop.
     That is, the condition for moving a node in one direction is the
     opposite of moving it in the other direction.  This means that
     only one of these two loops' bodies will ever be executed in a
     given call.  */
  while ((node = addrset->before) != 0
         && less_and_separate (end, node->start))
    {
      addrset->before = node->next;
      node->next = addrset->after;
      addrset->after = node;
    }
}


void
addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node, *save;

  gdb_assert (start <= end);

  resplit (addrset, start, end);

  /* Now we know that any ranges in `after' are irrelevant to this
     operation, and that zero or more nodes at the head of `before'
     abut or overlap the range we're adding.  Remove any such nodes
     from `before', and absorb their ranges into ours.  */
  save = 0;
  while ((node = addrset->before) != 0
         && ! less_and_separate (node->end, start))
    {
      if (node->start < start)
        start = node->start;
      if (node->end > end)
        end = node->end;
      addrset->before = node->next;

      /* We want to remove this node from the list, but we know we're
         going to insert a new one in a bit, so hold onto the first
         node we delete, just to save a call to xfree and xmalloc.  */
      if (save)
        free_node (addrset, node);
      else
        save = node;
    }

  /* Create a node for the new range.  */
  if (save)
    {
      save->start = start;
      save->end = end;
    }
  else
    save = new_node (addrset, start, end);

  /* Stick it on the head of the `before' list.  */
  save->next = addrset->before;
  addrset->before = save;
}


void
addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
{
  struct node *node;

  gdb_assert (start <= end);

  /* This could end up moving a bunch of nodes to `before' that we're
     just going to delete anyway.  But our running time is linear in
     the number of ranges we're going to delete anyway, and this lets
     us deal with the `before' list only; otherwise, we'd have to
     write out the code twice, once for each list.  */
  resplit (addrset, start, end);

  /* Now we know that any ranges in `after' are irrelevant to this
     operation, and that zero or more nodes at the head of `before'
     abut or overlap the range we're adding.

     If the node at the head of the `before' list extends after our
     range, either truncate it and shift it to `after', or split it in
     two.  */
  if ((node = addrset->before) != 0
      && end < node->end)
    {
      /* If it also extends before the range we're deleting, we need
         to split it into two nodes: one for the part before and one
         for the part after.  And we know we don't need to do anything
         else to the `before' list.  */
      if (node->start < start)
        {
          struct node *right_part = new_node (addrset, end + 1, node->end);
          right_part->next = addrset->after;
          addrset->after = right_part;
          node->end = start - 1;
          return;
        }

      /* Just shift the node onto `after', and adjust its size.  */
      addrset->before = node->next;
      node->next = addrset->after;
      addrset->after = node;
      node->start = end + 1;

      /* There may be more nodes on `before' that we need to adjust.  */
    }

  /* Now we know that the head of `before' doesn't extend beyond the
     end of the range we're deleting.  Nothing will need to be added
     to `after'.

     Remove any nodes completely contained in the region we're
     deleting.  */
  while ((node = addrset->before) != 0
         && start <= node->start)
    {
      addrset->before = node->next;
      free_node (addrset, node);
    }

  /* The head of `before' (if any) isn't completely contained in the
     range we're deleting, but it may partially overlap it at the low
     end.  If that's the case, adjust its endpoint.  */
  if ((node = addrset->before) != 0
      && start <= node->end)
    node->end = start - 1;
}


/* Shift *ADDR by OFFSET, and check that the direction of change from
   OLD to NEW is consistent with *DIR.  If it isn't, raise an internal
   error.  OFFSET must be non-zero.  See offset_node_list for the
   meaning of *DIR.  */
static void
shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir)
{
  CORE_ADDR old = *addr;
  CORE_ADDR new = (*addr += offset);

  if (new < old)
    {
      if (! *dir)
        *dir = -1;
      else
        gdb_assert (*dir == -1);
    }
  else
    {
      if (! *dir)
        *dir = 1;
      else
        gdb_assert (*dir == 1);
    }
}


/* Offset addresses in NODE and its successors by OFFSET.
   OFFSET must be non-zero.
   *DIR is the direction of the shift, used for error checking.
   - If *DIR is zero, then we don't know which direction things will be
     shifted yet.
   - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes
     downwards or upwards, and we should raise an internal_error if any
     of our addresses go in the other direction.  */
static void
offset_node_list (struct node *node, CORE_ADDR offset, int *dir)
{
  while (node)
    {
      shift_addr (&node->start, offset, dir);
      shift_addr (&node->end, offset, dir);

      node = node->next;
    }
}


void
addrset_offset (struct addrset *addrset, CORE_ADDR offset)
{
  int dir = 0;

  if (offset == 0)
    return;

  offset_node_list (addrset->before, offset, &dir);
  offset_node_list (addrset->after, offset, &dir);
}


int
addrset_in (struct addrset *addrset, CORE_ADDR addr)
{
  struct node *node;

  resplit (addrset, addr, addr);

  /* Now any node containing addr would have to be on `before'.  */
  return ((node = addrset->before) != 0
          && node->start <= addr && addr <= node->end);
}


int
addrset_span (struct addrset *addrset,
              CORE_ADDR addr,
              CORE_ADDR *start,
              CORE_ADDR *end)
{
  struct node *node;

  resplit (addrset, addr, addr);

  /* Now any node containing addr would have to be on `before'.  */
  if ((node = addrset->before) != 0
      && node->start <= addr && addr <= node->end)
    {
      *start = node->start;
      *end = node->end;
      return 1;
    }
  else
    return 0;
}


void
addrset_first_range (struct addrset *addrset,
                     CORE_ADDR *start,
                     CORE_ADDR *end)
{
  struct node *node;
  
  resplit (addrset, 0, 0);
  
  /* If there's anything on `before', that's our first range.
     Otherwise, if there's anything on after, that's it.  */
  if ((node = addrset->before) != 0
      || (node = addrset->after) != 0)
    {
      *start = node->start;
      *end = node->end;
    }
  else
    {
      /* ADDRSET is empty.  */
      *start = 1;
      *end = 0;
    }
}


void
addrset_next_range (struct addrset *addrset,
                    CORE_ADDR after,
                    CORE_ADDR *start,
                    CORE_ADDR *end)
{
  struct node *node;

  resplit (addrset, after, after);

  /* The first node on `before' might extend beyond our range.  */
  if ((node = addrset->before) != 0
      && after < node->end)
    {
      *start = after + 1;
      *end = node->end;
    }

  /* Otherwise, the first node on `after' is the one we want.  */
  else if ((node = addrset->after) != 0)
    {
      *start = node->start;
      *end = node->end;
    }

  /* Otherwise, there are no more addresses in ADDRSET after AFTER.  */
  else
    {
      *start = 1;
      *end = 0;
    }
}


void
addrset_print (struct addrset *addrset, struct ui_file *outfile)
{
  CORE_ADDR start, end;
  int first = 1;

  for (addrset_first_range (addrset, &start, &end);
       start <= end;
       addrset_next_range (addrset, end, &start, &end))
    {
      if (! first)
        fprintf_filtered (outfile, ", ");
      print_address_numeric (start, 1, outfile);
      fprintf_filtered (outfile, "-");
      print_address_numeric (end, 1, outfile);
      first = 0;
    }
}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: partial symbol table address range generalization
  2001-10-24 17:08       ` Jim Blandy
@ 2001-10-24 17:39         ` Daniel Jacobowitz
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Jacobowitz @ 2001-10-24 17:39 UTC (permalink / raw)
  To: Jim Blandy; +Cc: gdb-patches

On Wed, Oct 24, 2001 at 07:09:25PM -0500, Jim Blandy wrote:
> 
> Daniel Jacobowitz <drow@mvista.com> writes:
> > > > This particular problem should be avoidable anyway.  I would appreciate
> > > > it if you would look at:
> > > >  < http://sources.redhat.com/ml/gdb/2001-09/msg00068.html >
> > > 
> > > Well, okay.  But I'd much prefer to see the problem fixed by making
> > > the symbol table contents more accurate than by adding another
> > > heuristic for recognizing insane data.  When checks like that get into
> > > the code, they never go away, because nobody's ever really sure
> > > whether the circumstances it was meant to cope with happen any more.
> > 
> > Well, I wouldn't call it a heuristic in this case.  If we don't have
> > debugging info, isn't it a little bit insane to try to find a line
> > number?
> 
> Yes, certainly.  But the reasoning I see going on is this: we can't
> trust our address range data, and our source line search screws up if
> the address range data isn't accurate, so we'll prop up the process by
> pulling a name search into the process, too.  The more checks we add,
> the more likely we are to detect that something's screwey.
> 
> This works, but I'd really rather just make the address range data
> accurate in the first place.  The part that strikes me as a
> `heuristic' is the assumption that the address ranges are not quite
> trustworthy, and that therefore we should cross-check them against the
> function table before we proceed.
> 
> If you think it's going to be too hard to get reliable address ranges
> for symtabs, then it'd certainly be better to put Peter's change in;
> at least something will get fixed.  But if it's practical, I think
> it's better policy to work hard at filling your data structures with
> reliable data than to sprinkle your query code with sanity checks.

Well, for at least STABS we're never going to get reliable address
ranges.  The data we need is not there.  Any inference will be
guesswork at best.

But that isn't really my point.  What the code in question
(find_pc_sect_line) does is:

/* Find the source file and line number for a given PC value and SECTION.
   Return a structure containing a symtab pointer, a line number,
   and a pc range for the entire source line.

If we don't know what function we're in, then it isn't just that we can
not trust the address range data; we know absolutely that we will not
find a valid line number.  I'm pretty sure that find_pc_function can
fail in reasonable ways; in trampolines, for instance.  It may well be
that we won't reach find_pc_sect_line in any of those cases, ideally,
but I'm not convinced of that.

I'd prefer to both work hard to fill the symtabs with reliable data and
also sprinkle sanity checks where they clearly seem to be needed.

> > For performance reasons, it might be better to steal the iterator
> > concept and have an opaque cookie separate from the start/end
> > addresses.  Otherwise, addrset_next_range is not going to be constant
> > time, and that could get annoying in an objfile with a large number of
> > sections... and C++ is notorious for absurd numbers of sections.
> 
> It is (usually) constant time.  :)  The representation maintains a
> `current position' in the sorted linked list of ranges, and starts its
> searches from that point.  So in a series of operations where each
> address is near the last one (like a sequential scan), each operation
> takes place in constant time.
> 
> This is somewhat like having a single iterator object hidden inside
> the addrset object; that is, at any given time, there's only one range
> of addresses which can be operated on quickly.  If instead we had an
> explicit iterator type, one could do multiple traversals of different
> areas efficiently.  However, mutations can delete nodes from the list,
> and change their values; it seems to me that keeping iterators valid
> across mutation operations would add complexity.
> 
> Anyway, it's an opaque type.  If performance is a problem, or if we
> want to add iterators, we can revamp the whole thing without breaking
> its users.
> 
> Here's the implementation:
> 
> 
> /* addrset.c --- implementation of the address set datatype  
>    Copyright 2001 Free Software Foundation, Inc.
> 
>    This file is part of GDB.
> 
>    This program is free software; you can redistribute it and/or modify
>    it under the terms of the GNU General Public License as published by
>    the Free Software Foundation; either version 2 of the License, or
>    (at your option) any later version.
> 
>    This program is distributed in the hope that it will be useful,
>    but WITHOUT ANY WARRANTY; without even the implied warranty of
>    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>    GNU General Public License for more details.
> 
>    You should have received a copy of the GNU General Public License
>    along with this program; if not, write to the Free Software
>    Foundation, Inc., 59 Temple Place - Suite 330,
>    Boston, MA 02111-1307, USA.  */
> 
> #include "defs.h"
> #include "obstack.h"
> #include "gdb_assert.h"
> #include "addrset.h"
> 
> struct addrset {
> 
>   /* Two singly-linked lists of nodes; each node represents a range of
>      addresses in the addrset.  None of these nodes' address ranges
>      overlap or abut each other.  `before' is sorted in order of
>      decreasing addresses, `after' in order of increasing addresses.
>      All nodes in `before' have lower addresses than all the nodes in
>      `after'.
> 
>      Generally, whenever we operate on the list, we try split it
>      between `before' and `after' so that the division comes directly
>      after the last node we operated on.  That way, sequential access
>      happens in constant time.  Since we split it *after* the last
>      node we operate on, we're slightly biased towards traversal
>      towards increasing address, but it's not too bad if we want to go
>      the other way.  */
>   struct node *before, *after;
> 
>   /* If this is non-zero, allocate new nodes from this obstack.
>      Otherwise, allocate nodes using xmalloc.  */
>   struct obstack *obstack;
> 
>   /* We can't free nodes allocated from an obstack, so we might as
>      well keep them in a free list, and reuse them.  This list can't
>      be global, as different obstacks have different lifetimes.
>      Ideally, it would be per-obstack, so a node could be freed from
>      one addrset and then re-used by another addrset in the same
>      obstack.  But we don't have any convenient way to store
>      per-obstack info.  So the free list is per-addrset.
> 
>      For addrsets allocated using xmalloc, we simply xfree nodes, and
>      hope malloc manages small blocks well; this field is always zero.  */
>   struct node *free;
> 
> };
> 
> /* One contiguous range of addresses in an addrset.  */
> struct node {
>   struct node *next;
> 
>   /* The start and end addresses, inclusive, of the address range.
>      Obviously, start <= end.  If start == end, that's a one-byte
>      range.  */
>   CORE_ADDR start, end;       
> };
> 
> 
> struct addrset *
> new_addrset (struct obstack *obstack)
> {
>   struct addrset *new;
> 
>   if (obstack)
>     new = obstack_alloc (obstack, sizeof (*new));
>   else
>     new = xmalloc (sizeof (*new));
> 
>   memset (new, 0, sizeof (*new));
> 
>   new->obstack = obstack;
> 
>   return new;
> }
> 
> 
> static void
> xfree_node_list (struct node *n)
> {
>   while (n)
>     {
>       struct node *next = n->next;
> 
>       xfree (n);
>       n = next;
>     }
> }
> 
> 
> void
> addrset_free (struct addrset *addrset)
> {
>   /* You can't free an addrset allocated in an obstack.  */
>   gdb_assert (! addrset->obstack);
> 
>   /* If it wasn't allocated on an obstack, it had better not have a
>      free list.  */
>   gdb_assert (! addrset->free);
> 
>   xfree_node_list (addrset->before);
>   xfree_node_list (addrset->after);
> }
> 
> 
> static struct node *
> new_node (struct addrset *addrset,
>           CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *new;
> 
>   if (addrset->obstack)
>     {
>       /* If we have any nodes on the free list, use them.  */
>       if (addrset->free)
>         {
>           new = addrset->free;
>           addrset->free = new->next;
>         }
>       else
>         new = obstack_alloc (addrset->obstack, sizeof (*new));
>     }
>   else
>     new = xmalloc (sizeof (*new));
> 
>   new->next = 0;
>   new->start = start;
>   new->end = end;
> 
>   return new;
> }
> 
> 
> static void
> free_node (struct addrset *addrset, struct node *node)
> {
>   if (addrset->obstack)
>     {
>       node->next = addrset->free;
>       addrset->free = node;
>     }
>   else
>     xfree (node);
> }
> 
> 
> /* Return true if A is less than B, and there is at least one
>    address between them.  */
> static int
> less_and_separate (CORE_ADDR a, CORE_ADDR b)
> {
>   /* We need both tests; think about what happens, for example, when
>      B is zero and A is (CORE_ADDR) -1.  */ 
>   return (a < b && b - a >= 2);
> }
> 
> 
> /* Shift nodes from `before' to `after', or in the other direction,
>    until the split falls at the right place.  By `the right place', we
>    mean:
>    - any nodes on `after' are after, and do not abut, the range from
>      START to END, and
>    - any nodes on `before' are either completely before, abut, or overlap
>      the range from START to END.
>    Once this is done, any nodes on `after' are irrelevant to our
>    operation, and any nodes that abut or overlap our range are at the
>    head of `before'.  */
> static void
> resplit (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node;
> 
>   /* We know the `before' and `after' lists are each sorted in the
>      proper direction, and contain non-overlapping, non-adjacent
>      ranges.
> 
>      The first step is to move all ranges from `after' to `before'
>      that are before, overlap, or abut the range we're operating on.
>      Be careful of nodes abutting either end of the address space.  */
>   while ((node = addrset->after) != 0
>          && ! less_and_separate (end, node->start))
>     {
>       addrset->after = node->next;
>       node->next = addrset->before;
>       addrset->before = node;
>     }
> 
>   /* Now, any ranges in `after' are after and do not abut the range
>      we're operating on.  This means that all those nodes are
>      irrelevant to this operation, and we can ignore them.
> 
>      Thus, any nodes we care about are somewhere in `before'.  Bring
>      them to the head of that list by shifting any nodes from `before'
>      to `after' that are completely after the range we're operating
>      on.
> 
>      The astute reader will note that the condition for moving a node
>      in this loop is the complement of that for the previous loop.
>      That is, the condition for moving a node in one direction is the
>      opposite of moving it in the other direction.  This means that
>      only one of these two loops' bodies will ever be executed in a
>      given call.  */
>   while ((node = addrset->before) != 0
>          && less_and_separate (end, node->start))
>     {
>       addrset->before = node->next;
>       node->next = addrset->after;
>       addrset->after = node;
>     }
> }
> 
> 
> void
> addrset_add (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node, *save;
> 
>   gdb_assert (start <= end);
> 
>   resplit (addrset, start, end);
> 
>   /* Now we know that any ranges in `after' are irrelevant to this
>      operation, and that zero or more nodes at the head of `before'
>      abut or overlap the range we're adding.  Remove any such nodes
>      from `before', and absorb their ranges into ours.  */
>   save = 0;
>   while ((node = addrset->before) != 0
>          && ! less_and_separate (node->end, start))
>     {
>       if (node->start < start)
>         start = node->start;
>       if (node->end > end)
>         end = node->end;
>       addrset->before = node->next;
> 
>       /* We want to remove this node from the list, but we know we're
>          going to insert a new one in a bit, so hold onto the first
>          node we delete, just to save a call to xfree and xmalloc.  */
>       if (save)
>         free_node (addrset, node);
>       else
>         save = node;
>     }
> 
>   /* Create a node for the new range.  */
>   if (save)
>     {
>       save->start = start;
>       save->end = end;
>     }
>   else
>     save = new_node (addrset, start, end);
> 
>   /* Stick it on the head of the `before' list.  */
>   save->next = addrset->before;
>   addrset->before = save;
> }
> 
> 
> void
> addrset_remove (struct addrset *addrset, CORE_ADDR start, CORE_ADDR end)
> {
>   struct node *node;
> 
>   gdb_assert (start <= end);
> 
>   /* This could end up moving a bunch of nodes to `before' that we're
>      just going to delete anyway.  But our running time is linear in
>      the number of ranges we're going to delete anyway, and this lets
>      us deal with the `before' list only; otherwise, we'd have to
>      write out the code twice, once for each list.  */
>   resplit (addrset, start, end);
> 
>   /* Now we know that any ranges in `after' are irrelevant to this
>      operation, and that zero or more nodes at the head of `before'
>      abut or overlap the range we're adding.
> 
>      If the node at the head of the `before' list extends after our
>      range, either truncate it and shift it to `after', or split it in
>      two.  */
>   if ((node = addrset->before) != 0
>       && end < node->end)
>     {
>       /* If it also extends before the range we're deleting, we need
>          to split it into two nodes: one for the part before and one
>          for the part after.  And we know we don't need to do anything
>          else to the `before' list.  */
>       if (node->start < start)
>         {
>           struct node *right_part = new_node (addrset, end + 1, node->end);
>           right_part->next = addrset->after;
>           addrset->after = right_part;
>           node->end = start - 1;
>           return;
>         }
> 
>       /* Just shift the node onto `after', and adjust its size.  */
>       addrset->before = node->next;
>       node->next = addrset->after;
>       addrset->after = node;
>       node->start = end + 1;
> 
>       /* There may be more nodes on `before' that we need to adjust.  */
>     }
> 
>   /* Now we know that the head of `before' doesn't extend beyond the
>      end of the range we're deleting.  Nothing will need to be added
>      to `after'.
> 
>      Remove any nodes completely contained in the region we're
>      deleting.  */
>   while ((node = addrset->before) != 0
>          && start <= node->start)
>     {
>       addrset->before = node->next;
>       free_node (addrset, node);
>     }
> 
>   /* The head of `before' (if any) isn't completely contained in the
>      range we're deleting, but it may partially overlap it at the low
>      end.  If that's the case, adjust its endpoint.  */
>   if ((node = addrset->before) != 0
>       && start <= node->end)
>     node->end = start - 1;
> }
> 
> 
> /* Shift *ADDR by OFFSET, and check that the direction of change from
>    OLD to NEW is consistent with *DIR.  If it isn't, raise an internal
>    error.  OFFSET must be non-zero.  See offset_node_list for the
>    meaning of *DIR.  */
> static void
> shift_addr (CORE_ADDR *addr, CORE_ADDR offset, int *dir)
> {
>   CORE_ADDR old = *addr;
>   CORE_ADDR new = (*addr += offset);
> 
>   if (new < old)
>     {
>       if (! *dir)
>         *dir = -1;
>       else
>         gdb_assert (*dir == -1);
>     }
>   else
>     {
>       if (! *dir)
>         *dir = 1;
>       else
>         gdb_assert (*dir == 1);
>     }
> }
> 
> 
> /* Offset addresses in NODE and its successors by OFFSET.
>    OFFSET must be non-zero.
>    *DIR is the direction of the shift, used for error checking.
>    - If *DIR is zero, then we don't know which direction things will be
>      shifted yet.
>    - If *DIR is -1 or +1, then OFFSET shifted other addresses in other nodes
>      downwards or upwards, and we should raise an internal_error if any
>      of our addresses go in the other direction.  */
> static void
> offset_node_list (struct node *node, CORE_ADDR offset, int *dir)
> {
>   while (node)
>     {
>       shift_addr (&node->start, offset, dir);
>       shift_addr (&node->end, offset, dir);
> 
>       node = node->next;
>     }
> }
> 
> 
> void
> addrset_offset (struct addrset *addrset, CORE_ADDR offset)
> {
>   int dir = 0;
> 
>   if (offset == 0)
>     return;
> 
>   offset_node_list (addrset->before, offset, &dir);
>   offset_node_list (addrset->after, offset, &dir);
> }
> 
> 
> int
> addrset_in (struct addrset *addrset, CORE_ADDR addr)
> {
>   struct node *node;
> 
>   resplit (addrset, addr, addr);
> 
>   /* Now any node containing addr would have to be on `before'.  */
>   return ((node = addrset->before) != 0
>           && node->start <= addr && addr <= node->end);
> }
> 
> 
> int
> addrset_span (struct addrset *addrset,
>               CORE_ADDR addr,
>               CORE_ADDR *start,
>               CORE_ADDR *end)
> {
>   struct node *node;
> 
>   resplit (addrset, addr, addr);
> 
>   /* Now any node containing addr would have to be on `before'.  */
>   if ((node = addrset->before) != 0
>       && node->start <= addr && addr <= node->end)
>     {
>       *start = node->start;
>       *end = node->end;
>       return 1;
>     }
>   else
>     return 0;
> }
> 
> 
> void
> addrset_first_range (struct addrset *addrset,
>                      CORE_ADDR *start,
>                      CORE_ADDR *end)
> {
>   struct node *node;
>   
>   resplit (addrset, 0, 0);
>   
>   /* If there's anything on `before', that's our first range.
>      Otherwise, if there's anything on after, that's it.  */
>   if ((node = addrset->before) != 0
>       || (node = addrset->after) != 0)
>     {
>       *start = node->start;
>       *end = node->end;
>     }
>   else
>     {
>       /* ADDRSET is empty.  */
>       *start = 1;
>       *end = 0;
>     }
> }
> 
> 
> void
> addrset_next_range (struct addrset *addrset,
>                     CORE_ADDR after,
>                     CORE_ADDR *start,
>                     CORE_ADDR *end)
> {
>   struct node *node;
> 
>   resplit (addrset, after, after);
> 
>   /* The first node on `before' might extend beyond our range.  */
>   if ((node = addrset->before) != 0
>       && after < node->end)
>     {
>       *start = after + 1;
>       *end = node->end;
>     }
> 
>   /* Otherwise, the first node on `after' is the one we want.  */
>   else if ((node = addrset->after) != 0)
>     {
>       *start = node->start;
>       *end = node->end;
>     }
> 
>   /* Otherwise, there are no more addresses in ADDRSET after AFTER.  */
>   else
>     {
>       *start = 1;
>       *end = 0;
>     }
> }
> 
> 
> void
> addrset_print (struct addrset *addrset, struct ui_file *outfile)
> {
>   CORE_ADDR start, end;
>   int first = 1;
> 
>   for (addrset_first_range (addrset, &start, &end);
>        start <= end;
>        addrset_next_range (addrset, end, &start, &end))
>     {
>       if (! first)
>         fprintf_filtered (outfile, ", ");
>       print_address_numeric (start, 1, outfile);
>       fprintf_filtered (outfile, "-");
>       print_address_numeric (end, 1, outfile);
>       first = 0;
>     }
> }
> 

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2001-10-24 17:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-23 16:33 RFC: partial symbol table address range generalization Jim Blandy
2001-10-23 16:54 ` Daniel Jacobowitz
     [not found]   ` <npsnc965cl.fsf@zwingli.cygnus.com>
2001-10-23 21:42     ` Daniel Jacobowitz
2001-10-24 17:08       ` Jim Blandy
2001-10-24 17:39         ` Daniel Jacobowitz
2001-10-24  1:52 ` Eli Zaretskii

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox