aborted thread backtrace stops at sighandler call

* aborted thread backtrace stops at sighandler call
@ 2005-06-24 15:11 Louis LeBlanc
  2005-06-24 18:29 ` Daniel Jacobowitz
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Louis LeBlanc @ 2005-06-24 15:11 UTC
  To: gdb

Hey everyone.

I've got an app that seems ok under some pretty heavy load, but once
in a great while, it blows up during some network related operation,
particularly host name lookups.  I'm having similar problems with
other apps (even perl scripts) on the same OS (Solaris 8 & 9).

Well, often these were bus errors and gdb just couldn't nail things
down for me.  Finally, I decided to catch these signals (SIGBUS,
SIGSEGV) and collect what info I could before calling abort() - which
preserves the stack pretty well according to pstack.  When a problem
arises, I can look at the pstack output and see quite clearly that the
problem is this screwy network glitch I've never been able to track
down.  Problem is when it's something else, gdb doesn't seem to be
able to see the preserved stack.

Anyone have any idea how to get this?

This is the backtrace for the aborted thread:
(gdb) bt
#0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
#1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
#2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
#3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
#4  0xff365b14 in ?? ()
#5  0xff365b18 in ?? ()
Previous frame identical to this frame (corrupt stack?)

But pstack shows this:
-----------------  lwp# 3 / thread# 3  --------------------
 fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
 fe9b6cd8 abort    (df708, 0, 0, 0, 0, 0) + 100
 0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
 ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
 ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
 ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
 --- called from signal handler with signal 11 (SIGSEGV) ---
 fe9b44e4 strlen   (fe7778c0, 0, fe777850, 0, 0, 0) + 80
 fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
 000d3390 ERROR    (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
 0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
 000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
 000d7bbc spawn    (fcaa2b0, 0, 0, 0, 0, 0) + 20
 ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)

BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
that doesn't bonk rolling over the end of a thread stack on Solaris.

Thanks
Lou
-- 
Louis LeBlanc                                     dev@keyslapper.net
Fully Funded Hobbyist,                   KeySlapper Extrordinaire :þ
http://www.keyslapper.net                                       Ô¿Ô¬
Key fingerprint = C5E7 4762 F071 CE3B ED51  4FB8 AF85 A2FE 80C8 D9A2

Neutrinos are into physicists.

* Re: aborted thread backtrace stops at sighandler call
  2005-06-24 15:11 aborted thread backtrace stops at sighandler call Louis LeBlanc
@ 2005-06-24 18:29 ` Daniel Jacobowitz
  2005-06-24 18:52 ` Jason Molenda
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Daniel Jacobowitz @ 2005-06-24 18:29 UTC
  To: gdb

On Fri, Jun 24, 2005 at 11:11:29AM -0400, Louis LeBlanc wrote:
> Well, often these were bus errors and gdb just couldn't nail things
> down for me.  Finally, I decided to catch these signals (SIGBUS,
> SIGSEGV) and collect what info I could before calling abort() - which
> preserves the stack pretty well according to pstack.  When a problem
> arises, I can look at the pstack output and see quite clearly that the
> problem is this screwy network glitch I've never been able to track
> down.  Problem is when it's something else, gdb doesn't seem to be
> able to see the preserved stack.
> 
> Anyone have any idea how to get this?
> 
> This is the backtrace for the aborted thread:
> (gdb) bt
> #0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
> #1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
> #2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
> #3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
> #4  0xff365b14 in ?? ()
> #5  0xff365b18 in ?? ()
> Previous frame identical to this frame (corrupt stack?)

What this amounts to is a bug in GDB's signal frame unwinder for
Solaris; I'm afraid I can't offer you any more help than that, since I
don't generally develop or test on Solaris.


-- 
Daniel Jacobowitz
CodeSourcery, LLC


* Re: aborted thread backtrace stops at sighandler call
  2005-06-24 15:11 aborted thread backtrace stops at sighandler call Louis LeBlanc
  2005-06-24 18:29 ` Daniel Jacobowitz
@ 2005-06-24 18:52 ` Jason Molenda
  2005-06-24 21:53 ` Mark Kettenis
  2005-06-25 15:28 ` Mark Kettenis
  3 siblings, 0 replies; 6+ messages in thread
From: Jason Molenda @ 2005-06-24 18:52 UTC
  To: GDB question

On Jun 24, 2005, at 8:11 AM, Louis LeBlanc wrote:

> Anyone have any idea how to get this?
>
> This is the backtrace for the aborted thread:
> (gdb) bt
> #0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
> #1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
> #2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
> #3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
> #4  0xff365b14 in ?? ()
> #5  0xff365b18 in ?? ()
> Previous frame identical to this frame (corrupt stack?)
>
> But pstack shows this:
> -----------------  lwp# 3 / thread# 3  --------------------
>  fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
>  fe9b6cd8 abort    (df708, 0, 0, 0, 0, 0) + 100
>  0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
>  ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
>  ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
>  ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
>  --- called from signal handler with signal 11 (SIGSEGV) ---


I don't have anything better to suggest than Daniel did, but the
specific problem here is that backtracing through a signal handler is
a special case -- it's the most fragile part of any backtracer in gdb
-- and that's where your backtrace is failing.

J


* Re: aborted thread backtrace stops at sighandler call
  2005-06-24 15:11 aborted thread backtrace stops at sighandler call Louis LeBlanc
  2005-06-24 18:29 ` Daniel Jacobowitz
  2005-06-24 18:52 ` Jason Molenda
@ 2005-06-24 21:53 ` Mark Kettenis
  2005-06-24 22:27   ` Louis LeBlanc
  2005-06-25 15:28 ` Mark Kettenis
  3 siblings, 1 reply; 6+ messages in thread
From: Mark Kettenis @ 2005-06-24 21:53 UTC
  To: gdb; +Cc: gdb

   Date: Fri, 24 Jun 2005 11:11:29 -0400
   From: Louis LeBlanc <dev@keyslapper.net>

   Hey everyone.

   I've got an app that seems ok under some pretty heavy load, but once
   in a great while, it blows up during some network related operation,
   particularly host name lookups.  I'm having similar problems with
   other apps (even perl scripts) on the same OS (Solaris 8 & 9).

On sparc or i386?

   Well, often these were bus errors and gdb just couldn't nail things
   down for me.  Finally, I decided to catch these signals (SIGBUS,
   SIGSEGV) and collect what info I could before calling abort() - which
   preserves the stack pretty well according to pstack.  When a problem
   arises, I can look at the pstack output and see quite clearly that the
   problem is this screwy network glitch I've never been able to track
   down.  Problem is when it's something else, gdb doesn't seem to be
   able to see the preserved stack.

   Anyone have any idea how to get this?

Sorry, but I don't understand your problem.  Is it the fact that gdb's
backtrace is different from the backtrace shown by pstack?

   This is the backtrace for the aborted thread:
   (gdb) bt
   #0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
   #1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
   #2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
   #3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
   #4  0xff365b14 in ?? ()
   #5  0xff365b18 in ?? ()
   Previous frame identical to this frame (corrupt stack?)

   But pstack shows this:
   -----------------  lwp# 3 / thread# 3  --------------------
    fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
    fe9b6cd8 abort    (df708, 0, 0, 0, 0, 0) + 100
    0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
    ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
    ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
    ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
    --- called from signal handler with signal 11 (SIGSEGV) ---
    fe9b44e4 strlen   (fe7778c0, 0, fe777850, 0, 0, 0) + 80
    fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
    000d3390 ERROR    (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
    0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
    000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
    000d7bbc spawn    (fcaa2b0, 0, 0, 0, 0, 0) + 20
    ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)

   BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
   that doesn't bonk rolling over the end of a thread stack on Solaris.

* Re: aborted thread backtrace stops at sighandler call
  2005-06-24 21:53 ` Mark Kettenis
@ 2005-06-24 22:27   ` Louis LeBlanc
  0 siblings, 0 replies; 6+ messages in thread
From: Louis LeBlanc @ 2005-06-24 22:27 UTC
  To: Mark Kettenis; +Cc: gdb

On 06/24/05 11:53 PM, Mark Kettenis sat at the `puter and typed:
>    Date: Fri, 24 Jun 2005 11:11:29 -0400
>    From: Louis LeBlanc <dev@keyslapper.net>
> 
>    Hey everyone.
> 
>    I've got an app that seems ok under some pretty heavy load, but once
>    in a great while, it blows up during some network related operation,
>    particularly host name lookups.  I'm having similar problems with
>    other apps (even perl scripts) on the same OS (Solaris 8 & 9).
> 
> On sparc or i386?

sparc

>    Well, often these were bus errors and gdb just couldn't nail things
>    down for me.  Finally, I decided to catch these signals (SIGBUS,
>    SIGSEGV) and collect what info I could before calling abort() - which
>    preserves the stack pretty well according to pstack.  When a problem
>    arises, I can look at the pstack output and see quite clearly that the
>    problem is this screwy network glitch I've never been able to track
>    down.  Problem is when it's something else, gdb doesn't seem to be
>    able to see the preserved stack.
> 
>    Anyone have any idea how to get this?
> 
> Sorry, but I don't understand your problem.  Is it the fact that gdb's
> backtrace is different from the backtrace shown by pstack?

Kinda.  I can't get past the signal handler frames to the code that
was running when the signal arrived.  AFAICT, pstack gives the same
stack from the handler call on, just with a little more detail.  Gdb
doesn't seem to be able to find the stack of the thread that triggered
the handler.
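
To make the gap concrete, here is a small Python sketch (not part of
the original exchange) that splits the pstack output from the original
message at its signal marker.  The function name split_at_signal is
hypothetical, and the parsing assumes the pstack line layout seen in
this thread (address first, then symbol).

```python
# pstack frames copied from the original message (thread header line omitted).
PSTACK_TEXT = """\
 fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
 fe9b6cd8 abort    (df708, 0, 0, 0, 0, 0) + 100
 0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
 ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
 ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
 ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
 --- called from signal handler with signal 11 (SIGSEGV) ---
 fe9b44e4 strlen   (fe7778c0, 0, fe777850, 0, 0, 0) + 80
 fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
 000d3390 ERROR    (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
 0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
 000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
 000d7bbc spawn    (fcaa2b0, 0, 0, 0, 0, 0) + 20
 ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)
"""


def split_at_signal(text):
    """Return (handler_frames, interrupted_frames) as lists of symbol names.

    Hypothetical helper: frames listed before the "called from signal
    handler" marker belong to the handler, frames after it to the code
    that was interrupted by the signal.
    """
    handler, interrupted = [], []
    current = handler
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("---") and "called from signal handler" in line:
            current = interrupted  # everything after the marker was interrupted
            continue
        current.append(line.split()[1])  # second field is the symbol name
    return handler, interrupted


handler_frames, interrupted_frames = split_at_signal(PSTACK_TEXT)
print("handler side:    ", ", ".join(handler_frames))
print("interrupted side:", ", ".join(interrupted_frames))
```

The second list (strlen up through _lwp_start) is the part gdb never
reaches in the backtrace quoted below.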

>    This is the backtrace for the aborted thread:
>    (gdb) bt
>    #0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
>    #1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
>    #2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
>    #3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
>    #4  0xff365b14 in ?? ()
>    #5  0xff365b18 in ?? ()
>    Previous frame identical to this frame (corrupt stack?)
> 
>    But pstack shows this:
>    -----------------  lwp# 3 / thread# 3  --------------------
>     fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
>     fe9b6cd8 abort    (df708, 0, 0, 0, 0, 0) + 100
>     0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
>     ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
>     ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
>     ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
>     --- called from signal handler with signal 11 (SIGSEGV) ---
>     fe9b44e4 strlen   (fe7778c0, 0, fe777850, 0, 0, 0) + 80
>     fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
>     000d3390 ERROR    (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
>     0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
>     000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
>     000d7bbc spawn    (fcaa2b0, 0, 0, 0, 0, 0) + 20
>     ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)
> 
> 
>    BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
>    that doesn't bonk rolling over the end of a thread stack on Solaris.
> 

-- 
Louis LeBlanc                                     dev@keyslapper.net
Fully Funded Hobbyist,                   KeySlapper Extrordinaire :þ
http://www.keyslapper.net                                       Ô¿Ô¬
Key fingerprint = C5E7 4762 F071 CE3B ED51  4FB8 AF85 A2FE 80C8 D9A2

QOTD:
  The only easy way to tell a hamster from a gerbil is that the
  gerbil has more dark meat.


* Re: aborted thread backtrace stops at sighandler call
  2005-06-24 15:11 aborted thread backtrace stops at sighandler call Louis LeBlanc
                   ` (2 preceding siblings ...)
  2005-06-24 21:53 ` Mark Kettenis
@ 2005-06-25 15:28 ` Mark Kettenis
  3 siblings, 0 replies; 6+ messages in thread
From: Mark Kettenis @ 2005-06-25 15:28 UTC
  To: gdb; +Cc: gdb

   Date: Fri, 24 Jun 2005 11:11:29 -0400
   From: Louis LeBlanc <dev@keyslapper.net>

Looked a bit closer into this.

   This is the backtrace for the aborted thread:
   (gdb) bt
   #0  0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
   #1  0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
   #2  0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
   #3  0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
   #4  0xff365b14 in ?? ()
   #5  0xff365b18 in ?? ()
   Previous frame identical to this frame (corrupt stack?)

This backtrace is weird.  __tbl_2_huge_digits isn't a function but
some sort of data structure.  No surprise that gdb's unwinder gets
confused!  For some reason gdb's symbol reader made a mistake when
reading in the symbols for /usr/lib/libc.so.1.
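
One rough way to check this against the two traces from the original
message is sketched below in Python.  It rests on an assumption that
is mine, not something gdb or pstack states: on SPARC a call resumes 8
bytes past the call instruction, so gdb's outer-frame PCs should sit 8
bytes above the call-site addresses pstack prints.  The helper name
match_frames is hypothetical; the addresses are copied from the traces.

```python
# Frames copied from the gdb backtrace: (frame number, pc, symbol gdb printed).
GDB_FRAMES = [
    (0, 0xFEA1F82C, "__tbl_2_huge_digits"),
    (1, 0xFE9D0A24, "sysconf"),
    (2, 0xFE9B6CE0, "ascftime"),
    (3, 0x0003C72C, "XPCSigCheck"),
    (4, 0xFF365B14, "??"),
    (5, 0xFF365B18, "??"),
]

# Handler-side frames copied from the pstack output: address -> symbol.
PSTACK_FRAMES = {
    0xFEA1F82C: "_lwp_kill",
    0xFE9B6CD8: "abort",
    0x0003C724: "XPCSigCheck",
    0xFF365B0C: "__sighndlr",
    0xFF35F804: "call_user_handler",
    0xFF35F9B4: "sigacthandler",
}

# Assumption: SPARC call instruction plus its delay slot (2 x 4 bytes).
RETURN_OFFSET = 8


def match_frames(gdb_frames, pstack_frames, offset=RETURN_OFFSET):
    """Pair each gdb frame with the pstack symbol at pc or at pc - offset.

    Frame 0 holds the current pc and should match exactly; outer frames
    hold resume addresses, so the call site is looked up at pc - offset.
    A frame with no counterpart gets None.
    """
    matches = []
    for number, pc, gdb_symbol in gdb_frames:
        pstack_symbol = pstack_frames.get(pc) or pstack_frames.get(pc - offset)
        matches.append((number, gdb_symbol, pstack_symbol))
    return matches


for number, gdb_symbol, pstack_symbol in match_frames(GDB_FRAMES, PSTACK_FRAMES):
    print(f"#{number}  gdb: {gdb_symbol:<20} pstack: {pstack_symbol}")
```

Under that assumption, frame #0 lines up with _lwp_kill, #2 with
abort, and #4 with __sighndlr, which fits the idea that the symbol
names are being misread; frames #1 and #5 have no pstack counterpart.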

Mark

