* aborted thread backtrace stops at sighandler call
From: Louis LeBlanc @ 2005-06-24 15:11 UTC
To: gdb
Hey everyone.

I've got an app that seems ok under some pretty heavy load, but once
in a great while, it blows up during some network related operation,
particularly host name lookups. I'm having similar problems with
other apps (even perl scripts) on the same OS (Solaris 8 & 9).

Well, often these were bus errors and gdb just couldn't nail things
down for me. Finally, I decided to catch these signals (SIGBUS,
SIGSEGV) and collect what info I could before calling abort() - which
preserves the stack pretty well according to pstack. When a problem
arises, I can look at the pstack output and see quite clearly that the
problem is this screwy network glitch I've never been able to track
down. Problem is when it's something else, gdb doesn't seem to be
able to see the preserved stack.

Anyone have any idea how to get this?
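
For readers who want to reproduce the setup described above (catch SIGSEGV and SIGBUS, record something, then call abort() so a core is left behind), here is a minimal sketch. It is not the code from xpcsig.c: the names crash_handler and install_crash_handlers and the logged message are assumptions made for illustration, and the handler restricts itself to async-signal-safe calls.

```c
/* Request POSIX.1-2008 declarations (sigaction, siginfo_t). */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: log a fixed message, then abort() to leave a core. */
void crash_handler(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "fatal signal caught, aborting\n";

    (void)sig;
    (void)info;
    (void)context;

    /* write() and abort() are both async-signal-safe. */
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* Nothing useful can be done here; fall through to abort(). */
    }
    abort();
}

/* Install the handler for SIGSEGV and SIGBUS. Returns 0 on success, -1 on error. */
int install_crash_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;   /* three-argument handler form */
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

In this sketch, calling install_crash_handlers() once at startup is enough. Whether a core file is actually written after abort() depends on the system's core dump settings.
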
This is the backtrace for the aborted thread:
(gdb) bt
#0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
#1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
#2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
#3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
#4 0xff365b14 in ?? ()
#5 0xff365b18 in ?? ()
Previous frame identical to this frame (corrupt stack?)
But pstack shows this:
----------------- lwp# 3 / thread# 3 --------------------
fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
fe9b6cd8 abort (df708, 0, 0, 0, 0, 0) + 100
0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
--- called from signal handler with signal 11 (SIGSEGV) ---
fe9b44e4 strlen (fe7778c0, 0, fe777850, 0, 0, 0) + 80
fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
000d3390 ERROR (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
000d7bbc spawn (fcaa2b0, 0, 0, 0, 0, 0) + 20
ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)
BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
that doesn't bonk rolling over the end of a thread stack on Solaris.
Thanks
Lou
--
Louis LeBlanc dev@keyslapper.net
Fully Funded Hobbyist, KeySlapper Extrordinaire :þ
http://www.keyslapper.net Ô¿Ô¬
Key fingerprint = C5E7 4762 F071 CE3B ED51 4FB8 AF85 A2FE 80C8 D9A2
Neutrinos are into physicists.
* Re: aborted thread backtrace stops at sighandler call
From: Daniel Jacobowitz @ 2005-06-24 18:29 UTC
To: gdb
On Fri, Jun 24, 2005 at 11:11:29AM -0400, Louis LeBlanc wrote:
> Well, often these were bus errors and gdb just couldn't nail things
> down for me. Finally, I decided to catch these signals (SIGBUS,
> SIGSEGV) and collect what info I could before calling abort() - which
> preserves the stack pretty well according to pstack. When a problem
> arises, I can look at the pstack output and see quite clearly that the
> problem is this screwy network glitch I've never been able to track
> down. Problem is when it's something else, gdb doesn't seem to be
> able to see the preserved stack.
>
> Anyone have any idea how to get this?
>
> This is the backtrace for the aborted thread:
> (gdb) bt
> #0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
> #1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
> #2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
> #3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
> #4 0xff365b14 in ?? ()
> #5 0xff365b18 in ?? ()
> Previous frame identical to this frame (corrupt stack?)
What this amounts to is a bug in GDB's signal frame unwinder for
Solaris; I'm afraid I can't offer you any more help than that, since I
don't generally develop or test on Solaris.
--
Daniel Jacobowitz
CodeSourcery, LLC
* Re: aborted thread backtrace stops at sighandler call
From: Jason Molenda @ 2005-06-24 18:52 UTC
To: GDB question
On Jun 24, 2005, at 8:11 AM, Louis LeBlanc wrote:
> Anyone have any idea how to get this?
>
> This is the backtrace for the aborted thread:
> (gdb) bt
> #0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
> #1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
> #2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
> #3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0,
> Context=0xfe776818) at xpcsig.c:347
> #4 0xff365b14 in ?? ()
> #5 0xff365b18 in ?? ()
> Previous frame identical to this frame (corrupt stack?)
>
> But pstack shows this:
> ----------------- lwp# 3 / thread# 3 --------------------
> fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
> fe9b6cd8 abort (df708, 0, 0, 0, 0, 0) + 100
> 0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
> ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
> ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
> ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100,
> 0) + 64
> --- called from signal handler with signal 11 (SIGSEGV) ---
I don't have anything better than Daniel to suggest, but the problem
here is specifically that backtracing through a signal handler is a
special case -- it's the most fragile part of any backtracer in gdb
-- and that's where your back trace is failing.
J
* Re: aborted thread backtrace stops at sighandler call
From: Mark Kettenis @ 2005-06-24 21:53 UTC
To: gdb; +Cc: gdb
   Date: Fri, 24 Jun 2005 11:11:29 -0400
   From: Louis LeBlanc <dev@keyslapper.net>

   Hey everyone.

   I've got an app that seems ok under some pretty heavy load, but once
   in a great while, it blows up during some network related operation,
   particularly host name lookups. I'm having similar problems with
   other apps (even perl scripts) on the same OS (Solaris 8 & 9).

On sparc or i386?

   Well, often these were bus errors and gdb just couldn't nail things
   down for me. Finally, I decided to catch these signals (SIGBUS,
   SIGSEGV) and collect what info I could before calling abort() - which
   preserves the stack pretty well according to pstack. When a problem
   arises, I can look at the pstack output and see quite clearly that the
   problem is this screwy network glitch I've never been able to track
   down. Problem is when it's something else, gdb doesn't seem to be
   able to see the preserved stack.

   Anyone have any idea how to get this?

Sorry, but I don't understand your problem. Is it the fact that gdb's
backtrace is different from the backtrace shown by pstack?

   This is the backtrace for the aborted thread:
   (gdb) bt
   #0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
   #1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
   #2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
   #3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
   #4 0xff365b14 in ?? ()
   #5 0xff365b18 in ?? ()
   Previous frame identical to this frame (corrupt stack?)

   But pstack shows this:
   ----------------- lwp# 3 / thread# 3 --------------------
   fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
   fe9b6cd8 abort (df708, 0, 0, 0, 0, 0) + 100
   0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
   ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
   ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
   ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
   --- called from signal handler with signal 11 (SIGSEGV) ---
   fe9b44e4 strlen (fe7778c0, 0, fe777850, 0, 0, 0) + 80
   fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
   000d3390 ERROR (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
   0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
   000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
   000d7bbc spawn (fcaa2b0, 0, 0, 0, 0, 0) + 20
   ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)

   BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
   that doesn't bonk rolling over the end of a thread stack on Solaris.
* Re: aborted thread backtrace stops at sighandler call
From: Louis LeBlanc @ 2005-06-24 22:27 UTC
To: Mark Kettenis; +Cc: gdb
On 06/24/05 11:53 PM, Mark Kettenis sat at the `puter and typed:
> Date: Fri, 24 Jun 2005 11:11:29 -0400
> From: Louis LeBlanc <dev@keyslapper.net>
>
> Hey everyone.
>
> I've got an app that seems ok under some pretty heavy load, but once
> in a great while, it blows up during some network related operation,
> particularly host name lookups. I'm having similar problems with
> other apps (even perl scripts) on the same OS (Solaris 8 & 9).
>
> On sparc or i386?
sparc
> Well, often these were bus errors and gdb just couldn't nail things
> down for me. Finally, I decided to catch these signals (SIGBUS,
> SIGSEGV) and collect what info I could before calling abort() - which
> preserves the stack pretty well according to pstack. When a problem
> arises, I can look at the pstack output and see quite clearly that the
> problem is this screwy network glitch I've never been able to track
> down. Problem is when it's something else, gdb doesn't seem to be
> able to see the preserved stack.
>
> Anyone have any idea how to get this?
>
> Sorry, but I don't understand your problem. Is it the fact that gdb's
> backtrace is different from the backtrace shown by pstack?
Kinda. I can't get past the sighandler stack to the calling thread.
AFAICT, pstack is giving the same stack from the handler call on, just
a little more detail. Gdb doesn't seem to be able to find the stack
of the thread that called the handler.
> This is the backtrace for the aborted thread:
> (gdb) bt
> #0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
> #1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
> #2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
> #3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
> #4 0xff365b14 in ?? ()
> #5 0xff365b18 in ?? ()
> Previous frame identical to this frame (corrupt stack?)
>
> But pstack shows this:
> ----------------- lwp# 3 / thread# 3 --------------------
> fea1f82c _lwp_kill (6, 0, fe7765b8, fe776630, 0, 1) + 8
> fe9b6cd8 abort (df708, 0, 0, 0, 0, 0) + 100
> 0003c724 XPCSigCheck (b, fe776ad0, fe776818, 0, 0, 0) + 2c0
> ff365b0c __sighndlr (b, fe776ad0, fe776818, 3c464, 0, 0) + c
> ff35f804 call_user_handler (b, fe776ad0, fe776818, 0, 0, 0) + 234
> ff35f9b4 sigacthandler (b, fe776ad0, fe776818, 7efefeff, 81010100, 0) + 64
> --- called from signal handler with signal 11 (SIGSEGV) ---
> fe9b44e4 strlen (fe7778c0, 0, fe777850, 0, 0, 0) + 80
> fea08c98 vsnprintf (fe7784c0, c00, fe7778c0, fe779118, 7300, fe7778c0) + 5c
> 000d3390 ERROR (dffa8, 0, 7530, 1, 81010100, 3d740) + 48
> 0003d590 make_ssl_connection (fe779368, a060164, 0, fd043b8, 7530, fe77bed0) + 57c
> 000300c0 handle_check (10e800, dc290, ffffd438, 1, 0, 5b7550) + 1c08
> 000d7bbc spawn (fcaa2b0, 0, 0, 0, 0, 0) + 20
> ff3657b4 _lwp_start (0, 0, 0, 0, 0, 0)
>
>
> BTW, I am using gdb 6.3.50.20050621-cvs - it's the only one I've found
> that doesn't bonk rolling over the end of a thread stack on Solaris.
>
--
Louis LeBlanc dev@keyslapper.net
Fully Funded Hobbyist, KeySlapper Extrordinaire :þ
http://www.keyslapper.net Ô¿Ô¬
Key fingerprint = C5E7 4762 F071 CE3B ED51 4FB8 AF85 A2FE 80C8 D9A2
QOTD:
The only easy way to tell a hamster from a gerbil is that the
gerbil has more dark meat.
* Re: aborted thread backtrace stops at sighandler call
From: Mark Kettenis @ 2005-06-25 15:28 UTC
To: gdb; +Cc: gdb
   Date: Fri, 24 Jun 2005 11:11:29 -0400
   From: Louis LeBlanc <dev@keyslapper.net>

Looked a bit closer into this.

   This is the backtrace for the aborted thread:
   (gdb) bt
   #0 0xfea1f82c in __tbl_2_huge_digits () from /usr/lib/libc.so.1
   #1 0xfe9d0a24 in sysconf () from /usr/lib/libc.so.1
   #2 0xfe9b6ce0 in ascftime () from /usr/lib/libc.so.1
   #3 0x0003c72c in XPCSigCheck (Sig=11, Info=0xfe776ad0, Context=0xfe776818) at xpcsig.c:347
   #4 0xff365b14 in ?? ()
   #5 0xff365b18 in ?? ()
   Previous frame identical to this frame (corrupt stack?)

This backtrace is weird. __tbl_2_huge_digits isn't a function but
some sort of data structure. No surprise that gdb's unwinder gets
confused! For some reason gdb's symbol reader made a mistake when
reading in the symbols for /usr/lib/libc.so.1.
Mark
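
The mislabelling can be cross-checked from the numbers already posted in this thread. The sketch below is only arithmetic on the quoted output, not a description of how gdb or pstack work internally. pstack prints each frame as pc, symbol and "+ offset", so pc minus offset gives the start address pstack assumes for that function. The struct and function names are made up for illustration.

```c
/* One pstack line: the symbol name, the pc shown, and the "+ offset" value. */
struct pstack_frame {
    const char *name;
    unsigned long pc;
    unsigned long offset;
};

/* Values copied from the pstack output quoted in this thread. */
const struct pstack_frame pstack_frames[] = {
    { "_lwp_kill",   0xfea1f82cUL, 0x8UL   },
    { "abort",       0xfe9b6cd8UL, 0x100UL },
    { "XPCSigCheck", 0x0003c724UL, 0x2c0UL },
    { "__sighndlr",  0xff365b0cUL, 0xcUL   },
};

/* Start address of the function according to pstack: pc minus offset. */
unsigned long function_start(const struct pstack_frame *f)
{
    return f->pc - f->offset;
}

/*
 * On SPARC a callee returns to the call instruction's address plus 8
 * (the call itself plus its delay slot), so a caller's resume address
 * is the call-site address plus 8.
 */
unsigned long sparc_return_pc(unsigned long call_pc)
{
    return call_pc + 8UL;
}
```

Under these assumptions, function_start() for XPCSigCheck gives 0x3c464, which is the same value as the fourth argument pstack shows for __sighndlr and is presumably the handler's address. Frame #0 has the same pc (0xfea1f82c) in both tools, so the address gdb labels __tbl_2_huge_digits is the one pstack resolves to _lwp_kill + 8. For frames #2 to #4, the gdb addresses equal the pstack addresses for abort, XPCSigCheck and __sighndlr plus 8, which suggests gdb found the right frames there and got the libc symbol names wrong.
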