problem unwinding past pthread_cond_wait() on x86 RedHat 9.0

Mirror of the gdb-patches mailing list

* problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
@ 2003-10-14  5:42 Joel Brobecker
  2003-10-14 12:57 ` Daniel Jacobowitz
  2003-10-23  1:07 ` Joel Brobecker
  0 siblings, 2 replies; 16+ messages in thread
From: Joel Brobecker @ 2003-10-14  5:42 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 4788 bytes --]

Hello,

while trying to move from GDB 5.3 to 6.0, we noticed a small
"regression" in a backtrace after switching to a thread blocked
on a pthread_cond_wait() call. This occurs only on RH9 (we tried
on RH8 and RH7).

To reproduce the problem, compile the attached C program with the
command below. (I'll be more than happy to contribute that testcase if
you are interested. Coming from the Ada world, where tasking is really
easy, I am not too familiar with pthreads, especially in terms of
portability, but I welcome criticism :)

        % gcc -D_REENTRANT -g -o pt pt.c -lpthread

Then do the following:

        % gdb pt
        (gdb) break break_me
        (gdb) run
        (gdb) thread 2
        (gdb) bt

The thread that I created should be blocked on a pthread_cond_wait
waiting for a condition to be signaled. But we never signal this
condition, so it should wait there forever. So the backtrace I
expected should look like this:

        #0 in pthread_cond_wait ()
        #1 in cond_wait ()
        #2 in noreturn ()
        #3 in forever_pthread ()
        (more frames follow)

Instead, here is all that we get:

        #0  0xffffe002 in ?? ()
        #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
           from /lib/tls/libpthread.so.0

With GDB 5.3, we used to get:

        #0  0xffffe002 in ?? ()
        #1  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0

Which isn't any better, which is why I put "regression" in quotes in
the first paragraph. The change of behavior becomes much more obvious
when we debug an Ada program, because instead of the not-so-correct
backtrace we used to get with 5.3 (missing a frame or two between #0 and
#1):

        #0  0xffffe002 in ?? ()
        #1  0x0804fb29 in system__tasking__rendezvous__accept_trivial ()
        #2  0x08049f48 in task_switch.callee (<_task>=0x806e708)
                          at task_switch.adb:29
        #3  0x08053394 in system__tasking__stages__task_wrapper ()
        #4  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0

We now basically get almost nothing:

        #0  0xffffe002 in ?? ()
        #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
           from /lib/tls/libpthread.so.0

I think I found the source of the problem when looking at the assembly
code for pthread_cond_wait in libpthread.so. Here is what it looks like:

0x4002d2e0 <pthread_cond_wait+0>:          push   %edi
0x4002d2e1 <pthread_cond_wait+1>:          push   %esi
0x4002d2e2 <pthread_cond_wait+2>:          push   %ebx
[a bunch of instructions, including conditional jumps]
0x4002d2fa <pthread_cond_wait+26>:         pushl  0x14(%esp,1)
[...]
0x4002d324 <pthread_cond_wait+68>:         sub    $0x20,%esp
[some other bunch of instructions, and then finally the code where we stopped:]
0x4002d372 <pthread_cond_wait+146>:        call   *%gs:0x10
0x4002d379 <pthread_cond_wait+153>:        sub    $0xc,%ebx

So we are at pthread_cond_wait+146, and the i386 frame code is trying to
unwind past this function. It looks at the function prologue and finds
that the function is frameless, so it falls back to finding the "frame"
base using the SP instead of the base pointer. It analyzes the prologue,
finds the 3 push instructions saving certain registers, and therefore
determines that the offset between the SP and the BP must be these
12 bytes. Unfortunately, it misses the pushl and the sub instructions
that moved the SP by another 36 bytes! So the unwinder got the wrong
frame base, and therefore the wrong address from which to fetch the
saved EIP, which led it to stop because the EIP value fetched was NULL.
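
To make the miscount concrete, here is a minimal sketch in C of the
arithmetic. It is not GDB code: struct stack_op, cond_wait_ops and
bytes_pushed are names made up for this illustration, and the table
simply transcribes the stack-adjusting instructions from the disassembly
above, assuming each one executes exactly once on the way to the call
at +146.

```c
#include <stddef.h>

/* Hypothetical model of one stack-adjusting instruction: the number of
   bytes by which it lowers %esp.  Not a GDB data structure.  */
struct stack_op
{
  const char *insn;
  int bytes;
};

/* Stack-adjusting instructions seen in pthread_cond_wait before the
   call at +146, assuming each one executes exactly once.  */
static const struct stack_op cond_wait_ops[] = {
  { "push %edi", 4 },           /* +0 */
  { "push %esi", 4 },           /* +1 */
  { "push %ebx", 4 },           /* +2 */
  { "pushl 0x14(%esp,1)", 4 },  /* +26 */
  { "sub $0x20,%esp", 32 },     /* +68 */
};

/* Sum the stack adjustment of the first COUNT entries of OPS.  */
static int
bytes_pushed (const struct stack_op *ops, size_t count)
{
  int total = 0;
  for (size_t i = 0; i < count; i++)
    total += ops[i].bytes;
  return total;
}
```

Summing only the first three entries gives the 12 bytes that the
prologue analysis accounts for; summing all five gives 48.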

I tried an experiment, running the debugger under debugger
supervision, and assumed, despite the numerous conditional jumps
everywhere, that the "pushl" and the "sub" instructions were executed
exactly once. So I manually changed the offset to be 12 + 4 + 32 = 48
(so cache->offset was 44), and voila!

    #0  0xffffe002 in ?? ()
    #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/tls/libpthread.so.0
    #2  0x0804855e in cond_wait (cond=0x4083484c, mut=0x4083487c) at pt.c:9
    #3  0x080485a9 in noreturn () at pt.c:24
    #4  0x080485b9 in forever_pthread (unused=0x0) at pt.c:30
    #5  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0
    #6  0x420de407 in clone () from /lib/tls/libc.so.6

The problem I am now trying to solve is the following: How can we fix
the i386 unwinder to be smart enough to handle this wicked function?
Is this even possible? The only possibility I see right now is with
dwarf2 CFI, but then the problem I foresee is that we can not help
the people using the stock RH9. If the only hope is with CFI, then
they will have to update their pthread library...

What do you guys think?

-- 
Joel

[-- Attachment #2: pt.c --]
[-- Type: text/plain, Size: 872 bytes --]

#include <pthread.h>
#include <stdio.h>
#include <time.h>

void
cond_wait (pthread_cond_t *cond, pthread_mutex_t *mut)
{
  pthread_mutex_lock(mut);
  pthread_cond_wait (cond, mut);
  pthread_mutex_unlock (mut);
}

void
noreturn (void)
{
  pthread_mutex_t mut;
  pthread_cond_t cond;

  pthread_mutex_init (&mut, NULL);
  pthread_cond_init (&cond, NULL);

  /* Wait for a condition that will never be signaled, so we effectively
     block the thread here.  */
  cond_wait (&cond, &mut);
}

void *
forever_pthread (void *unused)
{
  noreturn ();
  return NULL;  /* Not reached; noreturn blocks forever.  */
}

void
break_me (void)
{
  /* Just an anchor to help putting a breakpoint.  */
}

int
main (void)
{
  pthread_t forever;
  const struct timespec ts = { 0, 10000000 }; /* 0.01 sec */

  pthread_create (&forever, NULL, forever_pthread, NULL);
  for (;;)
    {
      nanosleep (&ts, NULL);
      break_me();
    }

  return 0;
}


* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14  5:42 problem unwinding past pthread_cond_wait() on x86 RedHat 9.0 Joel Brobecker
@ 2003-10-14 12:57 ` Daniel Jacobowitz
  2003-10-14 15:24   ` Andrew Cagney
  2003-10-14 15:58   ` Joel Brobecker
  2003-10-23  1:07 ` Joel Brobecker
  1 sibling, 2 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2003-10-14 12:57 UTC (permalink / raw)
  To: gdb-patches

On Mon, Oct 13, 2003 at 10:42:25PM -0700, Joel Brobecker wrote:
> We now basically get almost nothing:
> 
>         #0  0xffffe002 in ?? ()
>         #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
>            from /lib/tls/libpthread.so.0

That's NPTL.  Are you sure you understand the problem right - I don't
have RH9's glibc here, only Rawhide's, but there's CFI for
pthread_cond_wait in Rawhide.

So anyway this _will_ go away someday.

> The problem I am now trying to solve is the following: How can we fix
> the i386 unwinder to be smart enough to handle this wicked function?
> Is this even possible? The only possibility I see right now is with
> dwarf2 CFI, but then the problem I foresee is that we can not help
> the people using the stock RH9. If the only hope is with CFI, then
> they will have to update their pthread library...

You really can't unwind past this sort of thing without either debug
info or frame pointers.  How did it work in 5.3?  I'm assuming dumb
luck, we unwound 0xffffe002 wrong.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 12:57 ` Daniel Jacobowitz
@ 2003-10-14 15:24   ` Andrew Cagney
  2003-10-14 15:46     ` Joel Brobecker
  2003-10-14 15:58   ` Joel Brobecker
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Cagney @ 2003-10-14 15:24 UTC (permalink / raw)
  To: Daniel Jacobowitz, Joel Brobecker; +Cc: gdb-patches

> On Mon, Oct 13, 2003 at 10:42:25PM -0700, Joel Brobecker wrote:
> 
>> We now basically get almost nothing:
>> 
>>         #0  0xffffe002 in ?? ()
>>         #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
>>            from /lib/tls/libpthread.so.0

Joel, what happens if you type:

(gdb) x/i 0xffffe002

Andrew

> That's NPTL.  Are you sure you understand the problem right - I don't
> have RH9's glibc here, only Rawhide's, but there's CFI for
> pthread_cond_wait in Rawhide.
> 
> So anyway this _will_ go away someday.
> 
> 
>> The problem I am now trying to solve is the following: How can we fix
>> the i386 unwinder to be smart enough to handle this wicked function?
>> Is this even possible? The only possibility I see right now is with
>> dwarf2 CFI, but then the problem I foresee is that we can not help
>> the people using the stock RH9. If the only hope is with CFI, then
>> they will have to update their pthread library...
> 
> 
> You really can't unwind past this sort of thing without either debug
> info or frame pointers.  How did it work in 5.3?  I'm assuming dumb
> luck, we unwound 0xffffe002 wrong.
> 




* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 15:24   ` Andrew Cagney
@ 2003-10-14 15:46     ` Joel Brobecker
  2003-10-14 15:52       ` Daniel Jacobowitz
  2003-10-14 15:53       ` Elena Zannoni
  0 siblings, 2 replies; 16+ messages in thread
From: Joel Brobecker @ 2003-10-14 15:46 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Daniel Jacobowitz, gdb-patches

> Joel, what happens if you type:
> 
> (gdb) x/i 0xffffe002

Something like "Cannot read memory at 0xffffe002" (already tried it :-).
What is this address, BTW. I always wondered... Kernel code? Special
address?

-- 
Joel



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 15:46     ` Joel Brobecker
@ 2003-10-14 15:52       ` Daniel Jacobowitz
  2003-10-14 16:15         ` Andrew Cagney
  2003-10-14 15:53       ` Elena Zannoni
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Jacobowitz @ 2003-10-14 15:52 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: Andrew Cagney, gdb-patches

On Tue, Oct 14, 2003 at 08:46:49AM -0700, Joel Brobecker wrote:
> > Joel, what happens if you type:
> > 
> > (gdb) x/i 0xffffe002
> 
> Something like "Cannot read memory at 0xffffe002" (already tried it :-).
> What is this address, BTW. I always wondered... Kernel code? Special
> address?

Search for "vsyscall DSO" in the gdb@ archives to learn more than you
ever wanted to know :)  Patches to fully support it are still pending.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 15:46     ` Joel Brobecker
  2003-10-14 15:52       ` Daniel Jacobowitz
@ 2003-10-14 15:53       ` Elena Zannoni
  1 sibling, 0 replies; 16+ messages in thread
From: Elena Zannoni @ 2003-10-14 15:53 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: Andrew Cagney, Daniel Jacobowitz, gdb-patches

Joel Brobecker writes:
 > > Joel, what happens if you type:
 > > 
 > > (gdb) x/i 0xffffe002
 > 
 > Something like "Cannot read memory at 0xffffe002" (already tried it :-).
 > What is this address, BTW. I always wondered... Kernel code? Special
 > address?
 > 

Yeah, that's the vsyscall stuff. GDB cannot access it. It will be
really solved only with the 2.6 kernel.
(See the thread that talks about the vsyscall DSO)

elena

 > -- 
 > Joel



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 12:57 ` Daniel Jacobowitz
  2003-10-14 15:24   ` Andrew Cagney
@ 2003-10-14 15:58   ` Joel Brobecker
  2003-10-14 16:02     ` Daniel Jacobowitz
  1 sibling, 1 reply; 16+ messages in thread
From: Joel Brobecker @ 2003-10-14 15:58 UTC (permalink / raw)
  To: gdb-patches

> That's NPTL.  Are you sure you understand the problem right - I don't
> have RH9's glibc here, only Rawhide's, but there's CFI for
> pthread_cond_wait in Rawhide.

I can't say I'm 100% sure, but last time I checked, I couldn't find
any CFI:

   % objdump --headers /lib/tls/libpthread.so.0 | grep frame
    15 .eh_frame_hdr 0000002c  00009dc8  00009dc8  00009dc8  2**2
    16 .eh_frame     0000010c  00009df4  00009df4  00009df4  2**2

No .debug_frame section (not a single dwarf2-related section for
that matter).

> So anyway this _will_ go away someday.

Yes, fortunately. I just suppose that the RH9.0 users will probably have
to update their NPTL library if they want this to work...

> You really can't unwind past this sort of thing without either debug
> info or frame pointers.

That was my feeling too. But having only a little experience in the
area, I was wondering if there was any technique that I didn't know
about.

> How did it work in 5.3?  I'm assuming dumb luck, we unwound 0xffffe002
> wrong.

With 5.3, it was "luck", if we can call it that way (the old backtrace
is incomplete too, and probably the value of some registers is not
unwound properly in some of the frames). I didn't look too closely, but
I think GDB 5.3 didn't handle 0xffffe002 as a frameless function, and
therefore used %ebp to fetch the return address. The problem is that
this %ebp was the frame pointer from a caller two or three frames up...
So we ended up skipping these two or three frames.  And then after that,
it was business as usual...
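
The frame-skipping effect can be sketched with a toy model in C. This is
an illustration under assumed conditions, not what GDB 5.3 actually
executes: the function walk_ebp_chain, the array sample_stack, and the
function names start, outer, middle and inner used in the comments are
all hypothetical. The stack is modelled as an array of 32-bit words,
with array indices standing in for addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stack at the moment we stop in "inner".  Call chain, oldest
   first: start -> outer -> middle -> inner.  start and outer set up a
   frame pointer; middle and inner are frameless, so %ebp still points
   at outer's frame (index 3) while %esp is at index 1.  */
static const uint32_t sample_stack[] = {
  0,       /* [0] unused; index 0 marks the end of the chain */
  0x3000,  /* [1] return address into middle         <- %esp */
  0x2000,  /* [2] return address into outer */
  5,       /* [3] %ebp saved by outer (start's frame) <- %ebp */
  0x1000,  /* [4] return address into start */
  0,       /* [5] %ebp saved by start: end of chain */
  0x0500,  /* [6] return address into start's caller */
};

/* Follow the saved-%ebp chain starting at index EBP.  STACK[EBP] holds
   the caller's saved %ebp and STACK[EBP + 1] the return address.
   Stores up to MAX return addresses in OUT and returns the count.  */
static size_t
walk_ebp_chain (const uint32_t *stack, uint32_t ebp,
                uint32_t *out, size_t max)
{
  size_t n = 0;
  while (ebp != 0 && n < max)
    {
      out[n++] = stack[ebp + 1];
      ebp = stack[ebp];
    }
  return n;
}
```

Walking from %ebp = 3 yields only the return addresses 0x1000 and
0x0500. The return addresses 0x3000 and 0x2000, and with them the
frames for middle and outer, are never seen.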

-- 
Joel


* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 15:58   ` Joel Brobecker
@ 2003-10-14 16:02     ` Daniel Jacobowitz
  2003-10-14 16:21       ` Joel Brobecker
  2003-10-15 19:34       ` Mark Kettenis
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2003-10-14 16:02 UTC (permalink / raw)
  To: gdb-patches; +Cc: kettenis

On Tue, Oct 14, 2003 at 08:58:10AM -0700, Joel Brobecker wrote:
> > That's NPTL.  Are you sure you understand the problem right - I don't
> > have RH9's glibc here, only Rawhide's, but there's CFI for
> > pthread_cond_wait in Rawhide.
> 
> I can't say I'm 100% sure, but last time I checked, I couldn't find
> any CFI:
> 
>    % objdump --headers /lib/tls/libpthread.so.0 | grep frame
>     15 .eh_frame_hdr 0000002c  00009dc8  00009dc8  00009dc8  2**2
>     16 .eh_frame     0000010c  00009df4  00009df4  00009df4  2**2
> 
> No .debug_frame section (not a single dwarf2-related section for
> that matter).

That is CFI.  The .eh_frame section is actually just about the same as
the .debug_frame section, but encoded a little differently and loaded
into memory instead of marked as a debugging section.

However, 0x10c bytes is not encouraging for having unwind info for the
function in question.  In rawhide:
 15 .eh_frame_hdr 000001d4  0000bf10  0000bf10  0000bf10  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     00000944  0000c0e4  0000c0e4  0000c0e4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

> > So anyway this _will_ go away someday.
> 
> Yes, fortunately. I just suppose that the RH9.0 users will probably have
> to update their NPTL library if they want this to work...
> 
> > You really can't unwind past this sort of thing without either debug
> > info or frame pointers.
> 
> That was my feeling too. But having only a little experience in the
> area, I was wondering if there was any technique that I didn't know
> about.
> 
> > How did it work in 5.3?  I'm assuming dumb luck, we unwound 0xffffe002
> > wrong.
> 
> With 5.3, it was "luck", if we can call it that way (the old backtrace
> is incomplete too, and probably the value of some registers is not
> unwound properly in some of the frames). I didn't look too closely, but
> I think GDB 5.3 didn't handle 0xffffe002 as a frameless function, and
> therefore used %ebp to fetch the return address. The problem is that
> this %ebp was the frame pointer from a caller two or three frames up...
> So we ended up skipping these two or three frames.  And then after that,
> it was business as usual...

Ah, and pthread_cond_wait is frameless so that worked.  Hmmmmm.  If we
get confused, falling back to trying %ebp wouldn't be an entirely bad
idea.  Mark, does that seem plausible or is it just asking for
problems?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 15:52       ` Daniel Jacobowitz
@ 2003-10-14 16:15         ` Andrew Cagney
  2003-10-14 16:18           ` Daniel Jacobowitz
  2003-10-14 16:19           ` Joel Brobecker
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Cagney @ 2003-10-14 16:15 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Joel Brobecker, Andrew Cagney, gdb-patches

> On Tue, Oct 14, 2003 at 08:46:49AM -0700, Joel Brobecker wrote:
> 
>> > Joel, what happens if you type:
>> > 
>> > (gdb) x/i 0xffffe002
> 
>> 
>> Something like "Cannot read memory at 0xffffe002" (already tried it :-).
>> What is this address, BTW. I always wondered... Kernel code? Special
>> address?
> 
> 
> Search for "vsyscall DSO" in the gdb@ archives to learn more than you
> ever wanted to know :)  Patches to fully support it are still pending.

No.

No matter how much you patch GDB, GDB isn't going to dig itself out of 
this hole.  It's not able to access the code/data/whatever at that 
address so it's never going to correctly unwind from it.

Kernel bug - incomplete functionality.

Andrew




* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 16:15         ` Andrew Cagney
@ 2003-10-14 16:18           ` Daniel Jacobowitz
  2003-10-14 16:19           ` Joel Brobecker
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2003-10-14 16:18 UTC (permalink / raw)
  To: gdb-patches

On Tue, Oct 14, 2003 at 12:15:53PM -0400, Andrew Cagney wrote:
> >On Tue, Oct 14, 2003 at 08:46:49AM -0700, Joel Brobecker wrote:
> >
> >>> Joel, what happens if you type:
> >>> 
> >>> (gdb) x/i 0xffffe002
> >
> >>
> >>Something like "Cannot read memory at 0xffffe002" (already tried it :-).
> >>What is this address, BTW. I always wondered... Kernel code? Special
> >>address?
> >
> >
> >Search for "vsyscall DSO" in the gdb@ archives to learn more than you
> >ever wanted to know :)  Patches to fully support it are still pending.
> 
> No.
> 
> No matter how much you patch GDB, GDB isn't going to dig itself out of 
> this hole.  It's not able to access the code/data/whatever at that 
> address so it's never going to correctly unwind from it.
> 
> Kernel bug - incomplete functionality.

We're both right.  Patches for this are still pending.  They'll fix the
unwinding when using a 2.6 kernel which _does_ let the page be
accessed.  At that point I wouldn't be surprised if you folks at RH
published an errata kernel which let this work in RH9, or RH10 or
whatever it is now.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 16:15         ` Andrew Cagney
  2003-10-14 16:18           ` Daniel Jacobowitz
@ 2003-10-14 16:19           ` Joel Brobecker
  1 sibling, 0 replies; 16+ messages in thread
From: Joel Brobecker @ 2003-10-14 16:19 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Daniel Jacobowitz, gdb-patches

> No matter how much you patch GDB, GDB isn't going to dig itself out of 
> this hole.  It's not able to access the code/data/whatever at that 
> address so it's never going to correctly unwind from it.

It's actually able to do a decent job. The problem at hand is not
due to the "kernel bug", but rather the lack of unwind information
for the pthread_cond_wait() function.

And I am afraid that pthread_cond_wait() is just one example. With
GCC getting better and better at optimizing, we will likely see
this more and more often...

-- 
Joel


* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 16:02     ` Daniel Jacobowitz
@ 2003-10-14 16:21       ` Joel Brobecker
  2003-10-16 22:13         ` Richard Henderson
  2003-10-15 19:34       ` Mark Kettenis
  1 sibling, 1 reply; 16+ messages in thread
From: Joel Brobecker @ 2003-10-14 16:21 UTC (permalink / raw)
  To: gdb-patches, kettenis

> >    % objdump --headers /lib/tls/libpthread.so.0 | grep frame
> >     15 .eh_frame_hdr 0000002c  00009dc8  00009dc8  00009dc8  2**2
> >     16 .eh_frame     0000010c  00009df4  00009df4  00009df4  2**2
> > 
> > No .debug_frame section (not a single dwarf2-related section for
> > that matter).
> 
> That is CFI.  The .eh_frame section is actually just about the same as
> the .debug_frame section, but encoded a little differently and loaded
> into memory instead of marked as a debugging section.

Ah, OK, thanks. Given the name of the section, I thought it would
contain exception handling _regions_, without necessarily providing
frame information for each and every function. I need to learn a bit
more about them...

-- 
Joel



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 16:02     ` Daniel Jacobowitz
  2003-10-14 16:21       ` Joel Brobecker
@ 2003-10-15 19:34       ` Mark Kettenis
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Kettenis @ 2003-10-15 19:34 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches

   Date: Tue, 14 Oct 2003 12:02:20 -0400
   From: Daniel Jacobowitz <drow@mvista.com>

   > > How did it work in 5.3?  I'm assuming dumb luck, we unwound 0xffffe002
   > > wrong.
   > 
   > With 5.3, it was "luck", if we can call it that way (the old backtrace
   > is incomplete too, and probably the value of some registers is not
   > unwound properly in some of the frames). I didn't look too closely, but
   > I think GDB 5.3 didn't handle 0xffffe002 as a frameless function, and
   > therefore used %ebp to fetch the return address. The problem is that
   > this %ebp was the frame pointer from a caller two or three frames up...
   > So we ended up skipping these two or three frames.  And then after that,
   > it was business as usual...

   Ah, and pthread_cond_wait is frameless so that worked.  Hmmmmm.  If we
   get confused, falling back to trying %ebp wouldn't be an entirely bad
   idea.  Mark, does that seem plausible or is it just asking for
   problems?

It's tricky.  The point is that the unwinder tries very hard not to
get confused; it uses %ebp only if it's certain that it has found code
that sets up a frame.  Otherwise it assumes the function is frameless.
If we don't do it like this, we'll certainly miss some frames in some
fairly common cases, for example in many of the syscall stubs in
glibc.

Also note that for truly frameless code, %ebp can be used as a scratch
register, and therefore can't be trusted to contain a valid frame
pointer at all.
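
As a rough illustration of that "is a frame set up?" decision, here is a
simplified sketch in C. It only checks for the classic
push %ebp; mov %esp,%ebp sequence at the entry point; GDB's real
prologue analyzer is more involved. The names sets_up_frame,
framed_entry and cond_wait_entry are hypothetical, and the
cond_wait_entry bytes are an encoding of the three pushes shown in the
disassembly earlier in this thread, not bytes read from the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry of a function with a conventional frame:
   push %ebp; mov %esp,%ebp.  */
static const uint8_t framed_entry[] = { 0x55, 0x89, 0xe5 };

/* Entry resembling pthread_cond_wait:
   push %edi; push %esi; push %ebx.  */
static const uint8_t cond_wait_entry[] = { 0x57, 0x56, 0x53 };

/* Return 1 if CODE (LEN bytes of i386 machine code at a function's
   entry point) starts with "push %ebp; mov %esp,%ebp", else 0.  */
static int
sets_up_frame (const uint8_t *code, size_t len)
{
  if (len < 3 || code[0] != 0x55)       /* push %ebp */
    return 0;
  /* mov %esp,%ebp has two encodings: 89 e5 and 8b ec.  */
  return (code[1] == 0x89 && code[2] == 0xe5)
         || (code[1] == 0x8b && code[2] == 0xec);
}
```

With this check, framed_entry is recognized as setting up a frame, while
cond_wait_entry is not, so a function starting like pthread_cond_wait
would be treated as frameless and %ebp would not be consulted.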

Mark


* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14 16:21       ` Joel Brobecker
@ 2003-10-16 22:13         ` Richard Henderson
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2003-10-16 22:13 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches, kettenis

On Tue, Oct 14, 2003 at 09:21:55AM -0700, Joel Brobecker wrote:
> Ah, OK, thanks. Given the name of the section, I thought it would
> contain exception handling _regions_, without necessarily providing
> frame information for each and every function. I need to learn a bit
> more about them...

It isn't necessarily each and every function.  However, when 
exceptions are in use, it turns out to be most of them.  And
when thread cancelation is involved, it's all of them that 
lead to system calls.


r~



* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-14  5:42 problem unwinding past pthread_cond_wait() on x86 RedHat 9.0 Joel Brobecker
  2003-10-14 12:57 ` Daniel Jacobowitz
@ 2003-10-23  1:07 ` Joel Brobecker
  2003-10-23  2:41   ` Daniel Jacobowitz
  1 sibling, 1 reply; 16+ messages in thread
From: Joel Brobecker @ 2003-10-23  1:07 UTC (permalink / raw)
  To: gdb-patches

Hello,

the discussion regarding this problem showed that we basically cannot
do much in that case...

Still, I was wondering if you were interested in a new gdb.threads test.
As far as I can see, none of our current thread tests seems to
catch the problem I reported.

Basically, the C code would create a new thread, which we would block
on a call to pthread_cond_wait(). After reaching a breakpoint in the
main procedure, we would switch to the blocked thread, and try to get
a backtrace.

I can also open a PR...

Let me know.
-- 
Joel


* Re: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
  2003-10-23  1:07 ` Joel Brobecker
@ 2003-10-23  2:41   ` Daniel Jacobowitz
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2003-10-23  2:41 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

On Wed, Oct 22, 2003 at 06:07:54PM -0700, Joel Brobecker wrote:
> Hello,
> 
> the discussion regarding this problem showed that we can not basically
> do much in that case...
> 
> Still, I was wondering if you were interested in a new gdb.threads test.
> As far as I can see, none of our current thread test seems to be
> catching the problem I reported.
> 
> Basically, the C code would create a new thread, that we would block
> on a call to pthread_cond_wait(). After reaching a breakpoint in the
> main procedure, we would switch to the blocked thread, and try to get
> a backtrace.
> 
> I can also open a PR...
> 
> Let me know.

New testcases are always appreciated.  It may end up being XFAIL'd for
some cases, though.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

