problem unwinding past pthread_cond_wait() on x86 RedHat 9.0

Mirror of the gdb-patches mailing list

From: Joel Brobecker <brobecker@gnat.com>
To: gdb-patches@sources.redhat.com
Subject: problem unwinding past pthread_cond_wait() on x86 RedHat 9.0
Date: Tue, 14 Oct 2003 05:42:00 -0000	[thread overview]
Message-ID: <20031014054225.GB919@gnat.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 4788 bytes --]

Hello,

while trying to move from GDB 5.3 to 6.0, we noticed a small
"regression" in a backtrace after switching to a thread blocked
on a pthread_cond_wait() call. This occurs only on RH9 (we also tried
RH8 and RH7, where the problem does not show up).

To reproduce the problem, compile the attached C program with the
command below. (I'll be more than happy to contribute that testcase if
you are interested. Coming from the Ada world, where tasking is really
easy, I am not too familiar with pthreads, especially in terms of
portability, but I welcome criticism :)

        % gcc -D_REENTRANT -g -o pt pt.c -lpthread

Then do the following:

        % gdb pt
        (gdb) break break_me
        (gdb) run
        (gdb) thread 2
        (gdb) bt

The thread that I created should be blocked on a pthread_cond_wait
waiting for a condition to be signaled. But we never signal this
condition, so it should wait there forever. So the backtrace I
expected should look like this:

        #0 in pthread_cond_wait ()
        #1 in cond_wait ()
        #2 in noreturn ()
        #3 in forever_pthread ()
        (more frames follow)

Instead, here is all we get:

        #0  0xffffe002 in ?? ()
        #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
           from /lib/tls/libpthread.so.0

With GDB 5.3, we used to get:

        #0  0xffffe002 in ?? ()
        #1  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0

Which isn't any better, and that explains why I put "regression" in
quotes in the first paragraph. The change of behavior is much more
visible when we debug an Ada program. With 5.3 we used to get a
not-quite-correct backtrace (missing a frame or two between #0 and #1):

        #0  0xffffe002 in ?? ()
        #1  0x0804fb29 in system__tasking__rendezvous__accept_trivial ()
        #2  0x08049f48 in task_switch.callee (<_task>=0x806e708)
                          at task_switch.adb:29
        #3  0x08053394 in system__tasking__stages__task_wrapper ()
        #4  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0

We now basically get almost nothing:

        #0  0xffffe002 in ?? ()
        #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
           from /lib/tls/libpthread.so.0

I think I found the source of the problem when looking at the assembly
code for pthread_cond_wait in libpthread.so. Here is what it looks like:

0x4002d2e0 <pthread_cond_wait+0>:     push   %edi
0x4002d2e1 <pthread_cond_wait+1>:     push   %esi
0x4002d2e2 <pthread_cond_wait+2>:     push   %ebx
[a bunch of instructions, including conditional jumps]
0x4002d2fa <pthread_cond_wait+26>:    pushl  0x14(%esp,1)
[...]
0x4002d324 <pthread_cond_wait+68>:    sub    $0x20,%esp
[some other bunch of instructions, and then finally the code where we stopped:]
0x4002d372 <pthread_cond_wait+146>:   call   *%gs:0x10
0x4002d379 <pthread_cond_wait+153>:   sub    $0xc,%ebx

So we are at pthread_cond_wait+146, and the i386 frame code is trying to
unwind past this function. It looks at the function prologue and finds
that the function is frameless, so it falls back to computing the
"frame" base from the SP instead of the base pointer. It analyzes the
prologue, finds the 3 push instructions saving certain registers, and
therefore determines that the offset between the SP and the BP must be
these 12 bytes. Unfortunately, this misses the pushl and the sub
instructions, which moved the SP by another 36 bytes (4 + 32)!
So the unwinder ends up with the wrong frame base, and therefore reads
the saved EIP from the wrong address. It then stops because the EIP
value it fetched is NULL.
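
To make the arithmetic concrete, here is a minimal sketch in C. It is
not GDB's actual prologue analyzer: the struct, the table and the
function name are all hypothetical, and it assumes (as nothing in the
disassembly guarantees) that the pushl and the sub each execute exactly
once and that no other instruction moves the SP. It only sums the
stack adjustments listed above, once for the prologue alone and once
for every adjustment up to the call.

```c
#include <stddef.h>

/* Hypothetical model of one instruction that lowers the stack pointer.  */
struct sp_adjust
{
  const char *insn;   /* Text of the instruction, for reference only.  */
  unsigned bytes;     /* Number of bytes by which it lowers the SP.  */
  int in_prologue;    /* Nonzero if it is part of the 3-push prologue.  */
};

/* The SP adjustments visible in the disassembly of pthread_cond_wait.
   Assumption: each one runs exactly once before the call at +146.  */
static const struct sp_adjust example_insns[] = {
  { "push %edi", 4, 1 },
  { "push %esi", 4, 1 },
  { "push %ebx", 4, 1 },
  { "pushl 0x14(%esp,1)", 4, 0 },
  { "sub $0x20,%esp", 32, 0 },
};

/* Return the number of bytes between the SP and the slot holding the
   return address.  With PROLOGUE_ONLY set, count only the prologue
   pushes, which mimics what a prologue-based analysis would see.  */
unsigned
bytes_below_return_address (const struct sp_adjust *insns, size_t n,
                            int prologue_only)
{
  unsigned total = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (!prologue_only || insns[i].in_prologue)
      total += insns[i].bytes;
  return total;
}
```

With the table above, the prologue-only sum is 12 bytes while the full
sum is 48 bytes, so an unwinder using the first value reads the "saved
EIP" 36 bytes too low on the stack.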

As an experiment, I ran the debugger under the supervision of another
debugger, and I assumed, despite the numerous conditional jumps
everywhere, that the "pushl" and the "sub" instructions were each
executed exactly once. So I manually changed the offset to be
12 + 4 + 32 = 48 (so cache->offset was 44), and voila!

    #0  0xffffe002 in ?? ()
    #1  0x4002d379 in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/tls/libpthread.so.0
    #2  0x0804855e in cond_wait (cond=0x4083484c, mut=0x4083487c) at pt.c:9
    #3  0x080485a9 in noreturn () at pt.c:24
    #4  0x080485b9 in forever_pthread (unused=0x0) at pt.c:30
    #5  0x4002b2b6 in start_thread () from /lib/tls/libpthread.so.0
    #6  0x420de407 in clone () from /lib/tls/libc.so.6

The problem I am now trying to solve is the following: How can we fix
the i386 unwinder to be smart enough to handle this wicked function?
Is this even possible? The only possibility I see right now is with
dwarf2 CFI, but then the problem I foresee is that we cannot help
the people using the stock RH9. If the only hope is with CFI, then
they will have to update their pthread library...
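
For readers unfamiliar with the idea, the following is a rough sketch
of why CFI would help. It is a simplified stand-in, not the DWARF
format and not GDB code: the row layout, the names and the table
contents are hypothetical. Each row records the CFA (canonical frame
address) offset from the SP that is in effect once a given instruction
has run, so the debugger looks the answer up by PC instead of guessing
from the prologue. The rows assume, as in the experiment above, that
only the five instructions shown in the disassembly move the SP.

```c
#include <stddef.h>

/* Hypothetical, simplified row of a CFI-like table.  */
struct cfa_row
{
  unsigned after;       /* Code offset of the instruction that moved SP.  */
  unsigned cfa_offset;  /* CFA - SP once that instruction has executed.  */
};

/* On function entry only the return address is on the stack.  */
#define ENTRY_CFA_OFFSET 4u

/* Illustrative rows built from the offsets in the disassembly.
   Assumption: no other instruction changes the SP.  */
static const struct cfa_row example_rows[] = {
  { 0, 8 },    /* push %edi */
  { 1, 12 },   /* push %esi */
  { 2, 16 },   /* push %ebx */
  { 26, 20 },  /* pushl 0x14(%esp,1) */
  { 68, 52 },  /* sub $0x20,%esp */
};

/* Return CFA - SP for the given code offset.  ROWS must be sorted by
   increasing `after'; a row applies to every PC_OFFSET beyond it.  */
unsigned
cfa_offset_at (const struct cfa_row *rows, size_t n, unsigned pc_offset)
{
  unsigned offset = ENTRY_CFA_OFFSET;
  size_t i;

  for (i = 0; i < n && rows[i].after < pc_offset; i++)
    offset = rows[i].cfa_offset;
  return offset;
}

/* On i386 the return address sits in the 4 bytes just below the CFA.  */
unsigned long
return_address_slot (unsigned long sp, unsigned cfa_offset)
{
  return sp + cfa_offset - 4;
}
```

Under these assumptions, looking up offset 146 gives a CFA offset of
52, which puts the return address at SP + 48, the same value found by
hand in the experiment.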

What do you guys think?

-- 
Joel

[-- Attachment #2: pt.c --]
[-- Type: text/plain, Size: 872 bytes --]

#include <pthread.h>
#include <stdio.h>
#include <time.h>

void
cond_wait (pthread_cond_t *cond, pthread_mutex_t *mut)
{
  pthread_mutex_lock (mut);
  pthread_cond_wait (cond, mut);
  pthread_mutex_unlock (mut);
}

void
noreturn (void)
{
  pthread_mutex_t mut;
  pthread_cond_t cond;

  pthread_mutex_init (&mut, NULL);
  pthread_cond_init (&cond, NULL);

  /* Wait for a condition that will never be signaled, so we effectively
     block the thread here.  */
  cond_wait (&cond, &mut);
}

void *
forever_pthread (void *unused)
{
  noreturn ();
  return NULL;  /* Not reached, but the function must return a value.  */
}

void
break_me (void)
{
  /* Just an anchor to help putting a breakpoint.  */
}

int
main (void)
{
  pthread_t forever;
  const struct timespec ts = { 0, 10000000 }; /* 0.01 sec */

  pthread_create (&forever, NULL, forever_pthread, NULL);
  for (;;)
    {
      nanosleep (&ts, NULL);
      break_me ();
    }

  return 0;
}
