From: Mathieu Lacage
To: gdb@sourceware.org
Subject: Re: random gdb errors: corruption of nptl_db event buffers ?
Date: Sun, 26 Jul 2009 07:17:00 -0000
Message-ID: <74fef6df0907260016v26adc11bgacca58cf10d5a6c@mail.gmail.com>
In-Reply-To: <74fef6df0907251005y1c02246ay9b6b1bd7e7c326d3@mail.gmail.com>

In case someone ever gets the same kind of error: more long hours of
debugging show that, indeed, this kind of error message comes from the
inferior clobbering one of its thread descriptors, which in turn
clobbers the descriptor's nptl_db event buffer and confuses gdb. Fun.

Mathieu

On Sat, Jul 25, 2009 at 7:05 PM, Mathieu Lacage wrote:
> hi,
>
> I am trying to debug some random errors I get from gdb while debugging
> my program. No, the bug is not in gdb, it's in the inferior process,
> but it appears that the inferior process is confusing nptl_db beyond
> repair.
> All of this is threading related (hence, most likely, the
> random errors/crashes I observe). My program appears to be crashing
> somewhere in the code of a newly-created thread. Sometimes, it's the
> master thread join which crashes...
>
> On the gdb side, I get many different kinds of errors, all of which
> appear to be related to nptl_db events. Sometimes, gdb gets TD_DBERR
> from td_ta_event_getmsg. Sometimes, gdb produces something like the
> backtrace below, which I interpret as gdb getting an invalid TD_CREATE
> event from the inferior (potentially with an invalid thread pointer).
>
> the error message:
>
> warning: Can't attach LWP -136336208: No such process
> ../../gdb/linux-thread-db.c:288: internal-error:
> thread_get_info_callback: Assertion `thread_info != NULL' failed.
>
> the associated backtrace:
>
> #0  0x0000003f5d8d1320 in __read_nocancel () from /lib64/libc.so.6
> #1  0x0000003f5d8717f8 in _IO_new_file_underflow (fp=0x3f5db686a0) at
> fileops.c:598
> #2  0x0000003f5d87328e in _IO_default_uflow (fp=0x0) at genops.c:440
> #3  0x0000003f5d86e82b in _IO_getc (fp=0x3f5db686a0) at getc.c:41
> #4  0x0000000000453502 in defaulted_query (ctlstr=<value optimized
> out>, defchar=0x0,
>     args=) at ../../gdb/utils.c:1480
> #5  0x00000000004536ad in query (ctlstr=0x0) at ../../gdb/utils.c:1571
> #6  0x0000000000453866 in internal_vproblem (problem=0x9e6840,
> file=,
>     line=, fmt=, ap=<value optimized out>)
>     at ../../gdb/utils.c:935
> #7  0x0000000000453919 in internal_verror (file=,
> line=,
>     fmt=, ap=0x3f5db69e10) at ../../gdb/utils.c:986
> #8  0x00000000004539b1 in internal_error (file=0x0, line=0x568b8000,
>     string=0x400
) at ../../gdb/utils.c:995
> #9  0x0000000000479335 in thread_get_info_callback
> (thp=0x7fffaddb3720, infop=0x7fffaddb3738)
>     at ../../gdb/linux-thread-db.c:288
> #10 0x0000000000479e0e in thread_from_lwp (ptid={pid = 0x20f5, lwp =
> 0x20f6, tid = 0x0})
>     at ../../gdb/linux-thread-db.c:329
> #11 thread_db_wait (ptid={pid = 0x20f5, lwp = 0x20f6, tid = 0x0}) at
> ../../gdb/linux-thread-db.c:922
> #12 0x00000000005339aa in target_wait (ptid={pid = 0xffffffff, lwp =
> 0x0, tid = 0x0},
>     status=0x7fffaddb3850) at ../../gdb/target.c:1889
> #13 0x000000000050f42e in wait_for_inferior
> (treat_exec_as_sigtrap=0x0) at ../../gdb/infrun.c:1850
> #14 0x000000000050f95f in proceed (addr=,
> siggnal=TARGET_SIGNAL_DEFAULT, step=0x0)
>     at ../../gdb/infrun.c:1479
> #15 0x0000000000506738 in continue_command (args=0x0, from_tty=0x1) at
> ../../gdb/infcmd.c:745
> #16 0x0000000000451969 in execute_command (p=0x1590101 "",
> from_tty=0x1) at ../../gdb/top.c:450
> #17 0x000000000051c2f5 in command_handler (command=0x1590100 "c") at
> ../../gdb/event-top.c:519
> #18 0x000000000051cfbc in command_line_handler (rl=<value optimized
> out>) at ../../gdb/event-top.c:744
> #19 0x0000003f5fc27e2c in rl_callback_read_char () at ../callback.c:205
> #20 0x000000000051c439 in rl_callback_read_char_wrapper
> (client_data=0x0) at ../../gdb/event-top.c:179
> #21 0x000000000051ad98 in process_event () at ../../gdb/event-loop.c:394
> #22 0x000000000051bf8a in gdb_do_one_event (data=<value optimized
> out>) at ../../gdb/event-loop.c:459
> #23 0x00000000005160bb in catch_errors (func=,
> func_args=,
>     errstring=, mask=) at
> ../../gdb/exceptions.c:516
> #24 0x00000000004a6ce8 in tui_command_loop (data=)
>     at ../../gdb/tui/tui-interp.c:156
> #25 0x00000000004449c9 in captured_command_loop (data=0x0) at
> ../../gdb/main.c:183
> #26 0x00000000005160bb in catch_errors (func=,
> func_args=,
>     errstring=, mask=) at
> ../../gdb/exceptions.c:516
> #27 0x000000000044533e in captured_main (data=)
> at ../../gdb/main.c:989
> #28 0x00000000005160bb in catch_errors (func=,
> func_args=,
>     errstring=, mask=) at
> ../../gdb/exceptions.c:516
> #29 0x00000000004449b4 in gdb_main (args=0x7ff9568b8000) at ../../gdb/main.c:999
> #30 0x0000000000444989 in main (argc=,
> argv=0x7ff9568b8000) at ../../gdb/gdb.c:47
>
> Anyway, I have spent a considerable amount of time trying to trace
> what is going on, and trying to figure out how I can possibly inflict
> so much pain on gdb from the inferior process, but I am still out of
> luck. So maybe someone who has more experience with debugging nptl or
> nptl_db could give me a hint as to what I could be doing so badly.
> Would really corrupting the inferior process' nptl event buffer
> trigger this kind of crap ?
>
> Mathieu
> --
> Mathieu Lacage
>
--
Mathieu Lacage
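
For anyone who wants to picture the failure mode described in the reply at the top, here is a minimal sketch in C. It is only a toy model under loud assumptions: `struct fake_descriptor`, `enum fake_event` and every function below are hypothetical names made up for illustration, and the layout is not glibc's real thread descriptor. It only shows how a write that runs past a buffer living next to the bookkeeping fields can leave a reader (here standing in for the debugger side) with a nonsensical thread id and event number, which is roughly the shape of the bogus LWP in the warning quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event numbers; NOT the real thread_db values. */
enum fake_event { FAKE_EVENT_NONE = 0, FAKE_EVENT_CREATE = 1, FAKE_EVENT_DEATH = 2 };

/* Hypothetical, heavily simplified stand-in for a thread descriptor.
   This is an assumption for illustration, not glibc's layout. */
struct fake_descriptor {
    unsigned char user_data[32]; /* memory the program may legitimately write */
    int tid;                     /* thread id a debugger would try to attach to */
    int eventnum;                /* last event reported to the debugger */
    void *eventdata;             /* payload of that event (never dereferenced here) */
};

/* Build a descriptor in a sane state: valid tid, pending "create" event. */
struct fake_descriptor make_descriptor(int tid)
{
    struct fake_descriptor d;
    memset(&d, 0, sizeof d);
    d.tid = tid;
    d.eventnum = FAKE_EVENT_CREATE;
    d.eventdata = NULL;
    return d;
}

/* The kind of check a reader could make before trusting the event fields. */
bool descriptor_looks_sane(const struct fake_descriptor *d)
{
    return d->tid > 0
        && d->eventnum >= FAKE_EVENT_NONE
        && d->eventnum <= FAKE_EVENT_DEATH;
}

/* Simulate the bug: a fill meant for user_data covers the whole descriptor.
   The write stays inside the struct object, so the demo itself is well defined. */
void overrun_user_data(struct fake_descriptor *d)
{
    memset(d, 0xAB, sizeof *d);
}

/* Run the scenario end to end and return the tid read back after the overrun. */
int run_demo(void)
{
    struct fake_descriptor d = make_descriptor(8438); /* arbitrary illustrative tid */
    assert(descriptor_looks_sane(&d));
    overrun_user_data(&d);
    return d.tid;
}
```

Calling `run_demo()` hands back a garbage negative thread id, and `descriptor_looks_sane()` rejects the descriptor after the overrun. In a real program the overrun would come from an ordinary bug (a stack or heap write past its bounds), and nothing in this sketch says where the real descriptor sits relative to user memory.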