Strange segfaults of gdb

Mirror of the gdb mailing list

* Strange segfaults of gdb
@ 2002-04-11  9:12 Michal Ludvig
  2002-04-11 14:43 ` Michael Snyder
  2002-04-12  2:13 ` Eli Zaretskii
  0 siblings, 2 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-11  9:12 UTC
  To: gdb

Hi all,
I've spent several days chasing gdb segfaults on x86-64 without any
luck, so I'm finally asking here for suggestions, opinions, hints,
anything that could move me forward.
The problem is that when I print anything using the 'print' command
(or 'info', or maybe some others) and then try to run or step the
debugged program, gdb segfaults:

# ./gdb ~/mludvig/tst/xmmtest
GNU gdb 2002-04-04-cvs
[...]
This GDB was configured as "x86_64-unknown-linux-gnu"...
Setting up the environment for debugging gdb.
.gdbinit:3: Error in sourced command file:
Function "internal_error" not defined.
(gdb) br 10
Breakpoint 1 at 0x4004d8: file xmmtest.c, line 10.
(gdb) r
Starting program: /root/mludvig/tst/xmmtest

Breakpoint 1, main () at xmmtest.c:10
10              printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
(gdb) p 1
$1 = 1
(gdb) c
Continuing.
Segmentation fault (core dumped)

It doesn't matter which program I run, what I print, or whether I
then invoke 'run', 'continue' or even 'si': it segfaults. The core
file doesn't give any reasonable information.
This segfault also happens when I leave 'set complaints 1' in .gdbinit
in the source directory, run gdb from there, and then try to run a
debugged program. Unfortunately, it is perfectly reproducible :-(

Does anybody have an idea how print, set and step can be related?
I really don't know...

Thanks for any ideas

Michal Ludvig
-- 
* SuSE CR, s.r.o     * mludvig@suse.cz
* +420 2 9654 5373   * http://www.suse.cz


* Re: Strange segfaults of gdb
  2002-04-11  9:12 Strange segfaults of gdb Michal Ludvig
@ 2002-04-11 14:43 ` Michael Snyder
  2002-04-12  2:13 ` Eli Zaretskii
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Snyder @ 2002-04-11 14:43 UTC
  To: Michal Ludvig; +Cc: gdb

Michal Ludvig wrote:
> 
> Hi all,
> I've spent several days chasing gdb segfaults on x86-64 without any
> luck, so I'm finally asking here for suggestions, opinions, hints,
> anything that could move me forward.
> The problem is that when I print anything using the 'print' command
> (or 'info', or maybe some others) and then try to run or step the
> debugged program, gdb segfaults:
> 
> # ./gdb ~/mludvig/tst/xmmtest
> GNU gdb 2002-04-04-cvs
> [...]
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> Setting up the environment for debugging gdb.
> .gdbinit:3: Error in sourced command file:
> Function "internal_error" not defined.
> (gdb) br 10
> Breakpoint 1 at 0x4004d8: file xmmtest.c, line 10.
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
> 
> Breakpoint 1, main () at xmmtest.c:10
> 10              printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
> (gdb) p 1
> $1 = 1
> (gdb) c
> Continuing.
> Segmentation fault (core dumped)
> 
> It doesn't matter which program I run, what I print, or whether I
> then invoke 'run', 'continue' or even 'si': it segfaults. The core
> file doesn't give any reasonable information.
> This segfault also happens when I leave 'set complaints 1' in .gdbinit
> in the source directory, run gdb from there, and then try to run a
> debugged program. Unfortunately, it is perfectly reproducible :-(
> 
> Does anybody have an idea how print, set and step can be related?
> I really don't know...

I don't actually have any insight into your problem, but
I thought of an interesting way to debug it...

If you have a gdb that is recent enough to include the "gcore" command, 
you could do the following:

1) run gdb under gdb
2) go up to the point just before you say "p 1"
3) generate a corefile of gdb.
4) do the "p 1"
5) generate another corefile of gdb.
6) compare the corefiles, to see what changed.



* Re: Strange segfaults of gdb
  2002-04-11  9:12 Strange segfaults of gdb Michal Ludvig
  2002-04-11 14:43 ` Michael Snyder
@ 2002-04-12  2:13 ` Eli Zaretskii
  2002-04-12  4:27   ` Michal Ludvig
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2002-04-12  2:13 UTC
  To: mludvig; +Cc: gdb

> Date: Thu, 11 Apr 2002 18:12:33 +0200
> From: Michal Ludvig <mludvig@suse.cz>
> 
> Breakpoint 1, main () at xmmtest.c:10
> 10              printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
> (gdb) p 1
> $1 = 1
> (gdb) c
> Continuing.
> Segmentation fault (core dumped)
> 
> It doesn't matter which program I run, what I print, or whether I
> then invoke 'run', 'continue' or even 'si': it segfaults. The core
> file doesn't give any reasonable information.

You mean, you cannot even tell from the core file where (inside what
function) GDB crashes?  That'd be very strange indeed--what could
prevent you from getting at this information?  Is the core file
corrupt or something?

What if you run GDB under another GDB--can you see where the
subordinate GDB crashes then?

> Does anybody have an idea how print, set and step can be related?

It's very hard to tell without knowing where the crash is happening.



* Re: Strange segfaults of gdb
  2002-04-12  2:13 ` Eli Zaretskii
@ 2002-04-12  4:27   ` Michal Ludvig
  2002-04-12  5:05     ` Eli Zaretskii
  2002-04-16 11:16     ` Michael Snyder
  0 siblings, 2 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-12  4:27 UTC
  To: Eli Zaretskii; +Cc: gdb

Eli Zaretskii wrote:
>>It doesn't matter which program I run, what I print, or whether I
>>then invoke 'run', 'continue' or even 'si': it segfaults. The core
>>file doesn't give any reasonable information.
> 
> You mean, you cannot even tell from the core file where (inside what
> function) GDB crashes?  That'd be very strange indeed--what could
> prevent you from getting at this information?  Is the core file
> corrupt or something?

I see the same information as when I run gdb under gdb. In any case, I
consider it incorrect [see below].

> What if you run GDB under another GDB--can you see where the
> subordinate GDB crashes then?

(gdb) p 1
$1 = 1
(gdb) r
Starting program: /root/mludvig/tst/xmmtest

Program received signal SIGSEGV, Segmentation fault.
0x2a95ae759c in wait4 () at soinit.c:76
76      }
(top-gdb) disassemble 0x2a95ae759c
Dump of assembler code for function wait4:
0x2a95ae7590 <wait4>:   mov    %rcx,%r10
0x2a95ae7593 <wait4+3>: mov    $0x3d,%rax
0x2a95ae759a <wait4+10>:        syscall
0x2a95ae759c <wait4+12>:        cmp    $0xfffffffffffff001,%rax
0x2a95ae75a2 <wait4+18>:        jae    0x2a95ae75a5 <wait4+21>
0x2a95ae75a4 <wait4+20>:        retq
0x2a95ae75a5 <wait4+21>:        xor    %rdx,%rdx
0x2a95ae75a8 <wait4+24>:        sub    %rax,%rdx
0x2a95ae75ab <wait4+27>:        push   %rdx
0x2a95ae75ac <wait4+28>:        callq  0x2a95a6fa30 <key+145504>
0x2a95ae75b1 <wait4+33>:        pop    %rdx
0x2a95ae75b2 <wait4+34>:        mov    %rdx,(%rax)
0x2a95ae75b5 <wait4+37>:        or     $0xffffffffffffffff,%rax
0x2a95ae75b9 <wait4+41>:        jmp    0x2a95ae75a4 <wait4+20>
0x2a95ae75bb <wait4+43>:        nop
0x2a95ae75bc <wait4+44>:        nop
0x2a95ae75bd <wait4+45>:        nop
0x2a95ae75be <wait4+46>:        nop
0x2a95ae75bf <wait4+47>:        nop
End of assembler dump.

So it appears that the segfault happened on the 'cmp <imm>,<reg>'
instruction, which shouldn't be able to generate any exception at all.
So I don't trust this information.
Or do you have an idea how to interpret it? I'm not saying it's a bug
in gdb - it may be in the kernel, glibc or gcc as well, but everything
else seems to work. Only gdb doesn't...
Could this be a memory corruption problem on the gdb side (perhaps it
passes a wrong address to the syscall)? I'll try ElectricFence to see
what happens.

Is there a tutorial somewhere on how to examine/compare core files
generated by the gcore command? What should I look for?

It's somewhat difficult to debug a broken debugger using a broken
debugger :-((

Michal Ludvig
-- 
* SuSE CR, s.r.o     * mludvig@suse.cz
* +420 2 9654 5373   * http://www.suse.cz



* Re: Strange segfaults of gdb
  2002-04-12  4:27   ` Michal Ludvig
@ 2002-04-12  5:05     ` Eli Zaretskii
  2002-04-16 11:16     ` Michael Snyder
  1 sibling, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2002-04-12  5:05 UTC
  To: mludvig; +Cc: gdb

> Date: Fri, 12 Apr 2002 13:27:14 +0200
> From: Michal Ludvig <mludvig@suse.cz>
> 
> > What if you run GDB under another GDB--can you see where the
> > subordinate GDB crashes then?
> 
> (gdb) p 1
> $1 = 1
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2a95ae759c in wait4 () at soinit.c:76
> 76      }
> (top-gdb) disassemble 0x2a95ae759c

It's more useful to type "bt" at this point.  Then you will know what
kind of code in GDB called wait4.

> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>:   mov    %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov    $0x3d,%rax
> 0x2a95ae759a <wait4+10>:        syscall
> 0x2a95ae759c <wait4+12>:        cmp    $0xfffffffffffff001,%rax
> 0x2a95ae75a2 <wait4+18>:        jae    0x2a95ae75a5 <wait4+21>
> 0x2a95ae75a4 <wait4+20>:        retq
> 0x2a95ae75a5 <wait4+21>:        xor    %rdx,%rdx
> 0x2a95ae75a8 <wait4+24>:        sub    %rax,%rdx
> 0x2a95ae75ab <wait4+27>:        push   %rdx
> 0x2a95ae75ac <wait4+28>:        callq  0x2a95a6fa30 <key+145504>
> 0x2a95ae75b1 <wait4+33>:        pop    %rdx
> 0x2a95ae75b2 <wait4+34>:        mov    %rdx,(%rax)
> 0x2a95ae75b5 <wait4+37>:        or     $0xffffffffffffffff,%rax
> 0x2a95ae75b9 <wait4+41>:        jmp    0x2a95ae75a4 <wait4+20>
> 0x2a95ae75bb <wait4+43>:        nop
> 0x2a95ae75bc <wait4+44>:        nop
> 0x2a95ae75bd <wait4+45>:        nop
> 0x2a95ae75be <wait4+46>:        nop
> 0x2a95ae75bf <wait4+47>:        nop
> End of assembler dump.
> 

> So it appears that the segfault happened on the 'cmp <imm>,<reg>'
> instruction, which shouldn't be able to generate any exception at all.

I suspect that what crashed is the syscall instruction before that:

> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>:   mov    %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov    $0x3d,%rax
> 0x2a95ae759a <wait4+10>:        syscall
> 0x2a95ae759c <wait4+12>:        cmp    $0xfffffffffffff001,%rax

It is also possible that the stack is somehow blown up, which would
explain why the first instruction after a syscall return crashes.
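
For context on the disassembly quoted above: the
`cmp $0xfffffffffffff001,%rax` / `jae` pair that follows `syscall` is the
usual x86-64 Linux check for an error return, where the kernel hands back a
negated errno value in the range -4095..-1. A minimal C sketch of what the
stub appears to do is below. The function name `decode_syscall_result` is
hypothetical, and reading the `callq` target as an errno-location lookup is
an assumption, since the disassembly only shows it as `<key+145504>`.

```c
#include <errno.h>

/* Hypothetical helper, not glibc code: mirrors the instructions that
   follow `syscall` in the wait4 stub quoted above. */
long decode_syscall_result(long raw)
{
    /* cmp $0xfffffffffffff001,%rax ; jae ...
       Unsigned compare against -4095: raw values -4095..-1 are errors. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        /* xor %rdx,%rdx ; sub %rax,%rdx   ->  rdx = -raw
           mov %rdx,(%rax) after the call  ->  store it (assumed: errno) */
        errno = (int)-raw;
        /* or $0xffffffffffffffff,%rax     ->  return -1 */
        return -1;
    }
    /* retq on the success path: the raw value is the result. */
    return raw;
}
```

Under this convention a bad pointer handed to the kernel would normally
come back as -1 with errno set to EFAULT rather than as a SIGSEGV, so a
fault reported at the `cmp` itself is more likely a symptom of something
that went wrong during or around the syscall.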



* Re: Strange segfaults of gdb
  2002-04-12  4:27   ` Michal Ludvig
  2002-04-12  5:05     ` Eli Zaretskii
@ 2002-04-16 11:16     ` Michael Snyder
  2002-04-17  3:12       ` Michal Ludvig
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Snyder @ 2002-04-16 11:16 UTC
  To: Michal Ludvig; +Cc: Eli Zaretskii, gdb

Michal Ludvig wrote:
> 
> Eli Zaretskii wrote:
> >>It doesn't matter which program I run, what I print, or whether I
> >>then invoke 'run', 'continue' or even 'si': it segfaults. The core
> >>file doesn't give any reasonable information.
> >
> > You mean, you cannot even tell from the core file where (inside what
> > function) GDB crashes?  That'd be very strange indeed--what could
> > prevent you from getting at this information?  Is the core file
> > corrupt or something?
> 
> I see the same information as when I run gdb under gdb. In any case, I
> consider it incorrect [see below].
> 
> > What if you run GDB under another GDB--can you see where the
> > subordinate GDB crashes then?
> 
> (gdb) p 1
> $1 = 1
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2a95ae759c in wait4 () at soinit.c:76
> 76      }
> (top-gdb) disassemble 0x2a95ae759c
> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>:   mov    %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov    $0x3d,%rax
> 0x2a95ae759a <wait4+10>:        syscall
> 0x2a95ae759c <wait4+12>:        cmp    $0xfffffffffffff001,%rax
> 0x2a95ae75a2 <wait4+18>:        jae    0x2a95ae75a5 <wait4+21>
> 0x2a95ae75a4 <wait4+20>:        retq
> 0x2a95ae75a5 <wait4+21>:        xor    %rdx,%rdx
> 0x2a95ae75a8 <wait4+24>:        sub    %rax,%rdx
> 0x2a95ae75ab <wait4+27>:        push   %rdx
> 0x2a95ae75ac <wait4+28>:        callq  0x2a95a6fa30 <key+145504>
> 0x2a95ae75b1 <wait4+33>:        pop    %rdx
> 0x2a95ae75b2 <wait4+34>:        mov    %rdx,(%rax)
> 0x2a95ae75b5 <wait4+37>:        or     $0xffffffffffffffff,%rax
> 0x2a95ae75b9 <wait4+41>:        jmp    0x2a95ae75a4 <wait4+20>
> 0x2a95ae75bb <wait4+43>:        nop
> 0x2a95ae75bc <wait4+44>:        nop
> 0x2a95ae75bd <wait4+45>:        nop
> 0x2a95ae75be <wait4+46>:        nop
> 0x2a95ae75bf <wait4+47>:        nop
> End of assembler dump.
> 
> So it appears that the segfault happened on the 'cmp <imm>,<reg>'
> instruction, which shouldn't be able to generate any exception at all.
> So I don't trust this information.

Maybe it took place during the syscall, and was deferred
until return to user space?


> Or do you have an idea how to interpret it? I'm not saying it's a bug
> in gdb - it may be in the kernel, glibc or gcc as well, but everything
> else seems to work. Only gdb doesn't...
> Could this be a memory corruption problem on the gdb side (perhaps it
> passes a wrong address to the syscall)? I'll try ElectricFence to see
> what happens.
> 
> Is there a tutorial somewhere on how to examine/compare core files
> generated by the gcore command? What should I look for?

No -- but I was thinking you could just run "cmp" on them, and
find out what memory had changed during the "print 1".
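
A plain `cmp -l` on the two core files prints one line per differing byte,
which can get long. As a rough alternative, the sketch below groups
differing bytes into contiguous ranges. The helper name `diff_ranges` and
its output format are made up for illustration. Note that the offsets it
reports are file offsets within the core files, not addresses in the gdb
process, so they would still need to be mapped back through the core
file's program headers.

```c
#include <stdio.h>

/* Hypothetical helper: print each contiguous run of differing bytes
   between two files. Returns the number of runs reported (a length
   mismatch counts as one extra run), or -1 if a file cannot be opened. */
long diff_ranges(const char *path_a, const char *path_b, FILE *out)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0, start = -1, runs = 0;

    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -1;
    }

    for (;;) {
        int ca = getc(a);
        int cb = getc(b);
        int at_end = (ca == EOF || cb == EOF);

        /* Close the current run when the bytes match again or a file ends. */
        if (start >= 0 && (at_end || ca == cb)) {
            fprintf(out, "differs: 0x%lx-0x%lx\n",
                    (unsigned long)start, (unsigned long)(offset - 1));
            runs++;
            start = -1;
        }
        if (at_end) {
            if (ca != cb) {   /* only one of the files ended here */
                fprintf(out, "lengths differ from offset 0x%lx\n",
                        (unsigned long)offset);
                runs++;
            }
            break;
        }
        if (ca != cb && start < 0)
            start = offset;   /* open a new run */
        offset++;
    }

    fclose(a);
    fclose(b);
    return runs;
}
```

Called as `diff_ranges("core.before", "core.after", stdout)` (file names
chosen here only as an example), it lists the regions that changed
between the two snapshots taken before and after the "print 1".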



* Re: Strange segfaults of gdb
  2002-04-16 11:16     ` Michael Snyder
@ 2002-04-17  3:12       ` Michal Ludvig
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-17  3:12 UTC
  To: Michael Snyder; +Cc: Eli Zaretskii, gdb

Michael Snyder wrote:
> Michal Ludvig wrote:
>>So it appears that the segfault happened on the 'cmp <imm>,<reg>'
>>instruction, which shouldn't be able to generate any exception at all.
>>So I don't trust this information.
> 
> Maybe it took place during the syscall, and was deferred
> until return to user space?

Yes, it was a problem of this kind, which by a strange coincidence
appeared only in gdb. Fortunately, I cannot confirm that it was a gdb
problem, and the version in the 5.2 branch should be ready for release
(from the x86-64 side).

Michal Ludvig
-- 
* SuSE CR, s.r.o     * mludvig@suse.cz
* +420 2 9654 5373   * http://www.suse.cz



* Re: Strange segfaults of gdb
       [not found]   ` <1039817373.10496.19.camel@eggis1>
@ 2002-12-14  6:09     ` Michal Ludvig
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-12-14  6:09 UTC
  To: Terje Eggestad; +Cc: gdb

Terje Eggestad wrote:
> One other detail: when you get a seg fault, or simply break on a
> symbol in libc such as strlen() (I first noticed it because of a seg
> fault in strlen), neither bt nor where gives the stack trace...
> That's a bug, right?

Well, it depends on the point of view :-)
These functions don't have debug info because they are pure assembler
functions, not written in C (this applies to memcpy, str*, and some
others, as well as to syscall wrappers). Without valid Dwarf2
information GDB can't do a backtrace. I don't consider this a GDB bug,
but I'm implementing a workaround anyway, because I know it's
annoying :-)

Michal Ludvig
-- 
* SuSE CR, s.r.o     * mludvig@suse.cz
* (+420) 296.545.373 * http://www.suse.cz


* Re: Strange segfaults of gdb
  2002-12-12 16:36 Terje Eggestad
@ 2002-12-13  9:59 ` Michal Ludvig
       [not found]   ` <1039817373.10496.19.camel@eggis1>
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Ludvig @ 2002-12-13  9:59 UTC
  To: Terje Eggestad; +Cc: gdb

Terje Eggestad wrote:
> It seems that I can reliably reproduce it when linking with pthread.
> See below.
> 
> It seems that it segfaults on the first or second instruction *byte*,
> NOT the next instruction... (according to info registers, just before
> and after the attempted single step.)
> 
> Anyone know whom to report this to?

Please update to gdb-5.3 first and try to reproduce it. Either compile
the new GDB yourself or fetch the x86-64 SuSE 8.1 binary from
http://tmp.logix.cz/gdb

If that doesn't help, tell me and I'll look into it.

Michal Ludvig
-- 
* SuSE CR, s.r.o     * mludvig@suse.cz
* (+420) 296.545.373 * http://www.suse.cz



* Re: Strange segfaults of gdb
@ 2002-12-12 16:36 Terje Eggestad
  2002-12-13  9:59 ` Michal Ludvig
  0 siblings, 1 reply; 10+ messages in thread
From: Terje Eggestad @ 2002-12-12 16:36 UTC
  To: Michal Ludvig, Eli Zaretskii, Michael Snyder; +Cc: gdb

Hi 

Back in April you guys had a short discussion on strange segfaults in
gdb:
http://sources.redhat.com/ml/gdb/2002-04/msg00168.html

It seems that I can reliably reproduce it when linking with pthread.
See below.

It seems that it segfaults on the first or second instruction *byte*,
NOT the next instruction... (according to info registers, just before
and after the attempted single step.)

Anyone know whom to report this to?

Running SuSE 8.0.99 beta with the latest gcc,gdb,glibc updates.

Terje





 
te mjollnir testdl 134> cat main.c
main()
{
        int i;

        i = 2;
        i += 4;
        i *= 3;
        exit(i);
};

te mjollnir testdl 135> gcc -v
Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.1/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64
--enable-languages=c,c++,f77,objc,java,ada --enable-libgcj
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
--with-system-zlib --enable-shared --enable-__cxa_atexit
x86_64-suse-linux
Thread model: posix
gcc version 3.2.1 20021002 (prerelease) (SuSE Linux)
te mjollnir testdl 136> gcc -g -o main main.c
te mjollnir testdl 137> gdb main
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
(gdb) break main
Breakpoint 1 at 0x400460: file main.c, line 5.
(gdb) run
Starting program: /home/te/test/testdl/main 

Breakpoint 1, main () at main.c:5
5               i = 2;
(gdb) n
6               i += 4;
(gdb) 
7               i *= 3;
(gdb) 
8               exit(i);
(gdb) 

Program exited with code 022.
(gdb) q
te mjollnir testdl 138> gcc -g -o main main.c -lpthread
te mjollnir testdl 139> gdb -q main
(gdb) break main
Breakpoint 1 at 0x4004a0: file main.c, line 5.
(gdb) run
Starting program: /home/te/test/testdl/main 
[New Thread 1024 (LWP 6458)]
[Switching to Thread 1024 (LWP 6458)]

Breakpoint 1, 0x004004a1 in main () at main.c:5
5               i = 2;
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x004004a3 in main () at main.c:5
5               i = 2;
(gdb) q
The program is running.  Exit anyway? (y or n) y
te mjollnir testdl 140> 




-- 
_________________________________________________________________________

Terje Eggestad                  mailto:terje.eggestad@scali.no
Scali Scalable Linux Systems    http://www.scali.com

Olaf Helsets Vei 6              tel:    +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal                     +47 975 31 574  (MOBILE)
N-0619 Oslo                     fax:    +47 22 62 89 51
NORWAY            
_________________________________________________________________________


