* Strange segfaults of gdb
@ 2002-04-11 9:12 Michal Ludvig
2002-04-11 14:43 ` Michael Snyder
2002-04-12 2:13 ` Eli Zaretskii
0 siblings, 2 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-11 9:12 UTC
To: gdb
Hi all,
I've spent several days chasing gdb segfaults on x86-64 but have had no
luck. So I'm finally asking here for any suggestions, opinions, hints,
just anything that could move me forward.
The problem is that when I print anything using the 'print' command, or
'info' or maybe some others, and then want to run or step the debugged
program, gdb segfaults:
# ./gdb ~/mludvig/tst/xmmtest
GNU gdb 2002-04-04-cvs
[...]
This GDB was configured as "x86_64-unknown-linux-gnu"...
Setting up the environment for debugging gdb.
.gdbinit:3: Error in sourced command file:
Function "internal_error" not defined.
(gdb) br 10
Breakpoint 1 at 0x4004d8: file xmmtest.c, line 10.
(gdb) r
Starting program: /root/mludvig/tst/xmmtest
Breakpoint 1, main () at xmmtest.c:10
10 printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
(gdb) p 1
$1 = 1
(gdb) c
Continuing.
Segmentation fault (core dumped)
It doesn't matter which program I run, what I print, or whether I
then invoke 'run', 'continue' or even 'si'. It segfaults. The core file
doesn't give any reasonable information.
This segfault also happens when I leave 'set complaints 1' in .gdbinit
in the source directory, run gdb from there and then try to run a
debugged program.
Unfortunately it is perfectly reproducible :-(
Does anybody have an idea how print, set and step can be related?
I really don't know...
Thanks for any ideas
Michal Ludvig
--
* SuSE CR, s.r.o * mludvig@suse.cz
* +420 2 9654 5373 * http://www.suse.cz
* Re: Strange segfaults of gdb
2002-04-11 9:12 Strange segfaults of gdb Michal Ludvig
@ 2002-04-11 14:43 ` Michael Snyder
2002-04-12 2:13 ` Eli Zaretskii
1 sibling, 0 replies; 10+ messages in thread
From: Michael Snyder @ 2002-04-11 14:43 UTC
To: Michal Ludvig; +Cc: gdb
Michal Ludvig wrote:
>
> Hi all,
> I've spent several days chasing gdb segfaults on x86-64 but have had no
> luck. So I'm finally asking here for any suggestions, opinions, hints,
> just anything that could move me forward.
> The problem is that when I print anything using the 'print' command, or
> 'info' or maybe some others, and then want to run or step the debugged
> program, gdb segfaults:
>
> # ./gdb ~/mludvig/tst/xmmtest
> GNU gdb 2002-04-04-cvs
> [...]
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> Setting up the environment for debugging gdb.
> .gdbinit:3: Error in sourced command file:
> Function "internal_error" not defined.
> (gdb) br 10
> Breakpoint 1 at 0x4004d8: file xmmtest.c, line 10.
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
>
> Breakpoint 1, main () at xmmtest.c:10
> 10 printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
> (gdb) p 1
> $1 = 1
> (gdb) c
> Continuing.
> Segmentation fault (core dumped)
>
> It doesn't matter which program I run, what I print, or whether I
> then invoke 'run', 'continue' or even 'si'. It segfaults. The core file
> doesn't give any reasonable information.
> This segfault also happens when I leave 'set complaints 1' in .gdbinit
> in the source directory, run gdb from there and then try to run a
> debugged program.
> Unfortunately it is perfectly reproducible :-(
>
> Does anybody have an idea how print, set and step can be related?
> I really don't know...
I don't actually have any insight into your problem, but
I thought of an interesting way to debug it...
If you have a gdb that is recent enough to include the "gcore" command,
you could do the following:
1) run gdb under gdb
2) go up to the point just before you say "p 1"
3) generate a corefile of gdb.
4) do the "p 1"
5) generate another corefile of gdb.
6) compare the corefiles, to see what changed.
* Re: Strange segfaults of gdb
2002-04-11 9:12 Strange segfaults of gdb Michal Ludvig
2002-04-11 14:43 ` Michael Snyder
@ 2002-04-12 2:13 ` Eli Zaretskii
2002-04-12 4:27 ` Michal Ludvig
1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2002-04-12 2:13 UTC
To: mludvig; +Cc: gdb
> Date: Thu, 11 Apr 2002 18:12:33 +0200
> From: Michal Ludvig <mludvig@suse.cz>
>
> Breakpoint 1, main () at xmmtest.c:10
> 10 printf("v1=%f, v2=%f, v3=%e\n", v1, v2, v3);
> (gdb) p 1
> $1 = 1
> (gdb) c
> Continuing.
> Segmentation fault (core dumped)
>
> It doesn't matter which program I run, what I print, or whether I
> then invoke 'run', 'continue' or even 'si'. It segfaults. The core file
> doesn't give any reasonable information.
You mean, you cannot even tell from the core file where (inside what
function) GDB crashes? That'd be very strange indeed--what could
prevent you from getting at this information? Is the core file
corrupt or something?
What if you run GDB under another GDB--can you see where the
subordinate GDB crashes then?
> Does anybody have an idea how print, set and step can be related?
It's very hard to tell without knowing where the crash is happening.
* Re: Strange segfaults of gdb
2002-04-12 2:13 ` Eli Zaretskii
@ 2002-04-12 4:27 ` Michal Ludvig
2002-04-12 5:05 ` Eli Zaretskii
2002-04-16 11:16 ` Michael Snyder
0 siblings, 2 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-12 4:27 UTC
To: Eli Zaretskii; +Cc: gdb
Eli Zaretskii wrote:
>>It doesn't matter which program I run, what I print, or whether I
>>then invoke 'run', 'continue' or even 'si'. It segfaults. The core file
>>doesn't give any reasonable information.
>
> You mean, you cannot even tell from the core file where (inside what
> function) GDB crashes? That'd be very strange indeed--what could
> prevent you from getting at this information? Is the core file
> corrupt or something?
I see the same information as when I run gdb under gdb. Anyway, I
consider it incorrect [see below].
> What if you run GDB under another GDB--can you see where the
> subordinate GDB crashes then?
(gdb) p 1
$1 = 1
(gdb) r
Starting program: /root/mludvig/tst/xmmtest
Program received signal SIGSEGV, Segmentation fault.
0x2a95ae759c in wait4 () at soinit.c:76
76 }
(top-gdb) disassemble 0x2a95ae759c
Dump of assembler code for function wait4:
0x2a95ae7590 <wait4>: mov %rcx,%r10
0x2a95ae7593 <wait4+3>: mov $0x3d,%rax
0x2a95ae759a <wait4+10>: syscall
0x2a95ae759c <wait4+12>: cmp $0xfffffffffffff001,%rax
0x2a95ae75a2 <wait4+18>: jae 0x2a95ae75a5 <wait4+21>
0x2a95ae75a4 <wait4+20>: retq
0x2a95ae75a5 <wait4+21>: xor %rdx,%rdx
0x2a95ae75a8 <wait4+24>: sub %rax,%rdx
0x2a95ae75ab <wait4+27>: push %rdx
0x2a95ae75ac <wait4+28>: callq 0x2a95a6fa30 <key+145504>
0x2a95ae75b1 <wait4+33>: pop %rdx
0x2a95ae75b2 <wait4+34>: mov %rdx,(%rax)
0x2a95ae75b5 <wait4+37>: or $0xffffffffffffffff,%rax
0x2a95ae75b9 <wait4+41>: jmp 0x2a95ae75a4 <wait4+20>
0x2a95ae75bb <wait4+43>: nop
0x2a95ae75bc <wait4+44>: nop
0x2a95ae75bd <wait4+45>: nop
0x2a95ae75be <wait4+46>: nop
0x2a95ae75bf <wait4+47>: nop
End of assembler dump.
So it appears that the segfault happened on the 'cmp <imm>,<reg>'
instruction, which shouldn't be able to generate any exception at all.
So I don't trust this information.
Or do you have an idea how to interpret it? I'm not saying it's a bug in
gdb - it may be in the kernel, glibc or gcc as well, but everything
else seems to work. Only gdb doesn't...
Could this be a memory corruption problem on the gdb side (perhaps it
passes a wrong address to the syscall)? I'll try ElectricFence to
see what happens.
Is there a tutorial somewhere on how to examine/compare core files
generated by the gcore command? What should I look for?
It's somewhat difficult to debug a broken debugger using a broken
debugger :-((
Michal Ludvig
--
* SuSE CR, s.r.o * mludvig@suse.cz
* +420 2 9654 5373 * http://www.suse.cz
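An editorial aside on the hypothesis above that gdb might be passing a
wrong address to the syscall: on Linux, a system call that is handed an
inaccessible user-space pointer normally fails with EFAULT instead of
raising SIGSEGV in the caller. The C sketch below illustrates this with
write() on a pipe. The helper name errno_for_bad_write_pointer and the
PROT_NONE scratch page are illustrative assumptions, not anything taken
from gdb or from this thread.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: hand the kernel a pointer to an inaccessible page
   and report the errno that write() sets. Returns -1 if the setup
   (mmap or pipe) fails, 0 if write() unexpectedly succeeds. */
int errno_for_bad_write_pointer(void)
{
    int fds[2];
    int saved = -1;
    void *bad;
    ssize_t n;

    /* Reserve one page that user space may neither read nor write. */
    bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED)
        return -1;

    if (pipe(fds) == 0) {
        errno = 0;
        /* The kernel has to copy one byte from 'bad' and cannot. */
        n = write(fds[1], bad, 1);
        saved = (n == -1) ? errno : 0;
        close(fds[0]);
        close(fds[1]);
    }

    munmap(bad, 4096);
    return saved;
}
```

On a typical Linux system the helper is expected to return EFAULT, which
suggests that a bad pointer argument on its own would not usually explain
a SIGSEGV reported right after the syscall instruction.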
* Re: Strange segfaults of gdb
2002-04-12 4:27 ` Michal Ludvig
@ 2002-04-12 5:05 ` Eli Zaretskii
2002-04-16 11:16 ` Michael Snyder
1 sibling, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2002-04-12 5:05 UTC
To: mludvig; +Cc: gdb
> Date: Fri, 12 Apr 2002 13:27:14 +0200
> From: Michal Ludvig <mludvig@suse.cz>
>
> > What if you run GDB under another GDB--can you see where the
> > subordinate GDB crashes then?
>
> (gdb) p 1
> $1 = 1
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x2a95ae759c in wait4 () at soinit.c:76
> 76 }
> (top-gdb) disassemble 0x2a95ae759c
It's more useful to type "bt" at this point. Then you will know what
kind of code in GDB called wait4.
> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>: mov %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov $0x3d,%rax
> 0x2a95ae759a <wait4+10>: syscall
> 0x2a95ae759c <wait4+12>: cmp $0xfffffffffffff001,%rax
> 0x2a95ae75a2 <wait4+18>: jae 0x2a95ae75a5 <wait4+21>
> 0x2a95ae75a4 <wait4+20>: retq
> 0x2a95ae75a5 <wait4+21>: xor %rdx,%rdx
> 0x2a95ae75a8 <wait4+24>: sub %rax,%rdx
> 0x2a95ae75ab <wait4+27>: push %rdx
> 0x2a95ae75ac <wait4+28>: callq 0x2a95a6fa30 <key+145504>
> 0x2a95ae75b1 <wait4+33>: pop %rdx
> 0x2a95ae75b2 <wait4+34>: mov %rdx,(%rax)
> 0x2a95ae75b5 <wait4+37>: or $0xffffffffffffffff,%rax
> 0x2a95ae75b9 <wait4+41>: jmp 0x2a95ae75a4 <wait4+20>
> 0x2a95ae75bb <wait4+43>: nop
> 0x2a95ae75bc <wait4+44>: nop
> 0x2a95ae75bd <wait4+45>: nop
> 0x2a95ae75be <wait4+46>: nop
> 0x2a95ae75bf <wait4+47>: nop
> End of assembler dump.
>
> So it appears that the segfault happened on the 'cmp <imm>,<reg>'
> instruction, which shouldn't be able to generate any exception at all.
I suspect that what crashed is the syscall instruction before that:
> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>: mov %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov $0x3d,%rax
> 0x2a95ae759a <wait4+10>: syscall
> 0x2a95ae759c <wait4+12>: cmp $0xfffffffffffff001,%rax
It is also possible that the stack is somehow blown up, which would
explain why the first instruction after a syscall return crashes.
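For readers following the disassembly: the instructions after syscall
look like the usual Linux x86-64 error-return convention, in which a raw
result in the range -4095..-1 is a negated errno value. A minimal C
sketch of that logic follows. The function name decode_syscall_result is
hypothetical, and reading the callq target (printed as <key+145504>) as a
lookup of the errno location is an assumption.

```c
#include <errno.h>

/* Hypothetical helper mirroring the tail of the wait4 wrapper above.
   'raw' is the value the kernel left in %rax after the syscall
   instruction (mov $0x3d,%rax selected syscall 61, wait4 on x86-64). */
long decode_syscall_result(long raw)
{
    /* cmp $0xfffffffffffff001,%rax ; jae <wait4+21>
       0xfffffffffffff001 is -4095 as a 64-bit value; the comparison is
       unsigned, so exactly the values -4095..-1 take the error path. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        /* xor %rdx,%rdx ; sub %rax,%rdx ; ... ; mov %rdx,(%rax)
           Negate the result and store it as errno. */
        errno = (int)-raw;
        /* or $0xffffffffffffffff,%rax ; jmp <wait4+20> */
        return -1;
    }
    /* retq: success, return the kernel's value unchanged. */
    return raw;
}
```

None of these instructions touches memory before the push on the error
path, which is consistent with the suspicion that the fault belongs to
the syscall itself and not to the cmp.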
* Re: Strange segfaults of gdb
2002-04-12 4:27 ` Michal Ludvig
2002-04-12 5:05 ` Eli Zaretskii
@ 2002-04-16 11:16 ` Michael Snyder
2002-04-17 3:12 ` Michal Ludvig
1 sibling, 1 reply; 10+ messages in thread
From: Michael Snyder @ 2002-04-16 11:16 UTC
To: Michal Ludvig; +Cc: Eli Zaretskii, gdb
Michal Ludvig wrote:
>
> Eli Zaretskii wrote:
> >>It doesn't matter which program I run, what I print, or whether I
> >>then invoke 'run', 'continue' or even 'si'. It segfaults. The core file
> >>doesn't give any reasonable information.
> >
> > You mean, you cannot even tell from the core file where (inside what
> > function) GDB crashes? That'd be very strange indeed--what could
> > prevent you from getting at this information? Is the core file
> > corrupt or something?
>
> I see the same information as when I run gdb under gdb. Anyway, I
> consider it incorrect [see below].
>
> > What if you run GDB under another GDB--can you see where the
> > subordinate GDB crashes then?
>
> (gdb) p 1
> $1 = 1
> (gdb) r
> Starting program: /root/mludvig/tst/xmmtest
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x2a95ae759c in wait4 () at soinit.c:76
> 76 }
> (top-gdb) disassemble 0x2a95ae759c
> Dump of assembler code for function wait4:
> 0x2a95ae7590 <wait4>: mov %rcx,%r10
> 0x2a95ae7593 <wait4+3>: mov $0x3d,%rax
> 0x2a95ae759a <wait4+10>: syscall
> 0x2a95ae759c <wait4+12>: cmp $0xfffffffffffff001,%rax
> 0x2a95ae75a2 <wait4+18>: jae 0x2a95ae75a5 <wait4+21>
> 0x2a95ae75a4 <wait4+20>: retq
> 0x2a95ae75a5 <wait4+21>: xor %rdx,%rdx
> 0x2a95ae75a8 <wait4+24>: sub %rax,%rdx
> 0x2a95ae75ab <wait4+27>: push %rdx
> 0x2a95ae75ac <wait4+28>: callq 0x2a95a6fa30 <key+145504>
> 0x2a95ae75b1 <wait4+33>: pop %rdx
> 0x2a95ae75b2 <wait4+34>: mov %rdx,(%rax)
> 0x2a95ae75b5 <wait4+37>: or $0xffffffffffffffff,%rax
> 0x2a95ae75b9 <wait4+41>: jmp 0x2a95ae75a4 <wait4+20>
> 0x2a95ae75bb <wait4+43>: nop
> 0x2a95ae75bc <wait4+44>: nop
> 0x2a95ae75bd <wait4+45>: nop
> 0x2a95ae75be <wait4+46>: nop
> 0x2a95ae75bf <wait4+47>: nop
> End of assembler dump.
>
> So it appears that the segfault happened on the 'cmp <imm>,<reg>'
> instruction, which shouldn't be able to generate any exception at all.
> So I don't trust this information.
Maybe it took place during the syscall, and was deferred
until return to user space?
> Or do you have an idea how to interpret it? I'm not saying it's a bug in
> gdb - it may be in the kernel, glibc or gcc as well, but everything
> else seems to work. Only gdb doesn't...
> Could this be a memory corruption problem on the gdb side (perhaps it
> passes a wrong address to the syscall)? I'll try ElectricFence to
> see what happens.
>
> Is there a tutorial somewhere on how to examine/compare core files
> generated by the gcore command? What should I look for?
No -- but I was thinking you could just run "cmp" on them, and
find out what memory had changed during the "print 1".
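The comparison suggested here can also be sketched as a small C helper,
in the spirit of "cmp -l", for cases where the offsets need further
processing. The function name diff_files and its output format are
illustrative assumptions. It reports file offsets, which still have to be
mapped back to memory addresses through the core file's program headers.

```c
#include <stdio.h>

/* Hypothetical helper: compare two files byte by byte and, if 'out' is
   not NULL, print every offset where they differ. Returns the number of
   differing bytes, or -1 if either file cannot be opened. Comparison
   stops at the end of the shorter file. */
long diff_files(const char *path_a, const char *path_b, FILE *out)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    long offset = 0;
    long ndiff = 0;
    int ca, cb;

    if (fa == NULL || fb == NULL) {
        if (fa != NULL)
            fclose(fa);
        if (fb != NULL)
            fclose(fb);
        return -1;
    }

    while ((ca = getc(fa)) != EOF && (cb = getc(fb)) != EOF) {
        if (ca != cb) {
            if (out != NULL)
                fprintf(out, "%#lx: %02x -> %02x\n",
                        (unsigned long)offset, (unsigned)ca, (unsigned)cb);
            ndiff++;
        }
        offset++;
    }

    fclose(fa);
    fclose(fb);
    return ndiff;
}
```

Called with the corefile written before "p 1", the one written after it,
and stdout, it would list the bytes that changed during the print.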
* Re: Strange segfaults of gdb
2002-04-16 11:16 ` Michael Snyder
@ 2002-04-17 3:12 ` Michal Ludvig
0 siblings, 0 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-04-17 3:12 UTC
To: Michael Snyder; +Cc: Eli Zaretskii, gdb
Michael Snyder wrote:
> Michal Ludvig wrote:
>>So it appears that the segfault happened on the 'cmp <imm>,<reg>'
>>instruction, which shouldn't be able to generate any exception at all.
>>So I don't trust this information.
>
> Maybe it took place during the syscall, and was deferred
> until return to user space?
Yes, it was a problem of that kind, which by a strange coincidence
appeared only in gdb. Fortunately I cannot confirm that it was gdb's
problem, and the version in the 5.2 branch should be ready for release
(from the x86-64 side).
Michal Ludvig
--
* SuSE CR, s.r.o * mludvig@suse.cz
* +420 2 9654 5373 * http://www.suse.cz
* Re: Strange segfaults of gdb
[not found] ` <1039817373.10496.19.camel@eggis1>
@ 2002-12-14 6:09 ` Michal Ludvig
0 siblings, 0 replies; 10+ messages in thread
From: Michal Ludvig @ 2002-12-14 6:09 UTC
To: Terje Eggestad; +Cc: gdb
Terje Eggestad wrote:
> One other detail: when you get a segfault, or simply break on a
> symbol in libc such as strlen() (I first noticed it because of a
> segfault in strlen), neither bt nor where gives the stack trace...
> That's a bug, right?
Well, it depends on the point of view :-)
These functions don't have debug info, because they are pure assembler
functions not written in C (this applies to memcpy, str*, and some
others, as well as to syscall wrappers). Without valid Dwarf2
information GDB can't do a backtrace. I don't consider this a GDB bug,
but I'm implementing a workaround anyway, because I know it's
annoying :-)
Michal Ludvig
--
* SuSE CR, s.r.o * mludvig@suse.cz
* (+420) 296.545.373 * http://www.suse.cz
* Re: Strange segfaults of gdb
2002-12-12 16:36 Terje Eggestad
@ 2002-12-13 9:59 ` Michal Ludvig
[not found] ` <1039817373.10496.19.camel@eggis1>
0 siblings, 1 reply; 10+ messages in thread
From: Michal Ludvig @ 2002-12-13 9:59 UTC
To: Terje Eggestad; +Cc: gdb
Terje Eggestad wrote:
> It seems that I can reliably reproduce it when you link with pthread.
> See below.
>
> It seems that it segfaults on the first or second instruction *byte*,
> NOT the next instruction... (according to info registers, just before
> and after the attempted single step.)
>
> Anyone know whom to report this to?
Please update to gdb-5.3 first and try to reproduce it. Either compile
the new GDB yourself or fetch the x86-64 SuSE 8.1 binary from
http://tmp.logix.cz/gdb
If that doesn't help, tell me and I'll look into it.
Michal Ludvig
--
* SuSE CR, s.r.o * mludvig@suse.cz
* (+420) 296.545.373 * http://www.suse.cz
* Re: Strange segfaults of gdb
@ 2002-12-12 16:36 Terje Eggestad
2002-12-13 9:59 ` Michal Ludvig
0 siblings, 1 reply; 10+ messages in thread
From: Terje Eggestad @ 2002-12-12 16:36 UTC
To: Michal Ludvig, Eli Zaretskii, Michael Snyder; +Cc: gdb
Hi
Back in April you guys had a short discussion on strange segfaults in
gdb:
http://sources.redhat.com/ml/gdb/2002-04/msg00168.html
It seems that I can reliably reproduce it when you link with pthread.
See below.
It seems that it segfaults on the first or second instruction *byte*,
NOT the next instruction... (according to info registers, just before
and after the attempted single step.)
Anyone know whom to report this to?
Running SuSE 8.0.99 beta with the latest gcc,gdb,glibc updates.
Terje
te mjollnir testdl 134> cat main.c
main()
{
int i;
i = 2;
i += 4;
i *= 3;
exit(i);
};
te mjollnir testdl 135> gcc -v
Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.1/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64
--enable-languages=c,c++,f77,objc,java,ada --enable-libgcj
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib
--with-system-zlib --enable-shared --enable-__cxa_atexit
x86_64-suse-linux
Thread model: posix
gcc version 3.2.1 20021002 (prerelease) (SuSE Linux)
te mjollnir testdl 136> gcc -g -o main main.c
te mjollnir testdl 137> gdb main
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
(gdb) break main
Breakpoint 1 at 0x400460: file main.c, line 5.
(gdb) run
Starting program: /home/te/test/testdl/main
Breakpoint 1, main () at main.c:5
5 i = 2;
(gdb) n
6 i += 4;
(gdb)
7 i *= 3;
(gdb)
8 exit(i);
(gdb)
Program exited with code 022.
(gdb) q
te mjollnir testdl 138> gcc -g -o main main.c -lpthread
te mjollnir testdl 139> gdb -q main
(gdb) break main
Breakpoint 1 at 0x4004a0: file main.c, line 5.
(gdb) run
Starting program: /home/te/test/testdl/main
[New Thread 1024 (LWP 6458)]
[Switching to Thread 1024 (LWP 6458)]
Breakpoint 1, 0x004004a1 in main () at main.c:5
5 i = 2;
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x004004a3 in main () at main.c:5
5 i = 2;
(gdb) q
The program is running. Exit anyway? (y or n) y
te mjollnir testdl 140>
--
_________________________________________________________________________
Terje Eggestad mailto:terje.eggestad@scali.no
Scali Scalable Linux Systems http://www.scali.com
Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal +47 975 31 574 (MOBILE)
N-0619 Oslo fax: +47 22 62 89 51
NORWAY
_________________________________________________________________________
end of thread
Thread overview: 10+ messages
2002-04-11 9:12 Strange segfaults of gdb Michal Ludvig
2002-04-11 14:43 ` Michael Snyder
2002-04-12 2:13 ` Eli Zaretskii
2002-04-12 4:27 ` Michal Ludvig
2002-04-12 5:05 ` Eli Zaretskii
2002-04-16 11:16 ` Michael Snyder
2002-04-17 3:12 ` Michal Ludvig
2002-12-12 16:36 Terje Eggestad
2002-12-13 9:59 ` Michal Ludvig
[not found] ` <1039817373.10496.19.camel@eggis1>
2002-12-14 6:09 ` Michal Ludvig