* The 'cold' function attribute and GDB
@ 2019-05-01 18:59 Eli Zaretskii
2019-05-01 20:17 ` Simon Marchi
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-01 18:59 UTC (permalink / raw)
To: gdb-patches
If some functions in your program are declared with
__attribute__((cold)), GCC generates local symbols of the form
FOO.cold, where FOO is a function that has a code path which
eventually calls one of the 'cold' functions. When GDB needs to show
a backtrace or a stack frame inside one of these code paths, it shows
FOO.cold as the name of the function. Here's a real-life example:
Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
0x76a63227 in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
(gdb) thread 1
[Switching to thread 1 (Thread 7552.0x172c)]
#0 0x76a63227 in KERNELBASE!DebugBreak ()
from C:\Windows\syswow64\KernelBase.dll
(gdb) up
#1 0x012e7b89 in emacs_abort () at w32fns.c:10768
10768 DebugBreak ();
(gdb)
#2 0x012e1f3b in print_vectorlike.cold () at print.c:1824 <<<<<<<<<<<<<<<<
1824 emacs_abort ();
(gdb) bt
#0 0x76a63227 in KERNELBASE!DebugBreak ()
from C:\Windows\syswow64\KernelBase.dll
#1 0x012e7b89 in emacs_abort () at w32fns.c:10768
#2 0x012e1f3b in print_vectorlike.cold () at print.c:1824 <<<<<<<<<<<<<<<<
#3 0x011d2dec in print_object (obj=<optimized out>, printcharfun=XIL(0),
escapeflag=true) at print.c:2150
There's a print_vectorlike function in Emacs, and it calls
emacs_abort, which is declared 'cold' (because it is called in case of
fatal errors and doesn't return).
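
For readers unfamiliar with the attribute, here is a minimal sketch of the pattern being described. The names fatal_error and checked_divide are hypothetical stand-ins, not Emacs code. Whether GCC actually moves the error path into a separate checked_divide.cold part depends on the compiler version and optimization level, so a program this small is not guaranteed to show the split.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fatal-error routine such as emacs_abort.
   'cold' tells GCC the function is unlikely to be called; 'noreturn'
   tells it that a call never returns.  */
__attribute__ ((cold, noreturn))
static void
fatal_error (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  abort ();
}

/* Hypothetical caller.  The branch that reaches fatal_error is the kind
   of code path GCC may place out of line, under a local symbol named
   after the caller with a ".cold" suffix.  */
int
checked_divide (int num, int den)
{
  if (den == 0)
    fatal_error ("division by zero");
  return num / den;
}
```

Calling checked_divide with a zero divisor under GDB, with a breakpoint on fatal_error, would be one way to look at the resulting backtrace.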
Since the 'cold' functions tend to be those that are called in case of
some calamity, and since GDB is likely to be used to debug such
calamities, and the user is likely to put a breakpoint in such 'cold'
functions precisely to catch these calamities, the above is not such a
rare use case as far as real-life use of GDB goes.
I think this display is confusing, but is there any reasonable way for
GDB not to expose these symbols to the user?
* Re: The 'cold' function attribute and GDB
2019-05-01 18:59 The 'cold' function attribute and GDB Eli Zaretskii
@ 2019-05-01 20:17 ` Simon Marchi
2019-05-02 2:51 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Simon Marchi @ 2019-05-01 20:17 UTC (permalink / raw)
To: Eli Zaretskii, gdb-patches
On 2019-05-01 2:58 p.m., Eli Zaretskii wrote:
> If some functions in your program are declared with
> __attribute__((cold)), GCC generates local symbols of the form
> FOO.cold, where FOO is a function that has a code path which
> eventually calls one of the 'cold' functions. When GDB needs to show
> a backtrace or a stack frame inside one of these code paths, it shows
> FOO.cold as the name of the function. Here's a real-life example:
>
> Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
> 0x76a63227 in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
> (gdb) thread 1
> [Switching to thread 1 (Thread 7552.0x172c)]
> #0 0x76a63227 in KERNELBASE!DebugBreak ()
> from C:\Windows\syswow64\KernelBase.dll
> (gdb) up
> #1 0x012e7b89 in emacs_abort () at w32fns.c:10768
> 10768 DebugBreak ();
> (gdb)
> #2 0x012e1f3b in print_vectorlike.cold () at print.c:1824 <<<<<<<<<<<<<<<<
> 1824 emacs_abort ();
> (gdb) bt
> #0 0x76a63227 in KERNELBASE!DebugBreak ()
> from C:\Windows\syswow64\KernelBase.dll
> #1 0x012e7b89 in emacs_abort () at w32fns.c:10768
> #2 0x012e1f3b in print_vectorlike.cold () at print.c:1824 <<<<<<<<<<<<<<<<
> #3 0x011d2dec in print_object (obj=<optimized out>, printcharfun=XIL(0),
> escapeflag=true) at print.c:2150
The way this is written:
print_vectorlike.cold ()
makes me think that GDB didn't find a full symbol (DWARF symbol) for this address, so
it fell back on the nearest preceding minimal symbol (ELF symbol). I say this because the
frame line lacks arguments, whereas the function does accept some parameters in the code. If GDB had
found the full symbol, it should show the arguments, and the function name, print_vectorlike.
So I would inspect the DWARF info generated by gcc for the print_vectorlike function to
see what GDB is working with; then it will be easier to know where to search.
In particular, how does the DWARF info describe the address range or ranges taken by
the print_vectorlike function? In an ideal world, if gcc generates two separate ranges
of instructions for this function (the hot path and the cold path), it should describe
it as two ranges in the entry for print_vectorlike. If so, it should be described with
a DW_AT_ranges attribute.
If you are not sure what to look for, try to find the debug info entry for print_vectorlike
and paste it here.
Also, which commit of GDB is this with? I recall some patches related to non-contiguous
address ranges not too long ago.
> There's a print_vectorlike function in Emacs, and it calls
> emacs_abort, which is declared 'cold' (because it is called in case of
> fatal errors and doesn't return).
>
> Since the 'cold' functions tend to be those that are called in case of
> some calamity, and since GDB is likely to be used to debug such
> calamities, and the user is likely to put breakpoint in such 'cold'
> functions precisely to catch these calamities, the above is not such a
> rare use case as far as real-life use of GDB goes.
>
> I think this display is confusing, but is there any reasonable way for
> GDB not to expose these symbols to the user?
I guess that it would be possible to handle specially minimal symbols named something.cold,
but it should not be necessary if we have good debug info and use it correctly, so I would
prefer not to go that route.
Simon
* Re: The 'cold' function attribute and GDB
2019-05-01 20:17 ` Simon Marchi
@ 2019-05-02 2:51 ` Kevin Buettner
2019-05-02 6:59 ` Eli Zaretskii
2019-05-02 7:06 ` Eli Zaretskii
0 siblings, 2 replies; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 2:51 UTC (permalink / raw)
To: Simon Marchi; +Cc: gdb-patches, Eli Zaretskii
On Wed, 1 May 2019 16:17:04 -0400
Simon Marchi <simark@simark.ca> wrote:
> In an ideal world, if gcc generates two separate ranges
> of instruction for this function (the hot path and the cold path), it should describe
> it as two ranges in the entry for print_vectorlike. If so, it should be described with
> a DW_AT_ranges attribute.
Yes, this is exactly right.
[...]
> Also, which commit of GDB is this with? I recall some patches related to non-contiguous
> address ranges not too long ago.
As I recall, I committed the non-contiguous address range stuff in
late August of 2018. It might be the case that 8.3 will be the first
release to include that support. Any build based on a commit to
master since 2018-08-24 should have this support.
I think I need to write some small programs which use the 'cold'
attribute to see what the DWARF looks like. I was unaware of this
attribute prior to reading Eli's message on this matter. If it can
be avoided, I (too) would prefer to not add a special case for handling
cold minimal symbols.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 2:51 ` Kevin Buettner
@ 2019-05-02 6:59 ` Eli Zaretskii
2019-05-02 7:26 ` Kevin Buettner
2019-05-02 7:06 ` Eli Zaretskii
1 sibling, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 6:59 UTC (permalink / raw)
To: gdb-patches, Kevin Buettner, Simon Marchi
On May 2, 2019 5:51:13 AM GMT+03:00, Kevin Buettner <kevinb@redhat.com> wrote:
> On Wed, 1 May 2019 16:17:04 -0400
> Simon Marchi <simark@simark.ca> wrote:
> > Also, which commit of GDB is this with? I recall some patches
> related to non-contiguous
> > address ranges not too long ago.
>
> As I recall, I committed the non-contiguous address range stuff in
> late August of 2018. It might be the case that 8.3 will be the first
> release to include that support. Any build based on a commit to
> master since 2018-08-24 should have this support.
The non-contiguous address ranges support seems to target specifically the disassembly command? This does display 2 separate address ranges, both in GDB 8.2 and in the latest pretest of 8.3. But were those changes supposed to teach backtrace and stack frame reporting in general about multiple ranges? Because in that case GDB needs to solve the inverse problem: go from the address to the symbol.
* Re: The 'cold' function attribute and GDB
2019-05-02 2:51 ` Kevin Buettner
2019-05-02 6:59 ` Eli Zaretskii
@ 2019-05-02 7:06 ` Eli Zaretskii
2019-05-02 7:38 ` Kevin Buettner
2019-05-02 15:17 ` Eli Zaretskii
1 sibling, 2 replies; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 7:06 UTC (permalink / raw)
To: gdb-patches, Kevin Buettner, Simon Marchi
On May 2, 2019 5:51:13 AM GMT+03:00, Kevin Buettner <kevinb@redhat.com> wrote:
> On Wed, 1 May 2019 16:17:04 -0400
> Simon Marchi <simark@simark.ca> wrote:
>
> > In an ideal world, if gcc generates two separate ranges
> > of instruction for this function (the hot path and the cold path),
> it should describe
> > it as two ranges in the entry for print_vectorlike. If so, it
> should be described with
> > a DW_AT_ranges attribute.
>
> Yes, this is exactly right.
I do see in the DWARF info that print_vectorlike has 2 address ranges: its DW_AT_ranges attribute specifies a value which is shown as a list of 2 ranges in the "objdump -WR" output.
So where do I go from here? What should I check next?
Thanks.
* Re: The 'cold' function attribute and GDB
2019-05-02 6:59 ` Eli Zaretskii
@ 2019-05-02 7:26 ` Kevin Buettner
2019-05-02 15:27 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 7:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, Simon Marchi
On Thu, 02 May 2019 09:58:47 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> On May 2, 2019 5:51:13 AM GMT+03:00, Kevin Buettner <kevinb@redhat.com> wrote:
> > On Wed, 1 May 2019 16:17:04 -0400
> > Simon Marchi <simark@simark.ca> wrote:
>
> > > Also, which commit of GDB is this with? I recall some patches
> > related to non-contiguous
> > > address ranges not too long ago.
> >
> > As I recall, I committed the non-contiguous address range stuff in
> > late August of 2018. It might be the case that 8.3 will be the first
> > release to include that support. Any build based on a commit to
> > master since 2018-08-24 should have this support.
>
> The non-contiguous address ranges support seems to target
> specifically the disassembly command? This does display 2 separate
> address ranges, both in GDB 8.2 and in the latest pretest of 8.3.
> But were those changes supposed to teach backtrace and stack frame
> reporting in general about multiple ranges? Because in that case
> GDB needs to solve the inverse problem: go from the address to the
> symbol.
There were four distinct problems that my non-contiguous address range
patch set addressed. As you say, one of them was disassembly of
functions with discontiguous ranges. But that work also addressed
problems with stepping, with breakpoint placement, and with display of
addresses. A brief discussion of these matters can be found here:
https://www.sourceware.org/ml/gdb-patches/2018-08/msg00541.html
It's possible that more needs to be done with display of addresses.
I'll need to look at it more closely to know for sure. It would help
me if there were a relatively small test case to look at. (It would
help even more if it could be reproduced on a GNU/Linux system.)
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 7:06 ` Eli Zaretskii
@ 2019-05-02 7:38 ` Kevin Buettner
2019-05-02 15:23 ` Eli Zaretskii
2019-05-02 15:17 ` Eli Zaretskii
1 sibling, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 7:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, Simon Marchi
On Thu, 02 May 2019 10:05:41 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> On May 2, 2019 5:51:13 AM GMT+03:00, Kevin Buettner <kevinb@redhat.com> wrote:
> > On Wed, 1 May 2019 16:17:04 -0400
> > Simon Marchi <simark@simark.ca> wrote:
> >
> > > In an ideal world, if gcc generates two separate ranges
> > > of instruction for this function (the hot path and the cold path),
> > it should describe
> > > it as two ranges in the entry for print_vectorlike. If so, it
> > should be described with
> > > a DW_AT_ranges attribute.
> >
> > Yes, this is exactly right.
>
>
> I do see in the DWARF info that print_vectorlike has 2 address ranges: its DW_AT_ranges attribute specifies a value which is shown as a list of 2 ranges in the "objdump -WR" output.
>
> So where do I go from here? What to check next?
Can you show me what is printed for the following commands?
disassemble print_vectorlike
x/3i print_vectorlike
b print_vectorlike
For the disassemble command, I expect both ranges to be displayed.
The x and b commands should print insns / set a breakpoint on the entry
pc (which should NOT be the cold address) of the function. If one or more
of these things aren't happening, then something is going wrong
somewhere.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 7:06 ` Eli Zaretskii
2019-05-02 7:38 ` Kevin Buettner
@ 2019-05-02 15:17 ` Eli Zaretskii
1 sibling, 0 replies; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 15:17 UTC (permalink / raw)
To: kevinb, simark; +Cc: gdb-patches
> Date: Thu, 02 May 2019 10:05:41 +0300
> From: Eli Zaretskii <eliz@gnu.org>
>
> I do see in the DWARF info that print_vectorlike has 2 address ranges: its DW_AT_ranges attribute specifies a value which is shown as a list of 2 ranges in the "objdump -WR" output.
Specifically, "objdump -Wi" shows this:
<1><798e3d>: Abbrev Number: 161 (DW_TAG_subprogram)
<798e3f> DW_AT_name : print_vectorlike
<798e50> DW_AT_decl_file : 3
<798e51> DW_AT_decl_line : 1365
<798e53> DW_AT_decl_column : 1
<798e54> DW_AT_prototyped : 1
<798e54> DW_AT_type : <0x77ca19>
<798e58> DW_AT_ranges : 0x18b448 <<<<<<<<<<<<<<<<<<<<
and "objdump -WR" has this:
0018b448 011d6521 011d81d6
0018b448 012e1ce5 012e1f3b
0018b448 <End of list>
and the second range of addresses is for print_vectorlike.cold.
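
To make the "address to symbol" direction concrete, here is a minimal sketch of a membership test over those two ranges. It is only an illustration, not GDB's implementation; the struct and function names are made up for this example. It assumes the usual DWARF convention that the second value of each range list entry is the first address past the end of the range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One [low, high) address range; 'high' is the first address past the
   end, as in a DWARF range list entry.  */
struct addr_range
{
  uint32_t low;
  uint32_t high;
};

/* The two ranges that "objdump -WR" printed for print_vectorlike.  */
static const struct addr_range print_vectorlike_ranges[] = {
  { 0x011d6521, 0x011d81d6 },   /* main part of the function */
  { 0x012e1ce5, 0x012e1f3b },   /* part labeled print_vectorlike.cold */
};

/* Return true if PC falls inside any of the N ranges.  */
bool
pc_in_ranges (uint32_t pc, const struct addr_range *ranges, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (pc >= ranges[i].low && pc < ranges[i].high)
      return true;
  return false;
}

/* Illustrative helper (not a GDB function): does PC belong to
   print_vectorlike according to the ranges above?  */
bool
pc_in_print_vectorlike (uint32_t pc)
{
  return pc_in_ranges (pc, print_vectorlike_ranges,
                       sizeof print_vectorlike_ranges
                       / sizeof print_vectorlike_ranges[0]);
}
```

With half-open ranges, 0x012e1ce5 is inside the function while 0x012e1f3b, the address shown for frame #2 in the original backtrace, is exactly the end value of the second range and so falls just outside it. Whether that boundary case plays any role in the display is not established in this thread.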
* Re: The 'cold' function attribute and GDB
2019-05-02 7:38 ` Kevin Buettner
@ 2019-05-02 15:23 ` Eli Zaretskii
2019-05-02 15:56 ` Kevin Buettner
2019-05-02 18:25 ` Kevin Buettner
0 siblings, 2 replies; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 15:23 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 00:38:49 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
>
> Can you show me what is printed for the following commands?
>
> disassemble print_vectorlike
> x/3i print_vectorlike
> b print_vectorlike
>
> For the disassemble command, I expect both ranges to be displayed.
> The x and b commands should print insns / set a breakpoint on the entry
> pc (which should NOT be the cold address) of the function. If one or more
> of these things aren't happening, then something is going wrong
> somewhere.
Here:
(gdb) disassemble /m print_vectorlike
Dump of assembler code for function print_vectorlike:
Address range 0x11d6521 to 0x11d81d6:
1367 {
0x011d6521 <+0>: push %ebp
0x011d6522 <+1>: mov %esp,%ebp
0x011d6524 <+3>: push %edi
0x011d6525 <+4>: push %esi
0x011d6526 <+5>: push %ebx
0x011d6527 <+6>: sub $0x7c,%esp
[...]
Address range 0x12e1ce5 to 0x12e1f3b:
1824 emacs_abort ();
0x012e1f36 <+593>: call 0x12e7b40 <emacs_abort>
End of assembler dump.
(gdb) x/3i print_vectorlike
0x11d6521 <print_vectorlike>: push %ebp
0x11d6522 <print_vectorlike+1>: mov %esp,%ebp
0x11d6524 <print_vectorlike+3>: push %edi
(gdb) break print_vectorlike
Breakpoint 1 at 0x11d6521: file print.c, line 1367.
Seems to be according to your expectations.
Btw, if I use a source line number whose code is in the 'cold' area,
then I see this:
(gdb) break print.c:1824
Breakpoint 1 at 0x12e1f36: file print.c, line 1824.
(gdb) info line print.c:1824
Line 1824 of "print.c"
starts at address 0x12e1f36 <print_vectorlike.cold.65+593>
and ends at 0x12e1f3b <print_vectorlike.cold.65+598>.
Thanks.
* Re: The 'cold' function attribute and GDB
2019-05-02 7:26 ` Kevin Buettner
@ 2019-05-02 15:27 ` Eli Zaretskii
2019-05-02 15:59 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 15:27 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 00:26:44 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
>
> https://www.sourceware.org/ml/gdb-patches/2018-08/msg00541.html
>
> It's possible that more needs to be done with display of addresses.
> I'll need to look at it more closely to know for sure. It would help
> me if there were a relatively small test case to look at. (It would
> help even more if it could be reproduced on a GNU/Linux system.)
I tried to write a small test program, but couldn't reproduce the
issue with the backtrace.
It's possible that the shortest way of reproducing this is for you to
build the latest master branch of Emacs with the default compiler
switches (which should yield -O2 -g3). If you go that way, let me
know if you need help in actually forcing Emacs to fall through on the
call to emacs_abort. I'd also be interested to know whether this is
reproducible on GNU/Linux.
This is with GCC 8.2.0, btw.
* Re: The 'cold' function attribute and GDB
2019-05-02 15:23 ` Eli Zaretskii
@ 2019-05-02 15:56 ` Kevin Buettner
2019-05-02 16:43 ` Eli Zaretskii
2019-05-02 18:25 ` Kevin Buettner
1 sibling, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 15:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Thu, 02 May 2019 18:23:20 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 00:38:49 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
> >
> > Can you show me what is printed for the following commands?
> >
> > disassemble print_vectorlike
> > x/3i print_vectorlike
> > b print_vectorlike
> >
> > For the disassemble command, I expect both ranges to be displayed.
> > The x and b commands should print insns / set a breakpoint on the entry
> > pc (which should NOT be the cold address) of the function. If one or more
> > of these things aren't happening, then something is going wrong
> > somewhere.
>
> Here:
>
> (gdb) disassemble /m print_vectorlike
> Dump of assembler code for function print_vectorlike:
> Address range 0x11d6521 to 0x11d81d6:
> 1367 {
> 0x011d6521 <+0>: push %ebp
> 0x011d6522 <+1>: mov %esp,%ebp
> 0x011d6524 <+3>: push %edi
> 0x011d6525 <+4>: push %esi
> 0x011d6526 <+5>: push %ebx
> 0x011d6527 <+6>: sub $0x7c,%esp
> [...]
> Address range 0x12e1ce5 to 0x12e1f3b:
> 1824 emacs_abort ();
> 0x012e1f36 <+593>: call 0x12e7b40 <emacs_abort>
>
> End of assembler dump.
>
> (gdb) x/3i print_vectorlike
> 0x11d6521 <print_vectorlike>: push %ebp
> 0x11d6522 <print_vectorlike+1>: mov %esp,%ebp
> 0x11d6524 <print_vectorlike+3>: push %edi
> (gdb) break print_vectorlike
> Breakpoint 1 at 0x11d6521: file print.c, line 1367.
>
> Seems to be according to your expectations.
Yes, this all looks good to me.
> Btw, if I use a source line number whose code is in the 'cold' area,
> then I see this:
>
> (gdb) break print.c:1824
> Breakpoint 1 at 0x12e1f36: file print.c, line 1824.
> (gdb) info line print.c:1824
> Line 1824 of "print.c"
> starts at address 0x12e1f36 <print_vectorlike.cold.65+593>
> and ends at 0x12e1f3b <print_vectorlike.cold.65+598>.
I'm not especially surprised that GDB chose a nearby minimal symbol to
print here. I am surprised at the rather large offsets from that
symbol, however. In the example that I studied last year, I think
that the first offset was 0.
I'll try building emacs...
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 15:27 ` Eli Zaretskii
@ 2019-05-02 15:59 ` Kevin Buettner
2019-05-02 16:46 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 15:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Thu, 02 May 2019 18:26:49 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 00:26:44 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
> >
> > https://www.sourceware.org/ml/gdb-patches/2018-08/msg00541.html
> >
> > It's possible that more needs to be done with display of addresses.
> > I'll need to look at it more closely to know for sure. It would help
> > me if there were a relatively small test case to look at. (It would
> > help even more if it could be reproduced on a GNU/Linux system.)
>
> I tried to write a small test program, but couldn't reproduce the
> issue with the backtrace.
>
> It's possible that the shortest way of reproducing this is for you to
> build the latest master branch of Emacs with the default compiler
> switches (which should yield -O2 -g3). If you go that way, let me
> know if you need help in actually forcing Emacs to fall through on the
> call to emacs_abort. I'd also be interested to know whether this is
> reproducible on GNU/Linux.
>
> This is with GCC 8.2.0, btw.
I'll give this a try.
I have GCC 8.3.1 and 9.0.1 readily available. Do you think it's
important to use 8.2.0 ?
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 15:56 ` Kevin Buettner
@ 2019-05-02 16:43 ` Eli Zaretskii
0 siblings, 0 replies; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 16:43 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 08:56:23 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, simark@simark.ca
>
> > (gdb) break print.c:1824
> > Breakpoint 1 at 0x12e1f36: file print.c, line 1824.
> > (gdb) info line print.c:1824
> > Line 1824 of "print.c"
> > starts at address 0x12e1f36 <print_vectorlike.cold.65+593>
> > and ends at 0x12e1f3b <print_vectorlike.cold.65+598>.
>
> I'm not especially surprised that GDB chose a nearby minimal symbol to
> print here. I am surprised at the rather large offsets from that
> symbol, however. In the example that I studied last year, I think
> that the first offset was 0.
In the full disassembly of print_vectorlike, there are many jumps to
print_vectorlike.cold.65, with different offsets. The smallest offset
is indeed zero. So it looks like print_vectorlike.cold.65 is a heap
of all the "cold paths" in print_vectorlike, all of which eventually
lead to emacs_abort. The direct call to that is just one such path.
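
The "symbol+offset" form in that output can be illustrated with a small sketch of a nearest-preceding-symbol lookup. This is a simplified model, not GDB's minimal symbol code; the struct and function names are invented for the example. The start address of print_vectorlike.cold.65 is derived here as 0x12e1f36 - 593, and the other addresses are taken from the GDB output quoted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* A simplified "minimal symbol": just a name and a start address.  */
struct msym
{
  const char *name;
  uint32_t addr;
};

/* Sorted by address.  The table layout is purely illustrative.  */
static const struct msym msyms[] = {
  { "print_vectorlike",         0x011d6521 },
  { "print_vectorlike.cold.65", 0x012e1ce5 },  /* 0x12e1f36 - 593 */
  { "emacs_abort",              0x012e7b40 },
};

/* Return the symbol with the greatest address <= PC, or NULL if PC is
   below every symbol.  On success, store PC's offset from that symbol
   in *OFFSET.  */
const struct msym *
nearest_preceding_msym (uint32_t pc, uint32_t *offset)
{
  size_t lo = 0, hi = sizeof msyms / sizeof msyms[0];

  /* Binary search for the first entry whose address is > PC.  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (msyms[mid].addr <= pc)
        lo = mid + 1;
      else
        hi = mid;
    }
  if (lo == 0)
    return NULL;
  *offset = pc - msyms[lo - 1].addr;
  return &msyms[lo - 1];
}
```

Looking up 0x12e1f36 in this table gives print_vectorlike.cold.65 with offset 593, matching the "info line" output; any address inside the cold part resolves to the same symbol with a different offset.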
Let me know if you need help with building the master branch of Emacs.
* Re: The 'cold' function attribute and GDB
2019-05-02 15:59 ` Kevin Buettner
@ 2019-05-02 16:46 ` Eli Zaretskii
2019-05-02 18:08 ` Simon Marchi
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 16:46 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 08:59:19 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, simark@simark.ca
>
> > This is with GCC 8.2.0, btw.
>
> I'll give this a try.
>
> I have GCC 8.3.1 and 9.0.1 readily available. Do you think it's
> important to use 8.2.0 ?
No, I don't think so, not up front. If you get different
results, we might consider this, but I won't bother with downgrading
at this time.
Thanks.
* Re: The 'cold' function attribute and GDB
2019-05-02 16:46 ` Eli Zaretskii
@ 2019-05-02 18:08 ` Simon Marchi
2019-05-02 18:47 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Simon Marchi @ 2019-05-02 18:08 UTC (permalink / raw)
To: Eli Zaretskii, Kevin Buettner; +Cc: gdb-patches
On 2019-05-02 12:45 p.m., Eli Zaretskii wrote:
>> Date: Thu, 2 May 2019 08:59:19 -0700
>> From: Kevin Buettner <kevinb@redhat.com>
>> Cc: gdb-patches@sourceware.org, simark@simark.ca
>>
>>> This is with GCC 8.2.0, btw.
>>
>> I'll give this a try.
>>
>> I have GCC 8.3.1 and 9.0.1 readily available. Do you think it's
>> important to use 8.2.0 ?
I just built gcc from git, and built emacs with it, and I get pretty much the
same ranges as you:
<1><66a91b>: Abbrev Number: 141 (DW_TAG_subprogram)
<66a91d> DW_AT_name : (indirect string, offset: 0xf3fe3): print_vectorlike
<66a921> DW_AT_decl_file : 1
<66a922> DW_AT_decl_line : 1365
<66a924> DW_AT_decl_column : 1
<66a925> DW_AT_prototyped : 1
<66a925> DW_AT_type : <0x65a0ff>
<66a929> DW_AT_ranges : 0x234cf0
<66a92d> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<66a92f> DW_AT_GNU_all_call_sites: 1
<66a92f> DW_AT_sibling : <0x66cf15>
And the ranges:
00234cf0 000000000057da80 000000000057e88f
00234cf0 00000000004173b1 00000000004173b6
00234cf0 <End of list>
In my case, the second range, the cold one, is one instruction long, just the call to emacs_abort:
00000000004173b1 <print_vectorlike.cold>:
4173b1: e8 77 ce ff ff callq 41422d <emacs_abort>
Can you give the steps to reproduce the bug that leads us there?
In the meantime, I tried to do "start", followed by "set $pc = 0x4173b1", and this is what I get as my
first frame:
#0 0x00000000004173b1 in print_vectorlike (obj=0x1, printcharfun=0x7fffffffdd18, escapeflag=<optimized out>, buf=0xa0 <error: Cannot access memory at address 0xa0>) at /home/smarchi/src/emacs/src/print.c:1824
The parameter values are bogus, and the rest of the frames are corrupted, because I don't have
the stack I would normally have when executing this code. But we can see that the full symbol
was found: the arguments are printed, and the function name is correct (doesn't include .cold).
So it looks like some debugging of this problem on Windows will be needed :(
Simon
* Re: The 'cold' function attribute and GDB
2019-05-02 15:23 ` Eli Zaretskii
2019-05-02 15:56 ` Kevin Buettner
@ 2019-05-02 18:25 ` Kevin Buettner
2019-05-02 18:52 ` Eli Zaretskii
1 sibling, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 18:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Thu, 02 May 2019 18:23:20 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 00:38:49 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
> >
> > Can you show me what is printed for the following commands?
> >
> > disassemble print_vectorlike
> > x/3i print_vectorlike
> > b print_vectorlike
> >
> > For the disassemble command, I expect both ranges to be displayed.
> > The x and b commands should print insns / set a breakpoint on the entry
> > pc (which should NOT be the cold address) of the function. If one or more
> > of these things aren't happening, then something is going wrong
> > somewhere.
>
> Here:
>
> (gdb) disassemble /m print_vectorlike
> Dump of assembler code for function print_vectorlike:
> Address range 0x11d6521 to 0x11d81d6:
> 1367 {
> 0x011d6521 <+0>: push %ebp
> 0x011d6522 <+1>: mov %esp,%ebp
> 0x011d6524 <+3>: push %edi
> 0x011d6525 <+4>: push %esi
> 0x011d6526 <+5>: push %ebx
> 0x011d6527 <+6>: sub $0x7c,%esp
> [...]
> Address range 0x12e1ce5 to 0x12e1f3b:
> 1824 emacs_abort ();
> 0x012e1f36 <+593>: call 0x12e7b40 <emacs_abort>
>
> End of assembler dump.
>
> (gdb) x/3i print_vectorlike
> 0x11d6521 <print_vectorlike>: push %ebp
> 0x11d6522 <print_vectorlike+1>: mov %esp,%ebp
> 0x11d6524 <print_vectorlike+3>: push %edi
> (gdb) break print_vectorlike
> Breakpoint 1 at 0x11d6521: file print.c, line 1367.
>
> Seems to be according to your expectations.
>
> Btw, if I use a source line number whose code is in the 'cold' area,
> then I see this:
>
> (gdb) break print.c:1824
> Breakpoint 1 at 0x12e1f36: file print.c, line 1824.
> (gdb) info line print.c:1824
> Line 1824 of "print.c"
> starts at address 0x12e1f36 <print_vectorlike.cold.65+593>
> and ends at 0x12e1f3b <print_vectorlike.cold.65+598>.
I have built emacs on Fedora 29 (GCC 8.3.1) and Fedora 30 (GCC 9.0.1).
So far, the output of the commands that I've tried has been largely
the same between these versions. So, in the output below, I'll only
show the Fedora 30 / GCC 9.0.1 output.
This is what I see for disassemble. There are indeed two address
ranges. It looks reasonable to me.
(gdb) disassemble print_vectorlike
Dump of assembler code for function print_vectorlike:
Address range 0x58cab0 to 0x58d88b:
0x000000000058cab0 <+0>: movabs $0x4000000000000000,%rax
0x000000000058caba <+10>: push %rbp
0x000000000058cabb <+11>: mov %rsp,%rbp
0x000000000058cabe <+14>: push %r15
0x000000000058cac0 <+16>: mov %rdi,%r15
0x000000000058cac3 <+19>: push %r14
0x000000000058cac5 <+21>: mov %rcx,%r14
0x000000000058cac8 <+24>: push %r13
0x000000000058caca <+26>: mov %edx,%r13d
0x000000000058cacd <+29>: push %r12
0x000000000058cacf <+31>: mov %rsi,%r12
0x000000000058cad2 <+34>: push %rbx
0x000000000058cad3 <+35>: sub $0x28,%rsp
0x000000000058cad7 <+39>: mov -0x5(%rdi),%rbx
0x000000000058cadb <+43>: and %rbx,%rax
[...]
0x000000000058d452 <+3522>: callq 0x5878d0 <printchar>
0x000000000058d457 <+3527>: jmpq 0x58c736 <print_vectorlike+166>
Address range 0x41f3e9 to 0x41f3ee:
0x000000000041f3e9 <+-1495719>: callq 0x41c744 <emacs_abort>
End of assembler dump.
I also tried disassemble /m. It produces a lot of output, so much
output in fact, that I found it to be extremely unhelpful. I think
it's related to the fact that it's picking up inline expansions from
lisp.h. However, in addition to printing too much source code, it
also seemed to print too much disassembly. I think that once it got
started disassembling code from lisp.h, it continued to the end of
the file.
Here's the x/3i command:
(gdb) x/3i print_vectorlike
0x58cab0 <print_vectorlike>: movabs $0x4000000000000000,%rax
0x58caba <print_vectorlike+10>: push %rbp
0x58cabb <print_vectorlike+11>: mov %rsp,%rbp
This looks fine to me.
Here's the breakpoint command:
(gdb) b print_vectorlike
Breakpoint 3 at 0x58cab0: file /ironwood1/emacs-git/f30/bld/../../emacs/src/lisp.h, line 1666.
This looks odd to me too. Most users, myself included, would expect to see
the breakpoint set in print.c at line 1367 or 1368.
Let's see what "info line" says about the start address:
(gdb) info line *0x58cab0
Line 1666 of "/ironwood1/emacs-git/f30/bld/../../emacs/src/lisp.h"
starts at address 0x58cab0 <print_vectorlike>
and ends at 0x58cadb <print_vectorlike+43>.
It appears to me that the compiler has moved one of the instructions
for PSEUDOVECTOR_TYPE, an inline function defined in lisp.h, ahead of
the prologue instructions for print_vectorlike.
.debug_line contains information about both lines and addresses for both
files; the file name table for this hunk shows print.c at entry 1 and
lisp.h at entry 2:
[0x002a97eb] Set File Name to entry 1 in the File Name Table
[...]
[0x002a980c] Set is_stmt to 1
[0x002a980d] Advance Line by -821 to 1367
[0x002a9810] Special opcode 117: advance Address by 8 to 0x58cab0 and Line by 0 to 1367
[0x002a9811] Set column to 3
[0x002a9813] Special opcode 6: advance Address by 0 to 0x58cab0 and Line by 1 to 1368 (view 1)
[0x002a9814] Set File Name to entry 2 in the File Name Table
[0x002a9816] Advance Line by 262 to 1630
[0x002a9819] Copy (view 2)
[0x002a981a] Special opcode 6: advance Address by 0 to 0x58cab0 and Line by 1 to 1631 (view 3)
[0x002a981b] Advance Line by -890 to 741
[0x002a981e] Copy (view 4)
[0x002a981f] Set column to 1
[0x002a9821] Advance Line by 923 to 1664
[0x002a9824] Copy (view 5)
[0x002a9825] Set column to 3
[0x002a9827] Special opcode 7: advance Address by 0 to 0x58cab0 and Line by 2 to 1666 (view 6)
[0x002a9828] Set column to 16
[0x002a982a] Set is_stmt to 0
[0x002a982b] Special opcode 6: advance Address by 0 to 0x58cab0 and Line by 1 to 1667 (view 7)
[0x002a982c] Set File Name to entry 1 in the File Name Table
[0x002a982e] Set column to 1
[0x002a9830] Advance Line by -300 to 1367
[0x002a9833] Special opcode 145: advance Address by 10 to 0x58caba and Line by 0 to 1367
[0x002a9834] Set File Name to entry 2 in the File Name Table
I haven't yet checked to see what GDB is doing with this info. However, if
the addresses/lines associated with print.c are still available, it makes
(more) sense to use them here in this context.
I also did some operations on the address in the second range:
(gdb) x/i 0x000000000041fed1
0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
(gdb) b *0x41fed1
Breakpoint 6 at 0x41fed1: file /ironwood1/emacs-git/f30/bld/../../emacs/src/print.c, line 1824.
(gdb) info line *0x41fed1
Line 1824 of "/ironwood1/emacs-git/f30/bld/../../emacs/src/print.c"
starts at address 0x41fed1 <print_vectorlike+4293473313>
and ends at 0x5873b0 <print_unwind>.
These look okay to me too. On Linux, I think we're lacking the minimal
symbol which Eli is seeing on Windows.
(During my first try at this, I thought I was seeing an incorrect result
when setting the breakpoint. But I don't have enough scrollback to be
able to reexamine what happened. I'll play with it a bit more because
what I saw definitely felt like a bug, though not the problem that Eli is
seeing.)
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 18:08 ` Simon Marchi
@ 2019-05-02 18:47 ` Eli Zaretskii
2019-05-02 18:55 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 18:47 UTC (permalink / raw)
To: Simon Marchi; +Cc: kevinb, gdb-patches
> Cc: gdb-patches@sourceware.org
> From: Simon Marchi <simark@simark.ca>
> Date: Thu, 2 May 2019 14:08:02 -0400
>
> In my case, the second range, the cold one, is one instruction long, just the call to emacs_abort:
>
> 00000000004173b1 <print_vectorlike.cold>:
> 4173b1: e8 77 ce ff ff callq 41422d <emacs_abort>
Probably because yours is a 64-bit build. Mine is a 32-bit one.
> Can you give the steps to reproduce the bug that leads us there?
It's a bit tricky: my reproduction recipe relies on a bug that only
rears its head on Windows, so we will need to simulate it on your
system.
In a nutshell, you need to put a breakpoint in this routine, run Emacs
as usual, and when it breaks, make the PSEUDOVECTOR_TYPE macro
return something that doesn't match any of the cases in the switch.
For example, make it return PVEC_FREE (whose numerical value is 1). I
think the easiest will be to use nexti through the code that invokes
PSEUDOVECTOR_TYPE, then assign 1 to the register that gets the result.
You should see the code then fall through to emacs_abort.
To force Emacs to call this function, type this after starting Emacs:
C-h v char-script-table RET
(I assume you are familiar with the notation like C-h and RET.)
> In the mean time, I tried to do "start", followed by "set $pc = 0x4173b1", and this is what I get as my
> first frame:
>
> #0 0x00000000004173b1 in print_vectorlike (obj=0x1, printcharfun=0x7fffffffdd18, escapeflag=<optimized out>, buf=0xa0 <error: Cannot access memory at address 0xa0>) at /home/smarchi/src/emacs/src/print.c:1824
>
> The parameter values are bogus, and the rest of the frames are corrupted, because I don't have
> the stack I would normally have when executing this code. But we can see that the full symbol
> was found: the arguments are printed, and the function name is correct (doesn't include .cold).
>
> So it looks like some debugging of this problem on Windows will be needed :(
Why am I not surprised?
So a description of how GDB finds its way back from $pc to
print_vectorlike using the DWARF ranges, with pointers to the relevant
code, would be appreciated.
Thanks.
* Re: The 'cold' function attribute and GDB
2019-05-02 18:25 ` Kevin Buettner
@ 2019-05-02 18:52 ` Eli Zaretskii
2019-05-02 19:13 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 18:52 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 11:25:17 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, simark@simark.ca
>
> (gdb) x/i 0x000000000041fed1
> 0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
> (gdb) b *0x41fed1
> Breakpoint 6 at 0x41fed1: file /ironwood1/emacs-git/f30/bld/../../emacs/src/print.c, line 1824.
> (gdb) info line *0x41fed1
> Line 1824 of "/ironwood1/emacs-git/f30/bld/../../emacs/src/print.c"
> starts at address 0x41fed1 <print_vectorlike+4293473313>
> and ends at 0x5873b0 <print_unwind>.
>
> These look okay to me too. On Linux, I think we're lacking the minimal
> symbol which Eli is seeing on Windows.
Don't you see print_vectorlike.cold in the debug info, with objdump or
readelf? If not, perhaps ELF binaries record cold sections in some
different way from PE-COFF.
I also see a lot of *.cold symbols with "nm -A" on the Emacs binary.
Do you?
Thanks.
* Re: The 'cold' function attribute and GDB
2019-05-02 18:47 ` Eli Zaretskii
@ 2019-05-02 18:55 ` Kevin Buettner
2019-05-02 19:32 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 18:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, Simon Marchi
Hi Eli,
Can you provide us (maybe just me and Simon) with the 32-bit windows
emacs executable?
I think it may be possible to make some headway on this matter without
needing to run emacs. I think the presence of the .cold minimal
symbol is key here.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 18:52 ` Eli Zaretskii
@ 2019-05-02 19:13 ` Kevin Buettner
2019-05-02 19:28 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 19:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Thu, 02 May 2019 21:51:26 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 11:25:17 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, simark@simark.ca
> >
> > (gdb) x/i 0x000000000041fed1
> > 0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
> > (gdb) b *0x41fed1
> > Breakpoint 6 at 0x41fed1: file /ironwood1/emacs-git/f30/bld/../../emacs/src/print.c, line 1824.
> > (gdb) info line *0x41fed1
> > Line 1824 of "/ironwood1/emacs-git/f30/bld/../../emacs/src/print.c"
> > starts at address 0x41fed1 <print_vectorlike+4293473313>
> > and ends at 0x5873b0 <print_unwind>.
> >
> > These look okay to me too. On Linux, I think we're lacking the minimal
> > symbol which Eli is seeing on Windows.
>
> Don't you see print_vectorlike.cold in the debug info, with objdump or
> readelf? If not, perhaps ELF binaries record cold sections in some
> different way from PE-COFF.
>
> I also see a lot of *.cold symbols with "nm -A" on the Emacs binary.
> Do you?
I hadn't checked before, but I do see .cold symbols...
emacs:000000000058cab0 t print_vectorlike
emacs:000000000041fed1 t print_vectorlike.cold
The address for print_vectorlike.cold is exactly the address for the
second address range for print_vectorlike:
Address range 0x41fed1 to 0x41fed6:
0x000000000041fed1 <+-1493983>: callq 0x41cca5 <emacs_abort>
It's interesting that this _doesn't_ show up when doing the following:
(gdb) x/i 0x000000000041fed1
0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
What do you see when you issue a similar command for your build
of emacs on Windows? (Apologies if you've already shown this to
me.)
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 19:13 ` Kevin Buettner
@ 2019-05-02 19:28 ` Eli Zaretskii
2019-05-02 19:45 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 19:28 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 12:13:05 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, simark@simark.ca
>
> I hadn't checked before, but I do see .cold symbols...
>
> emacs:000000000058cab0 t print_vectorlike
> emacs:000000000041fed1 t print_vectorlike.cold
>
> The address for print_vectorlike.cold is exactly the address for the
> second address range for print_vectorlike:
>
> Address range 0x41fed1 to 0x41fed6:
> 0x000000000041fed1 <+-1493983>: callq 0x41cca5 <emacs_abort>
>
> It's interesting that this _doesn't_ show up when doing the following:
>
> (gdb) x/i 0x000000000041fed1
> 0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
>
> What do you see when you issue a similar command for your build
> of emacs on Windows? (Apologies if you've already shown this to
> me.)
"nm -A" shows this:
011d6521 t _print_vectorlike
012e1ce5 t _print_vectorlike.cold.65
And GDB shows this:
(gdb) x/i 0x012e1ce5
0x12e1ce5 <print_vectorlike.cold.65>: movl $0xf5,0x8(%esp)
* Re: The 'cold' function attribute and GDB
2019-05-02 18:55 ` Kevin Buettner
@ 2019-05-02 19:32 ` Eli Zaretskii
2019-05-02 19:51 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 19:32 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 11:55:31 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
>
> Can you provide us (maybe just me and Simon) with the 32-bit windows
> emacs executable?
Yes, I can. Tell me where to upload; it's a 7MB file after
compressing with XZ. Should I mail it to you (off-list)?
* Re: The 'cold' function attribute and GDB
2019-05-02 19:28 ` Eli Zaretskii
@ 2019-05-02 19:45 ` Kevin Buettner
2019-05-02 19:56 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 19:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Thu, 02 May 2019 22:28:17 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 12:13:05 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, simark@simark.ca
> >
> > I hadn't checked before, but I do see .cold symbols...
> >
> > emacs:000000000058cab0 t print_vectorlike
> > emacs:000000000041fed1 t print_vectorlike.cold
> >
> > The address for print_vectorlike.cold is exactly the address for the
> > second address range for print_vectorlike:
> >
> > Address range 0x41fed1 to 0x41fed6:
> > 0x000000000041fed1 <+-1493983>: callq 0x41cca5 <emacs_abort>
> >
> > It's interesting that this _doesn't_ show up when doing the following:
> >
> > (gdb) x/i 0x000000000041fed1
> > 0x41fed1 <print_vectorlike+4293473313>: callq 0x41cca5 <emacs_abort>
> >
> > What do you see when you issue a similar command for your build
> > of emacs on Windows? (Apologies if you've already shown this to
> > me.)
>
> "nm -A" shows this:
>
> 011d6521 t _print_vectorlike
> 012e1ce5 t _print_vectorlike.cold.65
>
> And GDB shows this:
>
> (gdb) x/i 0x012e1ce5
> 0x12e1ce5 <print_vectorlike.cold.65>: movl $0xf5,0x8(%esp)
What I (think I) actually want to see is "x/i 0x012e1f36".
This is the address corresponding to the call of emacs_abort in the
second address range. What I want to see is whether
print_vectorlike.cold.65 is used to print the address.
If so, it should be possible to debug this problem without needing
to actually run it.
Looking back at your earlier email where you provided me with
the output of the disassemble command, I just noticed something
odd:
> Address range 0x12e1ce5 to 0x12e1f3b:
> 1824 emacs_abort ();
> 0x012e1f36 <+593>: call 0x12e7b40 <emacs_abort>
The address range includes more addresses than are shown. I would
have expected the range to be something like "0x012e1f36 to 0x12e1f3b"
instead of starting at 0x12e1ce5 as shown.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 19:32 ` Eli Zaretskii
@ 2019-05-02 19:51 ` Kevin Buettner
0 siblings, 0 replies; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 19:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches
On Thu, 02 May 2019 22:32:01 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 2 May 2019 11:55:31 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: gdb-patches@sourceware.org, Simon Marchi <simark@simark.ca>
> >
> > Can you provide us (maybe just me and Simon) with the 32-bit windows
> > emacs executable?
>
> Yes, I can. Tell me where to upload; it's a 7MB file after
> compressing with XZ. Should I mail it to you (off-list)?
Yes, emailing it to my redhat.com address should be fine. If it ends up
being too big, we can try something else.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 19:45 ` Kevin Buettner
@ 2019-05-02 19:56 ` Eli Zaretskii
2019-05-02 23:30 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-02 19:56 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 12:45:09 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: gdb-patches@sourceware.org, simark@simark.ca
>
> > (gdb) x/i 0x012e1ce5
> > 0x12e1ce5 <print_vectorlike.cold.65>: movl $0xf5,0x8(%esp)
>
> What I (think I) actually want to see is "x/i 0x012e1f36".
(gdb) x/i 0x012e1f36
0x12e1f36 <print_vectorlike.cold.65+593>: call 0x12e7b40 <emacs_abort>
> This is the address corresponding to the call of emacs_abort in the
> second address range. What I want to see is whether
> print_vectorlike.cold.65 is used to print the address.
As you see, it is.
> Looking back at your earlier email where you provided me with
> the output of the disassemble command, I just noticed something
> odd:
>
> > Address range 0x12e1ce5 to 0x12e1f3b:
> > 1824 emacs_abort ();
> > 0x012e1f36 <+593>: call 0x12e7b40 <emacs_abort>
>
> The address range includes more addresses than are shown. I would
> have expected the range to be something like "0x012e1f36 to 0x12e1f3b"
> instead of starting at 0x12e1ce5 as shown.
As I described in an earlier message, there are many jumps to
print_vectorlike.cold.65 with different offsets from various places in
print_vectorlike's disassembly; the smallest offset is zero. It
sounds like the 32-bit code generation produces many more "cold"
paths, and puts them all together under the name
print_vectorlike.cold.65.
* Re: The 'cold' function attribute and GDB
2019-05-02 19:56 ` Eli Zaretskii
@ 2019-05-02 23:30 ` Kevin Buettner
2019-05-04 8:30 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-02 23:30 UTC (permalink / raw)
To: gdb-patches; +Cc: Eli Zaretskii, simark
On Thu, 02 May 2019 22:56:23 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > What I (think I) actually want to see is "x/i 0x012e1f36".
>
> (gdb) x/i 0x012e1f36
> 0x12e1f36 <print_vectorlike.cold.65+593>: call 0x12e7b40 <emacs_abort>
Eli provided me with a windows executable. Using an x86_64 GDB built
with --enable-targets=all, I'm able to reproduce the behavior quoted above:
Reading symbols from bootstrap-emacs.exe...
(gdb) info target
Symbols from "/mesquite2/sourceware-git/f30-ranges/tst/bootstrap-emacs.exe".
Local exec file:
`/mesquite2/sourceware-git/f30-ranges/tst/bootstrap-emacs.exe',
file type pei-i386.
Entry point: 0x127b5c9
0x01001000 - 0x012eb984 is .text
0x012ec000 - 0x0160e240 is .data
0x0160f000 - 0x01619580 is .subrs
0x0161a000 - 0x0165a950 is .rdata
0x0165b000 - 0x016b5d1c is .eh_frame
0x016b6000 - 0x017389a0 is .bss
0x01739000 - 0x0173d574 is .idata
0x0173e000 - 0x0173e018 is .CRT
0x0173f000 - 0x0173f020 is .tls
0x01740000 - 0x0179b388 is .rsrc
(gdb) x/i 0x12e1f36
0x12e1f36 <print_vectorlike.cold.65+593>: call 0x12e7b40 <emacs_abort>
For those who haven't been following along, we might prefer to see
print_vectorlike+SomeConstant instead of the symbol
print_vectorlike.cold.65. It might not be that objectionable in this
context; I think it's more confusing in a backtrace, where it's shown
as a function which doesn't exist in the user's code. E.g. this is
the frame from Eli's first message in this thread:
#2 0x012e1f3b in print_vectorlike.cold () at print.c:1824
This behavior is, apparently, intentional. The function
build_address_symbolic in printcmd.c contains the following
statements; in particular note the second comment:
  if (msymbol.minsym != NULL)
    {
      if (BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location || symbol == NULL)
        {
          /* If this is a function (i.e. a code address), strip out any
             non-address bits. For instance, display a pointer to the
             first instruction of a Thumb function as <function>; the
             second instruction will be <function+2>, even though the
             pointer is <function+3>. This matches the ISA behavior. */
          if (MSYMBOL_TYPE (msymbol.minsym) == mst_text
              || MSYMBOL_TYPE (msymbol.minsym) == mst_text_gnu_ifunc
              || MSYMBOL_TYPE (msymbol.minsym) == mst_file_text
              || MSYMBOL_TYPE (msymbol.minsym) == mst_solib_trampoline)
            addr = gdbarch_addr_bits_remove (gdbarch, addr);

          /* The msymbol is closer to the address than the symbol;
             use the msymbol instead. */
          symbol = 0;
          name_location = BMSYMBOL_VALUE_ADDRESS (msymbol);
          if (do_demangle || asm_demangle)
            name_temp = MSYMBOL_PRINT_NAME (msymbol.minsym);
          else
            name_temp = MSYMBOL_LINKAGE_NAME (msymbol.minsym);
        }
    }
When I traced it, prior to getting to the outer if statement,
name_location contained the address of 'print_vectorlike' and
name_temp was set to "print_vectorlike". So, if we were to
*not* execute this code, we wouldn't use the minimal symbol.
To test this, I disabled the code above via #if 0 ... #endif. Here's
what I see when I do this:
(gdb) x/i 0x12e1f36
0x12e1f36 <print_vectorlike+1096213>: call 0x12e7b40 <emacs_abort>
I didn't see this behavior on my Linux build of emacs due to the fact
that the .cold symbol was placed at an address that's lower than that
of the non-cold symbol.
With all of that said, I don't think it actually helps with the
backtrace problem. It appears that build_address_symbolic is only
used for disassembly. After poking around, however, I think the
equivalent bit of code may be found in find_frame_funname in stack.c.
The hunk of code which you'll want to disable is:
  if (msymbol.minsym != NULL
      && (BMSYMBOL_VALUE_ADDRESS (msymbol)
          > BLOCK_ENTRY_PC (SYMBOL_BLOCK_VALUE (func))))
    {
      /* We also don't know anything about the function besides
         its address and name. */
      func = 0;
      funname.reset (xstrdup (MSYMBOL_PRINT_NAME (msymbol.minsym)));
      *funlang = MSYMBOL_LANGUAGE (msymbol.minsym);
    }
Doing this sort of thing - preferring the minimal symbol over the
symbol - might make sense for displaying assembly addresses, but
I'm not convinced that it makes sense for printing frame functions.
It would be possible to disable that code only when the frame pc
is in a discontiguous block. If someone thinks
this is a good idea, I can provide a patch, but someone else will
need to test it.
Kevin
* Re: The 'cold' function attribute and GDB
2019-05-02 23:30 ` Kevin Buettner
@ 2019-05-04 8:30 ` Eli Zaretskii
2019-05-05 20:04 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-04 8:30 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Thu, 2 May 2019 16:30:15 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, simark@simark.ca
>
> #2 0x012e1f3b in print_vectorlike.cold () at print.c:1824
>
> This behavior is, apparently, intentional. The function
> build_address_symbolic in printcmd.c contains the following
> statements; in particular note the second comment:
>
> if (msymbol.minsym != NULL)
> {
> if (BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location || symbol == NULL)
> {
> /* If this is a function (i.e. a code address), strip out any
> non-address bits. For instance, display a pointer to the
> first instruction of a Thumb function as <function>; the
> second instruction will be <function+2>, even though the
> pointer is <function+3>. This matches the ISA behavior. */
> if (MSYMBOL_TYPE (msymbol.minsym) == mst_text
> || MSYMBOL_TYPE (msymbol.minsym) == mst_text_gnu_ifunc
> || MSYMBOL_TYPE (msymbol.minsym) == mst_file_text
> || MSYMBOL_TYPE (msymbol.minsym) == mst_solib_trampoline)
> addr = gdbarch_addr_bits_remove (gdbarch, addr);
>
> /* The msymbol is closer to the address than the symbol;
> use the msymbol instead. */
> symbol = 0;
> name_location = BMSYMBOL_VALUE_ADDRESS (msymbol);
> if (do_demangle || asm_demangle)
> name_temp = MSYMBOL_PRINT_NAME (msymbol.minsym);
> else
> name_temp = MSYMBOL_LINKAGE_NAME (msymbol.minsym);
> }
> }
>
> When I traced it, prior to getting to the outer if statement,
> name_location contained the address of 'print_vectorlike' and
> name_temp was set to "print_vectorlike". So, if we were to
> *not* execute this code, we wouldn't use the minimal symbol.
>
> To test this, I disabled the code above via #if 0 ... #endif. Here's
> what I see when I do this:
>
> (gdb) x/i 0x12e1f36
> 0x12e1f36 <print_vectorlike+1096213>: call 0x12e7b40 <emacs_abort>
>
> I didn't see this behavior on my Linux build of emacs due to the fact
> that the .cold symbol was placed at an address that's lower than that
> of the non-cold symbol.
>
> With all of that said, I don't think it actually helps with the
> backtrace problem. It appears that build_address_symbolic is only
> used for disassembly. After poking around, however, I think the
> equivalent bit of code may be found in find_frame_funname in stack.c.
> The hunk of code which you'll want to disable is:
>
> if (msymbol.minsym != NULL
> && (BMSYMBOL_VALUE_ADDRESS (msymbol)
> > BLOCK_ENTRY_PC (SYMBOL_BLOCK_VALUE (func))))
> {
> /* We also don't know anything about the function besides
> its address and name. */
> func = 0;
> funname.reset (xstrdup (MSYMBOL_PRINT_NAME (msymbol.minsym)));
> *funlang = MSYMBOL_LANGUAGE (msymbol.minsym);
> }
>
> Doing this sort of thing - preferring the minimal symbol over the
> symbol - might make sense for displaying assembly addresses, but
> I'm not convinced that it makes sense for printing frame functions.
>
> It would be possible to disable that code only when the frame pc
> is in a discontiguous block. If someone thinks
> this is a good idea, I can provide a patch, but someone else will
> need to test it.
Thanks for the analysis.
Do you want me to test some patch?
And why doesn't the GNU/Linux executable have this minimal symbol in
the first place, btw?
* Re: The 'cold' function attribute and GDB
2019-05-04 8:30 ` Eli Zaretskii
@ 2019-05-05 20:04 ` Kevin Buettner
2019-05-06 15:33 ` Eli Zaretskii
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Buettner @ 2019-05-05 20:04 UTC (permalink / raw)
To: gdb-patches; +Cc: Eli Zaretskii, simark
On Sat, 04 May 2019 11:30:14 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> Do you want me to test some patch?
See below.
> And why doesn't the GNU/Linux executable have this minimal symbol in
> the first place, btw?
It turns out that the GNU/Linux executable does have this symbol. The
difference is in its placement. The GNU/Linux executable places the
.cold address before the entry pc for the function. The Windows
executable places it after the entry pc. The code mentioned in my
earlier analysis prefers to use the minsym only when it's greater than
the entry pc.
My patch is below. If it works for you and the approach seems sound,
I'll work on a test case which doesn't depend on gcc placing the
.cold section after the entry pc.
- - -
Prefer symtab symbol over minsym for function names in discontiguous blocks
The discussion on gdb-patches which led to this patch may be found
here:
https://www.sourceware.org/ml/gdb-patches/2019-05/msg00018.html
Here's a brief synopsis/analysis:
Eli Zaretskii, while debugging a Windows emacs executable, found
that functions comprised of more than one (non-contiguous)
address range were not being displayed correctly in a backtrace. This
is the example that Eli provided:
(gdb) bt
#0 0x76a63227 in KERNELBASE!DebugBreak ()
from C:\Windows\syswow64\KernelBase.dll
#1 0x012e7b89 in emacs_abort () at w32fns.c:10768
#2 0x012e1f3b in print_vectorlike.cold () at print.c:1824
#3 0x011d2dec in print_object (obj=<optimized out>, printcharfun=XIL(0),
escapeflag=true) at print.c:2150
The function print_vectorlike consists of two address ranges, one of
which contains "cold" code which is expected to not execute very often.
There is a minimal symbol, print_vectorlike.cold.65, which is the address
of the "cold" range.
GDB is preferring this minsym over the name provided by the
DWARF info due to some really old code in GDB which handles
"certain pathological cases". See the first big block comment
in find_frame_funname for more info.
I considered removing the code for this corner case entirely, but it
seems as though it might still be useful, so I left it intact.
That code is already disabled for inline functions. I added a
condition which disables it for non-contiguous functions as well.
gdb/ChangeLog:
* stack.c (find_frame_funname): Disable use of minsym for function
name in functions comprised of non-contiguous blocks.
---
gdb/stack.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/gdb/stack.c b/gdb/stack.c
index e5de10949d..66ba5ab43c 100644
--- a/gdb/stack.c
+++ b/gdb/stack.c
@@ -1067,9 +1067,18 @@ find_frame_funname (struct frame_info *frame, enum language *funlang,
struct bound_minimal_symbol msymbol;
- /* Don't attempt to do this for inlined functions, which do not
- have a corresponding minimal symbol. */
- if (!block_inlined_p (SYMBOL_BLOCK_VALUE (func)))
+ /* Don't attempt to do this for two cases:
+
+ 1) Inlined functions, which do not have a corresponding minimal
+ symbol.
+
+ 2) Functions which are comprised of non-contiguous blocks.
+ Such functions often contain a minimal symbol for a
+ "cold" range, i.e. code which is not expected to execute
+ very often. It is incorrect to use the minimal symbol
+ associated with this range. */
+ if (!block_inlined_p (SYMBOL_BLOCK_VALUE (func))
+ && BLOCK_CONTIGUOUS_P (SYMBOL_BLOCK_VALUE (func)))
msymbol
= lookup_minimal_symbol_by_pc (get_frame_address_in_block (frame));
else
* Re: The 'cold' function attribute and GDB
2019-05-05 20:04 ` Kevin Buettner
@ 2019-05-06 15:33 ` Eli Zaretskii
2019-05-06 16:24 ` Kevin Buettner
0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2019-05-06 15:33 UTC (permalink / raw)
To: Kevin Buettner; +Cc: gdb-patches, simark
> Date: Sun, 5 May 2019 13:04:03 -0700
> From: Kevin Buettner <kevinb@redhat.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, simark@simark.ca
>
> My patch is below. If it works for you and the approach seems sound,
> I'll work on a test case which doesn't depend on gcc placing the
> .cold section after the entry pc.
The patch fixes the backtrace, thanks.
The output from "info line" still references the .cold symbol,
though. I think this was not so on GNU/Linux? Is it also by sheer
luck, because of the order of the 'cold' symbols in the executable?
* Re: The 'cold' function attribute and GDB
2019-05-06 15:33 ` Eli Zaretskii
@ 2019-05-06 16:24 ` Kevin Buettner
0 siblings, 0 replies; 30+ messages in thread
From: Kevin Buettner @ 2019-05-06 16:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: gdb-patches, simark
On Mon, 06 May 2019 18:33:37 +0300
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Sun, 5 May 2019 13:04:03 -0700
> > From: Kevin Buettner <kevinb@redhat.com>
> > Cc: Eli Zaretskii <eliz@gnu.org>, simark@simark.ca
> >
> > My patch is below. If it works for you and the approach seems sound,
> > I'll work on a test case which doesn't depend on gcc placing the
> > .cold section after the entry pc.
>
> The patch fixes the backtrace, thanks.
Thanks for testing it. I'll begin working on a test case.
> The output from "info line" still references the .cold symbol,
> though. I think this was not so on GNU/Linux? Is it also by sheer
> luck, because of the order of the 'cold' symbols in the executable?
The address printed by "info line" is constructed by
build_address_symbolic in printcmd.c. It contains code similar
to that in find_frame_funname.
You recall correctly - GNU/Linux did not display the .cold symbol when
using the "info line" command. The GNU/Linux executable that I
examined does have .cold symbols, but (at least in the examples I saw)
placed them at addresses lower than the corresponding function.
[ .cold symbols at lower addresses is also an interesting case, causing
all sorts of problems when I first started looking at the non-contiguous
range problem. GDB had assumed that the lowest address might also
be a start address. ]
I'll amend my patch to address the "info line" problem too.
(Though for that one, it doesn't bother me quite as much to see the
minimal symbol.)
Kevin
Thread overview: 30+ messages
2019-05-01 18:59 The 'cold' function attribute and GDB Eli Zaretskii
2019-05-01 20:17 ` Simon Marchi
2019-05-02 2:51 ` Kevin Buettner
2019-05-02 6:59 ` Eli Zaretskii
2019-05-02 7:26 ` Kevin Buettner
2019-05-02 15:27 ` Eli Zaretskii
2019-05-02 15:59 ` Kevin Buettner
2019-05-02 16:46 ` Eli Zaretskii
2019-05-02 18:08 ` Simon Marchi
2019-05-02 18:47 ` Eli Zaretskii
2019-05-02 18:55 ` Kevin Buettner
2019-05-02 19:32 ` Eli Zaretskii
2019-05-02 19:51 ` Kevin Buettner
2019-05-02 7:06 ` Eli Zaretskii
2019-05-02 7:38 ` Kevin Buettner
2019-05-02 15:23 ` Eli Zaretskii
2019-05-02 15:56 ` Kevin Buettner
2019-05-02 16:43 ` Eli Zaretskii
2019-05-02 18:25 ` Kevin Buettner
2019-05-02 18:52 ` Eli Zaretskii
2019-05-02 19:13 ` Kevin Buettner
2019-05-02 19:28 ` Eli Zaretskii
2019-05-02 19:45 ` Kevin Buettner
2019-05-02 19:56 ` Eli Zaretskii
2019-05-02 23:30 ` Kevin Buettner
2019-05-04 8:30 ` Eli Zaretskii
2019-05-05 20:04 ` Kevin Buettner
2019-05-06 15:33 ` Eli Zaretskii
2019-05-06 16:24 ` Kevin Buettner
2019-05-02 15:17 ` Eli Zaretskii