Re: [PATCHv2] gdb/unwinders: better support for $pc not saved

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Cc: Keith Seitz <keiths@redhat.com>, Kevin Buettner <kevinb@redhat.com>
Subject: Re: [PATCHv2] gdb/unwinders: better support for $pc not saved
Date: Tue, 05 Mar 2024 16:41:27 +0000	[thread overview]
Message-ID: <87a5ncbphk.fsf@redhat.com> (raw)
In-Reply-To: <2137e3edccdc997be04f3235718d772b5be97727.1706885900.git.aburgess@redhat.com>


Ping!

If there's no further feedback then I plan to push this patch in a week
or so.

Thanks,
Andrew



Andrew Burgess <aburgess@redhat.com> writes:

> In V2 I've addressed the test failure that Keith Pointed out.  The
> only changes between V1 and V2 are in the testsuite, the GDB side of
> things is unchanged.
>
> The problem is, I suspect caused by an S390 compiler bug, though I'm
> not 100% sure, maybe I'm just missing something.  If there is a bug
> then it's a weird little issue related to debug information
> generation, not a functionality bug.
>
> The test as originally written created a Python unwinder which claimed
> any frame for the function break_bt_here.  The frame-id the unwinder
> creates for the break_bt_here frame is just $pc and $sp value unwound
> from the next frame.
>
> This of course is a bad frame-id.  A frame-id should remain the same
> for every address within a function, which is why GDB usually uses the
> $pc for the start of a function.
>
> Still, for the sake of this test it didn't really matter.  Or so I
> thought.
>
> The failure Keith pointed out only happened when we had a stack like
> this:
>
>   main -> break_bt_here -> other_func
>
> My Python unwinder would claim the break_bt_here frame and create a
> frame-id by unwinding the $pc and $sp values from `other_func`.
>
> As other_func is a trivial (empty) leaf function, not every
> architecture is going to need to allocate a stack frame for this
> function.  For example, on S390 the return address is placed into r14
> and not onto the stack.
>
> As such, my expectation, for a stack grows down architecture, is that
> the $sp value in other_func would be equal to, or less than the $sp
> value in break_bt_here.
>
> In my experience the DWARF Call Frame Address (CFA) is usually the
> stack pointer value on entry to a function, so my expectation was that
> the CFA of other_func would be equal to, or less than the unwound $sp
> value in break_bt_here, so (I thought) I'd be fine to use the unwound
> $sp value as the frame-id stack address in break_bt_here.
>
> However, on S390, the DWARF CIA that covers other_func looks like this:
>
>   00000000 0000000000000014 00000000 CIE
>     Version:               1
>     Augmentation:          "zR"
>     Code alignment factor: 1
>     Data alignment factor: -8
>     Return address column: 14
>     Augmentation data:     1b
>     DW_CFA_def_cfa: r15 ofs 160
>     DW_CFA_undefined: r14
>     DW_CFA_nop
>
> While the FDE for other_func is:
>
>   00000070 0000000000000018 00000048 FDE cie=0000002c pc=00000000010011a0..00000000010011b0
>     DW_CFA_advance_loc: 4 to 00000000010011a4
>     DW_CFA_register: r11 in r16 (f0)
>     DW_CFA_advance_loc: 4 to 00000000010011a8
>     DW_CFA_def_cfa_register: r11
>     DW_CFA_advance_loc: 6 to 00000000010011ae
>     DW_CFA_restore: r11
>     DW_CFA_def_cfa_register: r15
>
> Notice the 'DW_CFA_def_cfa: r15 ofs 160' in the CIE.  This sets up a
> default offset of 160 bytes for the CFA.
>
> I don't believe there's anything really "wrong" with this, but it does
> seem weird.  It means that the stack-address within the frame-id for
> other_func will be 160 bytes higher than the $sp value on entry to the
> function other_func.
>
> Now this isn't normally a problem as break_bt_here, when using the
> DWARF unwinder rather than my custom Python unwinder will also have a
> 160 byte offset, so the frame-id for break_bt_here will still appear
> to be for an earlier stack frame.
>
> However, my custom Python unwinder doesn't understand this "magic" 160
> byte offset, and instead just uses the unwound stack value.
>
> Which means that when I have a stack like this:
>
>   main -> break_bt_here -> other_func
>
> The stack address in the frame-id for break_bt_here is actually less
> than the stack address in the frame-id for other_func.  This then
> triggers frame_id_inner check within get_prev_frame_always_1 and GDB
> things something has gone wrong with the stack unwind, leading to the
> failure Keith reported.
>
> My solution to this problem is to run up to break_bt_here without the
> Python unwinder loaded.  Use 'maint print frame-id' to capture the
> frame-id generated from the DWARF unwinder, and then restart GDB and
> load the Python unwinder.
>
> I can then tell the Python unwinder the contents of the DWARF generate
> frame-id, and the Python unwinder will report this as its frame-id.
> For x86-64 very little changes.  But for S390 I now use a stack
> address that includes the "magic" 160 byte offset, and everything is
> fine.
>
> ---
>
> This started with a Red Hat bug report which can be seen here:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1850710
>
> The problem reported here was using GDB on GNU/Linux for S390, the
> user stepped into JIT generated code.  As they enter the JIT code GDB
> would report 'PC not saved', and this same message would be reported
> after each step/stepi.
>
> Additionally, the user had 'set disassemble-next-line on', and once
> they entered the JIT code this output was not displayed, nor were any
> 'display' directives displayed.
>
> The user is not making use of the JIT plugin API to provide debug
> information.  But that's OK, they aren't expecting any source level
> debug here, they are happy to use 'stepi', but the missing 'display'
> directives are a problem, as is the constant 'PC not saved' (error)
> message.
>
> What is happening here is that as GDB is failing to find any debug
> information for the JIT generated code, it is falling back on to the
> S390 prologue unwinder to try and unwind frame #0.  Unfortunately,
> without being able to identify the function boundaries, the S390
> prologue scanner can't help much, in fact, it doesn't even suggest an
> arbitrary previous $pc value (some targets that use a link-register
> will, by default, assume the link-register contains the previous $pc),
> instead the S390 will just say, "sorry, I have no previous $pc value".
>
> The result of this is that when GDB tries to find frame #1 we end
> throwing an error from frame_unwind_pc (the 'PC not saved' error).
> This error is not caught anywhere except at the top-level interpreter
> loop, and so we end up skipping all the 'display' directive handling.
>
> While thinking about this, I wondered, could I trigger the same error
> using the Python Unwinder API?  What happens if a Python unwinder
> claims a frame, but then fails to provide a previous $pc value?
>
> Turns out that exactly the same thing happens, which is great, as that
> means we now have a way to reproduce this bug on any target.  And so
> the test included with this patch does just this.  I have a Python
> unwinder that claims a frame, but doesn't provide any previous
> register values.
>
> I then do two tests, first I stop in the claimed frame (i.e. frame #0
> is the frame that can't be unwound), I perform a few steps, and check
> the backtrace.  And second, I stop in a child of the problem
> frame (i.e. frame #1 is the frame that can't be unwound), and from
> here I check the backtrace.
>
> While all this is going on I have a 'display' directive in place, and
> each time GDB stops I check that the display directive triggers.
>
> Additionally, when checking the backtrace, I am checking that the
> backtrace finishes with the message 'Backtrace stopped: frame did not
> save the PC'.
>
> As for the fix I chose to add a call to frame_unwind_pc directly to
> get_prev_frame_always_1.  Calling frame_unwind_pc will cache the
> unwound $pc value, so this doesn't add much additional work as
> immediately after the new frame_unwind_pc call, we call
> get_prev_frame_maybe_check_cycle, which actually generates the
> previous frame, which will always (I think) require a call to
> frame_unwind_pc anyway.
>
> The reason for adding the frame_unwind_pc call into
> get_prev_frame_always_1, is that if the frame_unwind_pc call fails we
> want to set the frames 'stop_reason', and get_prev_frame_always_1
> seems to be the place where this is done, so I wanted to keep the new
> stop_reason setting code next to all the existing stop_reason setting
> code.
>
> Additionally, once we enter get_prev_frame_maybe_check_cycle we
> actually create the previous frame, then, if it turns out that the
> previous frame can't be created we need to remove the frame .. this
> seemed more complex than just making the check in
> get_prev_frame_always_1.
>
> With this fix in place the original S390 bug is fixed, and also the
> test added in this commit, that uses the Python API, is also fixed.
>
> Reviewed-By: Kevin Buettner <kevinb@redhat.com>
> ---
>  gdb/frame.c                             |  32 +++++++
>  gdb/testsuite/gdb.base/pc-not-saved.c   |  48 ++++++++++
>  gdb/testsuite/gdb.base/pc-not-saved.exp | 113 ++++++++++++++++++++++++
>  gdb/testsuite/gdb.base/pc-not-saved.py  |  71 +++++++++++++++
>  4 files changed, 264 insertions(+)
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.c
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.exp
>  create mode 100644 gdb/testsuite/gdb.base/pc-not-saved.py
>
> diff --git a/gdb/frame.c b/gdb/frame.c
> index fae89cbbc0a..9ed4515d4d5 100644
> --- a/gdb/frame.c
> +++ b/gdb/frame.c
> @@ -2419,6 +2419,38 @@ get_prev_frame_always_1 (frame_info_ptr this_frame)
>  	}
>      }
>  
> +  /* Ensure we can unwind the program counter of THIS_FRAME.  */
> +  try
> +    {
> +      /* Calling frame_unwind_pc for the sentinel frame relies on the
> +	 current_frame being set, which at this point it might not be if we
> +	 are in the process of setting the current_frame after a stop (see
> +	 get_current_frame).
> +
> +	 The point of this check is to ensure that the unwinder for
> +	 THIS_FRAME can actually unwind the $pc, which we assume the
> +	 sentinel frame unwinder can always do (it's just a read from the
> +	 machine state), so we only call frame_unwind_pc for frames other
> +	 than the sentinel (level -1) frame.
> +
> +	 Additionally, we don't actually care about the value of the
> +	 unwound $pc, just that the call completed successfully.  */
> +      if (this_frame->level >= 0)
> +	frame_unwind_pc (this_frame);
> +    }
> +  catch (const gdb_exception_error &ex)
> +    {
> +      if (ex.error == NOT_AVAILABLE_ERROR || ex.error == OPTIMIZED_OUT_ERROR)
> +	{
> +	  frame_debug_printf ("  -> nullptr // no saved PC");
> +	  this_frame->stop_reason = UNWIND_NO_SAVED_PC;
> +	  this_frame->prev = nullptr;
> +	  return nullptr;
> +	}
> +
> +      throw;
> +    }
> +
>    return get_prev_frame_maybe_check_cycle (this_frame);
>  }
>  
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.c b/gdb/testsuite/gdb.base/pc-not-saved.c
> new file mode 100644
> index 00000000000..bc6632a97d7
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.c
> @@ -0,0 +1,48 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2024 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +volatile int global_var = 0;
> +
> +void
> +other_func (void)
> +{
> +  /* Nothing.  */
> +}
> +
> +void
> +break_bt_here (void)
> +{
> +  /* This is all nonsense; just filler so this function has a body.  */
> +  if (global_var != 99)
> +    global_var++;
> +  if (global_var != 98)
> +    global_var++;
> +  if (global_var != 97)
> +    global_var++;
> +  if (global_var != 96)
> +    global_var++;
> +  other_func ();
> +  if (global_var != 95)
> +    global_var++;
> +}
> +
> +int
> +main (void)
> +{
> +  break_bt_here ();
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.exp b/gdb/testsuite/gdb.base/pc-not-saved.exp
> new file mode 100644
> index 00000000000..f267a269f1a
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.exp
> @@ -0,0 +1,113 @@
> +# Copyright 2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test how GDB handles a frame in which the previous-pc value is not
> +# available.  Specifically, check that the backtrace correctly reports
> +# why the backtrace is truncated, and ensure that 'display' directives
> +# still work when 'stepi'-ing through the frame.
> +#
> +# We do this by registering a Python unwinder which doesn't provide
> +# any previous register values.
> +
> +require allow_python_tests
> +
> +standard_testfile
> +
> +if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } {
> +    return
> +}
> +
> +set remote_python_file \
> +    [gdb_remote_download host "${srcdir}/${subdir}/${testfile}.py"]
> +
> +if { ![runto "break_bt_here"] } {
> +    return
> +}
> +
> +# Figuring out the correct frame-id from a Python unwinder is hard.
> +# We need to know the function's start address (not too hard), and the
> +# stack address on entry to the function, which is much harder to
> +# figure out in a cross-target way.
> +#
> +# So instead we run without any Python unwinder in place and use
> +# 'maint print frame-id' to record the frame-id.  We then restart GDB,
> +# load the Python unwinder, and tell it to use the frame-id we
> +# recorded here.
> +set pc unknown
> +set cfa unknown
> +gdb_test_multiple "maintenance print frame-id" "store break_bt_here frame-id" {
> +    -re -wrap "frame-id for frame #0: \\{stack=($hex),code=($hex),\[^\}\]+\\}" {
> +	set cfa $expect_out(1,string)
> +	set pc $expect_out(1,string)
> +    }
> +}
> +gdb_assert { ![string equal $cfa unknown] } \
> +    "check we read the frame's CFA"
> +
> +gdb_assert { ![string equal $pc unknown] } \
> +    "check we read the frame's PC"
> +
> +# Restart and load the Python unwinder script.
> +clean_restart $binfile
> +gdb_test_no_output "source ${remote_python_file}" "load python file"
> +
> +# Tell the Python unwinder to use the frame-id we cached above.
> +gdb_test_no_output "python set_break_bt_here_frame_id($pc, $cfa)"
> +
> +# Run up to the function which the unwinder will claim.
> +if { ![runto "break_bt_here"] } {
> +    return
> +}
> +
> +# Print the backtrace.  Check that the reason for stopping the
> +# backtrace is that the previous $pc is not available.
> +gdb_test "bt" \
> +    [multi_line \
> +	 "^#0  break_bt_here \\(\\) at \[^\r\n\]+" \
> +	 "Backtrace stopped: frame did not save the PC"] \
> +    "backtrace from break_bt_here function"
> +
> +# Ensure we can stepi.
> +gdb_test "stepi" \
> +    "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
> +    "stepi without a display in place"
> +
> +# Setup a 'display' directive.
> +gdb_test "display/i \$pc" \
> +    [multi_line \
> +	 "^1: x/i \\\$pc" \
> +	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"]
> +
> +# Step again, check the 'display' directive is shown.
> +gdb_test "stepi" \
> +    [multi_line \
> +	 "(:?$hex\\s+)?$decimal\\s+\[^\r\n\]+" \
> +	 "1: x/i \\\$pc" \
> +	 "=> $hex <break_bt_here(:?\\+$decimal)?>:\\s+\[^\r\n\]+"] \
> +    "stepi with a display in place"
> +
> +# Continue to a function that is called from within break_bt_here.
> +# The Python unwinder will then be claiming frame #1.
> +gdb_breakpoint other_func
> +gdb_continue_to_breakpoint "continue to other_func"
> +
> +# Print the backtrace and check that the reason for stopping the
> +# backtrace is that the previous $pc is not available.
> +gdb_test "bt" \
> +    [multi_line \
> +	 "#0  other_func \\(\\) at \[^\r\n\]+" \
> +	 "#1  (:?$hex in )?break_bt_here \\(\\) at \[^\r\n\]+" \
> +	 "Backtrace stopped: frame did not save the PC"] \
> +    "backtrace from other_func function"
> diff --git a/gdb/testsuite/gdb.base/pc-not-saved.py b/gdb/testsuite/gdb.base/pc-not-saved.py
> new file mode 100644
> index 00000000000..65a8e764885
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/pc-not-saved.py
> @@ -0,0 +1,71 @@
> +# Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +import gdb
> +from gdb.unwinder import Unwinder, FrameId
> +
> +# Cached FrameId.  See set_break_bt_here_frame_id for details.
> +break_bt_here_frame_id = None
> +
> +
> +def set_break_bt_here_frame_id(pc, cfa):
> +    """Call this to pre-calculate the FrameId for the frame our unwinder
> +    is going to claim, this avoids us having to actually figure out a
> +    frame-id within the unwinder, something which is going to be hard
> +    to do in a cross-target way.
> +
> +    Instead we first run the test without the Python unwinder in
> +    place, use 'maint print frame-id' to record the frame-id, then,
> +    after loading this Python script, we all this function to record
> +    the frame-id that the unwinder should use."""
> +    global break_bt_here_frame_id
> +    break_bt_here_frame_id = FrameId(cfa, pc)
> +
> +
> +class break_unwinding(Unwinder):
> +
> +    """An unwinder for the function 'break_bt_here'.  This unwinder will
> +    claim any frame for the function in question, but doesn't provide
> +    any unwound register values.  Importantly, we don't provide a
> +    previous $pc value, this means that if we are stopped in
> +    'break_bt_here' then we should fail to unwind beyond frame #0."""
> +
> +    def __init__(self):
> +        Unwinder.__init__(self, "break unwinding")
> +
> +    def __call__(self, pending_frame):
> +        pc_desc = pending_frame.architecture().registers().find("pc")
> +        pc = pending_frame.read_register(pc_desc)
> +
> +        if pc.is_optimized_out:
> +            return None
> +
> +        block = gdb.block_for_pc(pc)
> +        if block == None:
> +            return None
> +        func = block.function
> +        if func == None:
> +            return None
> +        if str(func) != "break_bt_here":
> +            return None
> +
> +        global break_bt_here_frame_id
> +        if break_bt_here_frame_id is None:
> +            return None
> +
> +        return pending_frame.create_unwind_info(break_bt_here_frame_id)
> +
> +
> +gdb.unwinder.register_unwinder(None, break_unwinding(), True)
>
> base-commit: 05d1b4b4ad7d74a64cc71c53d621241fc393fcb6
> -- 
> 2.25.4

next prev parent reply	other threads:[~2024-03-05 16:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-27 15:26 [PATCH] " Andrew Burgess
2024-01-28 20:56 ` Kevin Buettner
2024-01-29 16:49 ` Keith Seitz
2024-02-02 15:19 ` [PATCHv2] " Andrew Burgess
2024-03-05 16:41   ` Andrew Burgess [this message]
2024-03-07 14:43     ` Keith Seitz
2024-03-11 10:07       ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a5ncbphk.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=keiths@redhat.com \
    --cc=kevinb@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox