Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: [PATCH] gdb: fix edge case assertion from get_selected_frame
Date: Sat,  7 Mar 2026 14:10:19 +0000	[thread overview]
Message-ID: <8dd362133f03ae70dc917b4ae7570fa93bf9b584.1772892600.git.aburgess@redhat.com> (raw)

A cut down version of get_selected_frame looks like this:

  get_selected_frame (const char *message)
  {
    if (selected_frame == NULL)
      lookup_selected_frame (selected_frame_id, selected_frame_level);

    gdb_assert (selected_frame != NULL);
    return selected_frame;
  }

What this tells us is that lookup_selected_frame MUST ensure that
selected_frame is not NULL before it returns.  I ran into a case where
this guarantee is broken, lookup_selected_frame can leave selected_frame
as NULL, and the assertion in get_selected_frame triggers.

I started looking at this code while reviewing the debuginfod patches that
can be found here:

  https://inbox.sourceware.org/gdb-patches/20250124152858.2085187-3-amerey@redhat.com

This patch changes get_selected_frame to be:

  get_selected_frame (const char *message)
  {
    if (selected_frame == NULL)
      {
        lookup_selected_frame (selected_frame_id, selected_frame_level);
	if (selected_frame == NULL)
          lookup_selected_frame (selected_frame_id, selected_frame_level);
      }

    gdb_assert (selected_frame != NULL);
    return selected_frame;
  }

[ NOTE: The patch isn't exactly like that, but that's basically the idea,
    I've just simplified things here as the details are not important. ]

The problem that was encountered with the debuginfod patches is that, with
the changes in that series, it becomes more likely that
lookup_selected_frame can fail to set selected_frame, but, due to details
specific to that patch, the failure can only happen the first time
lookup_selected_frame is called.

It occurred to me during review, that, if I could find a way to replicate
that problem outside the specifics of debuginfod, then I could submit that
part of the patch separately, and this would reduce the size of the
debuginfod patch set, and make that patch easier to review.

So I looked for other ways that I could cause lookup_selected_frame to
fail.

In the end I wasn't able to truly reproduce the same issue that the
debuginfod patch has, which means I cannot post the above fix here.  But I
did find some bugs that I then fixed, and I though it would be worth
posting them anyway.

I'm aware that I'm really getting into some nasty edge cases here, doing
things that most users will never do, so these fixes are unlikely to ever
be of real value, but they aren't huge, and I figure it doesn't hurt to
cover these edge cases.

So, with that all said, what's the problem?  The selected frame is stored
as a frame level and frame-id.  The job of lookup_selected_frame is to
lookup the actual frame_info_ptr and call select_frame to set the global
selected_frame variable.

If the frame described by the level and frame-id cannot be found then
lookup_selected_frame should (if level > 0) give a warning and select the
innermost frame.  We don't give a warning if the frame we're trying to
select is #0 as the default action, selecting frame #0 is close enough,
even if it's a different frame #0 than the one we're expecting.

This fall back case, and the associated warning is not currently tested,
so this patch adds a test that triggers this case.  To do this we setup a
call stack, select a frame other than #0, then within a 'thread apply',
pop frames from the stack using 'return'.  When the 'thread apply'
completes GDB tries to restore the previously selected frame, but that
frame has now been popped from the stack and no longer exists.  GDB will
print the warning and select frame #0 instead.

This tests the warning, but still isn't enough to trigger the assertion.
To do that we have to go a little further.  The fall back code in
lookup_selected_frame looks like this (I've simplified this a little):

  /* Select the innermost stack frame.  */
  select_frame (get_current_frame ());

  /* Warn the user.  */
  if (frame_level > 0)
    {
      warning (_("Couldn't restore frame #%d in "
                 "current thread.  Bottom (innermost) frame selected:"),
               frame_level);

      print_stack_frame (get_selected_frame (NULL), 1, SRC_AND_LOC, 1);
    }
  }

It is the select_frame call which sets the selected_frame global.  If we
can find a way to set the back to NULL after the select_frame call then we
will trigger the assertion.

We already have a mechanism for setting selected_frame to NULL from Python
code, the gdb.invalidate_cached_frames function.  So the only remaining
question is, can we invoke a Python callback after the select_frame call?

And yes, we can.  Printing a frame will trigger the pretty printers.  Now,
in real code there's no reason at all that a pretty printer should ever
need to call gdb.invalidate_cached_frames, or call anything that needs to
then call gdb.invalidate_cached_frames, which is why I started off by
saying that this really is an edge case.  But, for a moment, lets just run
with it.

So we setup a pretty printer which invalidates the frame cache, setting
selected_frame back to NULL.  We ensure this is run when GDB prints the
fall back frame, and now we leave lookup_selected_frame with
selected_frame set to NULL, and the assertion triggers.

The fix is really easy.  Move the select_frame call after the
print_stack_frame call (outside the 'if' block).  The print_stack_frame
call needs to be updated to print the current frame rather than the
selected frame, as the current frame is no longer selected at the point we
are printing the frame now.

With this done the assertion is fixed.  There is just one theoretical bug
that I haven't tried to fix; if the pretty printer changed the machine
state then it is possible that the current frame that's selected is not
the current frame that was printed.  This could be confusing, but not I
think fatal.  Trying to fix this felt like taking things too far, so I've
left this for now.

While I was here I changed the wording of the warning message.  The old
warning included the words "Bottom (innermost) frame selected", but our
manual doesn't really use the word "bottom" to describe the innermost
frame, we just say innermost.  So I updated the text to say "Innermost
frame selected".  Given we then print the stack frame, which includes the
"#0" tag, I think this is all perfectly clear, and lines up with the
wording in our documentation.

Having written all of the above I realised I could extend the test to
cover an additional, but related case.  Using the same setup, I run
the inferior to a point where the stack has 5 frames in it.  I select
frame #2 just as before, but now, instead of popping all frames
including frame #2, I just pop frames #0 and #1.  The logic in
lookup_selected_frame first tries to find the frame at the expected
level, in this case #2.  But after popping the frames, the frame at
level 2 will not be the expected frame.  But lookup_selected_frame
already handles this, and if the find frame by level logic fails, we
instead search for the frame by frame-id.  And this is what allows
this additional test to pass, the previous frame #2 has now become
frame #0, and this frame is selected as expected.
---
 gdb/frame.c                                   |  23 ++-
 gdb/testsuite/gdb.base/bad-frame-selection.c  |  55 ++++++
 .../gdb.base/bad-frame-selection.exp          | 186 ++++++++++++++++++
 gdb/testsuite/gdb.base/bad-frame-selection.py |  51 +++++
 4 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/bad-frame-selection.c
 create mode 100644 gdb/testsuite/gdb.base/bad-frame-selection.exp
 create mode 100644 gdb/testsuite/gdb.base/bad-frame-selection.py

diff --git a/gdb/frame.c b/gdb/frame.c
index 8cb1d0a5c42..8644e029e61 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -1874,21 +1874,30 @@ lookup_selected_frame (struct frame_id a_frame_id, int frame_level)
       return;
     }
 
-  /* Nothing else to do, the frame layout really changed.  Select the
-     innermost stack frame.  */
-  select_frame (get_current_frame ());
+  /* We are unable to restore the required frame, so instead we'll
+     select the current (innermost) frame.  Do this before actually
+     setting the frame as print_stack_frame can make calls into
+     extension language hooks, which could invalidate the frame cache,
+     which will clear the selected frame.
 
-  /* Warn the user.  */
+     We only warn the user if we're trying to select something other
+     than frame #0 though, as the fallback is to just select the
+     current frame #0, even if it's different to the frame #0 we tried
+     to find (e.g. the frame-id changed).  */
   if (frame_level > 0 && !current_uiout->is_mi_like_p ())
     {
-      warning (_("Couldn't restore frame #%d in "
-		 "current thread.  Bottom (innermost) frame selected:"),
+      warning (_("Couldn't restore frame #%d in current thread.  "
+		 "Innermost frame selected:"),
 	       frame_level);
       /* For MI, we should probably have a notification about current
 	 frame change.  But this error is not very likely, so don't
 	 bother for now.  */
-      print_stack_frame (get_selected_frame (NULL), 1, SRC_AND_LOC, 1);
+      print_stack_frame (get_current_frame (), 1, SRC_AND_LOC, 1);
     }
+
+  /* We couldn't find the frame we were looking for, so just restore
+     the innermost frame instead.  */
+  select_frame (get_current_frame ());
 }
 
 bool
diff --git a/gdb/testsuite/gdb.base/bad-frame-selection.c b/gdb/testsuite/gdb.base/bad-frame-selection.c
new file mode 100644
index 00000000000..385621560a1
--- /dev/null
+++ b/gdb/testsuite/gdb.base/bad-frame-selection.c
@@ -0,0 +1,55 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2026 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+volatile int dummy;
+
+struct foo { int a, b; };
+
+void
+func_d (void)
+{
+  dummy = 4;
+}
+
+void
+func_c (void)
+{
+  func_d ();
+  dummy = 3;
+}
+
+void
+func_b (void)
+{
+  func_c ();
+  dummy = 2;
+}
+
+void
+func_a (struct foo obj)
+{
+  func_b ();
+  dummy = obj.a + obj.b;
+}
+
+int
+main (void)
+{
+  struct foo obj = { 1, 2 };
+  func_a (obj);
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/bad-frame-selection.exp b/gdb/testsuite/gdb.base/bad-frame-selection.exp
new file mode 100644
index 00000000000..0852df877b0
--- /dev/null
+++ b/gdb/testsuite/gdb.base/bad-frame-selection.exp
@@ -0,0 +1,186 @@
+# Copyright 2026 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that GDB handles failed frame restoration correctly.
+#
+# When a scoped_restore_selected_frame saves a frame at level > 0 and the
+# stack changes within the scope, lookup_selected_frame cannot find the
+# saved frame and falls back to selecting the innermost frame with a
+# warning.  This test exercises that warning path.
+#
+# Additionally, the warning path calls print_stack_frame to display the
+# fallback frame, which can invoke Python pretty printers on the frame's
+# arguments.  If a pretty printer calls gdb.invalidate_cached_frames
+# during this print, it resets selected_frame to NULL, which can cause an
+# assertion failure in get_selected_frame.  This test validates that this
+# bug has been fixed.
+
+load_lib gdb-python.exp
+
+require allow_python_tests
+
+standard_testfile
+
+if {[build_executable "failed to build" $testfile $srcfile]} {
+    return
+}
+
+set remote_python_file [gdb_remote_download host \
+			    ${srcdir}/${subdir}/${testfile}.py]
+
+# Run global TESTFILE until we have a stack with 5 frames.  Select a
+# frame other than #0 and then use 'thread apply' and the 'return'
+# command to pop frames from the stack.  When POP_SELECTED is true the
+# previously selected frame is popped, otherwise, all frames up to,
+# but not including, the selected frame are popped.
+#
+# When the 'thread apply' completes it should try to restore the
+# previously selected frame.  If that frame has now been destroyed
+# (popped from the stack), GDB should give a warning and select frame
+# #0 instead.  If the frame was not popped (POP_SELECTED was false)
+# then GDB should still find, and restore the previously selected
+# frame.
+#
+# When WITH_PRETTY_PRINTER is true, load a pretty printer which
+# triggers when the fallback frame #0 is selected.  This pretty
+# printer invalidates the frame cache which used to trigger an
+# assertion within GDB.  Setting WITH_PRETTY_PRINTER to true and
+# POP_SELECTED to false isn't wrong, but isn't going to test anything
+# interesting.
+proc run_test { with_pretty_printer pop_selected } {
+    # The pretty printer is only needed if we are planning to pop the
+    # selected frame and GDB should fall back to selecting func_a,
+    # which uses the pretty printer.
+    if { $with_pretty_printer && !$pop_selected } {
+	perror "invalid argument combination to run_test"
+    }
+
+    clean_restart $::testfile
+
+    if { $with_pretty_printer } {
+	gdb_test_no_output "source ${::remote_python_file}" \
+	    "source ${::testfile}.py"
+	set obj_re "a=<1> b=<2>"
+    } else {
+	set obj_re "\\.\\.\\."
+    }
+
+    # Run to the breakpoint in func_d.  At this point the stack is:
+    #
+    #   #0  func_d ()
+    #   #1  func_c ()
+    #   #2  func_b ()
+    #   #3  func_a (obj=???)
+    #   #4  main
+    #
+    # Notice that 'func_a' has an argument 'obj', this is of type
+    # 'struct foo'.  When WITH_PRETTY_PRINTER is true this will be
+    # printed using a pretty printer, otherwise, as this is a struct,
+    # the representation of obj will be replaced with '...'.
+    if { ![runto func_d] } {
+	return
+    }
+
+    # Validate the stack is as expected.
+    gdb_test "bt" \
+	[multi_line \
+	     "#0 \[^\r\n\]*func_d \\(\\) \[^\r\n\]*" \
+	     "#1 \[^\r\n\]*func_c \\(\\) \[^\r\n\]*" \
+	     "#2 \[^\r\n\]*func_b \\(\\) \[^\r\n\]*" \
+	     "#3 \[^\r\n\]*func_a \\(obj=${obj_re}\\) \[^\r\n\]*" \
+	     "#4 \[^\r\n\]*main \[^\r\n\]*"] \
+	"backtrace at func_d"
+
+    # Select a frame other than frame #0.  The frame we select here
+    # will be popped from the call stack by all the 'return' calls
+    # below.
+    gdb_test "frame 2" "#2 .*func_b.*" "select frame 2"
+
+    if { $with_pretty_printer } {
+	# Set this Python flag so that the next time the pretty printer
+	# prints a 'struct foo' it will call gdb.invalidate_cached_frames.
+	#
+	# If this flag didn't exist, and the pretty printer always
+	# invalidated the frame cache, then printing the backtrace above
+	# would fail as the frame cache would become invalid while walking
+	# the stack.
+	#
+	# Set this flag as late as possible just before we do the 'thread
+	# apply' and 'return' trick below.
+	gdb_test_no_output "python invalidate = True"
+	set frame_regexp "#0 \[^\r\n\]*func_a \\(obj=<invalidate-frame>\[^\r\n\]*\\).*"
+    } elseif { $pop_selected } {
+	set frame_regexp "#0 \[^\r\n\]*func_a.*"
+    }
+
+    if { $pop_selected } {
+	# Using "thread apply 1" creates a scoped_restore_current_thread
+	# within GDB, which saves and later restores the current thread and
+	# frame.  The saved frame state is (level=2, frame-id=func_b).
+	#
+	# Inside the scope, switch_to_thread calls reinit_frame_cache which
+	# resets selected_frame_level to -1.  Each "return" therefore returns
+	# from frame 0 (the innermost frame), popping one frame per call:
+	#
+	#   return 1: pops func_d  -> stack: main, func_a, func_b, func_c
+	#   return 2: pops func_c  -> stack: main, func_a, func_b
+	#   return 3: pops func_b  -> stack: main, func_a
+	#
+	# When scoped_restore_current_thread's destructor runs, it tries to
+	# restore the previously selected frame, which was level 2, func_b.
+	# Due to the 'return' calls there is no level 2 frame, nor can we find
+	# a frame with the correct frame-id that matches the previous func_b
+	# frame.
+	#
+	# And so GDB will give a warning indicating that it failed to restore
+	# the previously selected frame, and then selects (and prints) the
+	# innermost frame.
+	#
+	# When the pretty printer is in use, printing the inner most frame,
+	# which is for func_a, will call the pretty printer, which will then
+	# invalidate the frame cache.  Doing this sets the currently selected
+	# frame back to NULL.  If GDB selects the inner most frame before
+	# printing the frame, then, after the pretty printer has run, there
+	# will no longer be a selected frame.  This can trigger an assertion.
+	#
+	# The fix, which is now in GDB, is that we select the frame only after
+	# printing the frame.  This ensures that lookup_selected_frame always
+	# leaves a frame selected.
+	gdb_test "thread apply 1 python for i in range(3): gdb.execute('return')" \
+	    [multi_line \
+		 "warning: Couldn't restore frame #2 in current thread\\.  Innermost frame selected:" \
+		 $frame_regexp] \
+	    "pop three frames via thread apply"
+    } else {
+	gdb_test "thread apply 1 python for i in range(2): gdb.execute('return')" \
+	    [multi_line \
+		 "" \
+		 "Thread 1\[^\r\n\]*"] \
+	    "pop two frames via thread apply"
+
+	gdb_test "bt" \
+	    [multi_line \
+		 "#0 \[^\r\n\]*func_b \\(\\) \[^\r\n\]*" \
+		 "#1 \[^\r\n\]*func_a \\(obj=${obj_re}\\) \[^\r\n\]*" \
+		 "#2 \[^\r\n\]*main \[^\r\n\]*"] \
+	    "backtrace at func_b after popping frames"
+    }
+}
+
+foreach_with_prefix with_pp { false true } {
+    run_test $with_pp true
+}
+
+run_test false false
diff --git a/gdb/testsuite/gdb.base/bad-frame-selection.py b/gdb/testsuite/gdb.base/bad-frame-selection.py
new file mode 100644
index 00000000000..f2ad394cb76
--- /dev/null
+++ b/gdb/testsuite/gdb.base/bad-frame-selection.py
@@ -0,0 +1,51 @@
+# Copyright 2026 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb.printing
+
+# When set to True the pretty printer will invalidate the frame cache.
+invalidate = False
+
+
+class FooPrinter(gdb.ValuePrinter):
+    """Pretty printer for struct foo that optionally invalidates
+    the frame cache, to test that GDB handles frame cache
+    invalidation from within a pretty printer."""
+
+    def __init__(self, val):
+        self.__val = val
+
+    def to_string(self):
+        global invalidate
+
+        if invalidate:
+            gdb.invalidate_cached_frames()
+            invalidate = False
+            prefix = "<invalidate-frame>"
+        else:
+            prefix = ""
+
+        return (
+            prefix + "a=<" + str(self.__val["a"]) + "> b=<" + str(self.__val["b"]) + ">"
+        )
+
+
+def build_pretty_printer():
+    pp = gdb.printing.RegexpCollectionPrettyPrinter("bad-frame-selection")
+    pp.add_printer("foo", "^foo$", FooPrinter)
+    return pp
+
+
+gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())

base-commit: f08ffbbf2691bad2d5df660ee644647687775f0c
-- 
2.25.4


             reply	other threads:[~2026-03-07 14:11 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-07 14:10 Andrew Burgess [this message]
2026-03-12 18:53 ` Kevin Buettner
2026-03-20 16:25   ` Andrew Burgess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8dd362133f03ae70dc917b4ae7570fa93bf9b584.1772892600.git.aburgess@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox