From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16870 invoked by alias); 13 Mar 2012 01:39:44 -0000 Received: (qmail 16855 invoked by uid 22791); 13 Mar 2012 01:39:42 -0000 X-SWARE-Spam-Status: No, hits=-1.5 required=5.0 tests=AWL,BAYES_00,SARE_TOWRITE,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 13 Mar 2012 01:39:27 +0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id 806191C6916 for ; Mon, 12 Mar 2012 21:39:26 -0400 (EDT) Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id ycjT6ndhOl5R for ; Mon, 12 Mar 2012 21:39:26 -0400 (EDT) Received: from joel.gnat.com (localhost.localdomain [127.0.0.1]) by rock.gnat.com (Postfix) with ESMTP id 3C8E91C6654 for ; Mon, 12 Mar 2012 21:39:26 -0400 (EDT) Received: by joel.gnat.com (Postfix, from userid 1000) id 4AFAA145615; Mon, 12 Mar 2012 18:39:20 -0700 (PDT) From: Joel Brobecker To: gdb-patches@sourceware.org Subject: Problem after hitting breakpoint on Windows (with GDBserver) Date: Tue, 13 Mar 2012 01:39:00 -0000 Message-Id: <1331602756-23567-1-git-send-email-brobecker@adacore.com> Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2012-03/txt/msg00424.txt.bz2 Hello, This is a patch that fixes a problem that started appearing on Feb 24th after support for target-side condition evaluation got added. The problem occurs on Windows when using GDB+GDBserver. I decided to write a long-ish introductory message rather than putting that information with the patch, because it's a lot of details which I was afraid might dilute a bit the important piece for the patch email. You may just look at the patch email itself, where the revision history should provide the minimum amount of info to understand the problem. In a nutshell, GDB uses memory read/writes to insert/remove breakpoints, and breakpoint restoration fails. This eventually leads to this kind of problem after hitting a breakpoint: (gdb) b foo Breakpoint 1 at 0x40177e: file foo.adb, line 5. (gdb) cont Continuing. Breakpoint 1, foo () at foo.adb:5 5 Put_Line ("Hello World."); -- STOP (gdb) n Program received signal SIGSEGV, Segmentation fault. 0x00401782 in foo () at foo.adb:5 5 Put_Line ("Hello World."); -- STOP Here is what's happening, in chonological order: . When the user types "cont", our breakpoint gets inserted. Contrary to GNU/Linux systems, the z0/Z0 packets are not supported, so we use memory read/writes instead. So, mem-break.c:default_memory_insert_breakpoint reads the (piece of) instruction that will be replaced by our breakpoint, saves that in the breakpoint location's shadow_contents buffer, and then writes the breakpoint instruction. So far, so good. . GDB receives a new-shared-library event from the kernel. GDB processes it, calling solib_add, which calls breakpoint_re_set. Basically, new shared library => we need to re-eval our breakpoints, to see if something changed. Unfortunately, part of the process involves update_breakpoint_locations replacing the breakpoint's bp_loc chain by an entirely new chain, as so: b->loc = NULL [...] for (all sals) [...] new_loc = add_location_to_breakpoint (b, &(sals.sals[i])); A side-effect of this is the fact that the location's condition_changed flag got reset to condition_modified (see init_bp_location's call to mark_breakpoint_location_modified). . As a result of this, update_global_location_list sets the new location's needs_update flag. (it also makes sure that the new location's "inserted" flag is set again). . This eventually leads to insert_bp_location trying to re-insert the breakpoint... . That's when things get interesting: The first thing that happens is that insert_bp_location actually wipes out the location's target_info data, which means that the breakpoint's shadow_contents info is lost. As a result, we are unable to restore the original instruction under our breakpoint after we hit it (before transfering control back to the user). This explains the SIGTRAP, because we have modified the code executed by the inferior by restoring a corrupted shadow_contents. The first part of the fix avoids the re-initialization of the whole target_info structure, avoid the wipe. . But that's not enough. Still while trying to re-insert the breakpoint, we land inside mem-break.c:default_memory_insert_breakpoint. This function does not seem to have enough information to determine whether the breakpoint is already inserted or not. So it just assumes that it hasn't and performs two actions: - Save the original code in the shadow_contents buffer; - Write the breakpoint instruction in its place. That's where we hit an unexpected limitation of target_read_memory. This funtion knows about breakpoints and uses the shadow_contents buffer if the read touches one of the areas where a breakpoint is still inserted. Unfortunately, the way the code is written, here is what happens: - memory_xfer_partial fetches the raw memory, potentially polluted by inserted breakpoints, saves the result in the target buffer; - memory_xfer_partial then calls breakpoint_xfer_memory to update the target buffer with pieces of shadow_contents as appropriate to remove the inserted breakpoints from the result. This approach breaks down when the buffer passed to these read functions is actually a shadow_contents buffer, because the first read ends up overwriting the shadow_contents, thus rendering the call to breakpoint_xfer_memory useless. One way to fix that problem was to lift that limitation, by using a temporary buffer inside memory_xfer_partial, and then copy the contents of that buffer into the target buffer after the two steps are complete. But Pedro felt it was too heavy a hammer. So instead, we fixed the two call sites where a target-read operation is performed with a breakpoint shadow_contents buffer as the argument. One side comment: * It looks like "b->loc = NULL" in update_breakpoint_locations is a memory leak, since the locations' ref counters are not decremented?