Re: [RFC] [gdb/testsuite] Add xfail in gdb.base/hbreak.exp

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Tom de Vries <tdevries@suse.de>
To: Luis Machado <luis.machado@arm.com>, gdb-patches@sourceware.org
Subject: Re: [RFC] [gdb/testsuite] Add xfail in gdb.base/hbreak.exp
Date: Thu, 25 Jul 2024 15:52:07 +0200	[thread overview]
Message-ID: <fe1a3309-b01c-493e-80cd-3eb9fdccff6c@suse.de> (raw)
In-Reply-To: <01b531eb-5090-43c8-a9a0-11fc090e0291@arm.com>

On 7/25/24 00:59, Luis Machado wrote:
> On 7/24/24 12:56, Tom de Vries wrote:
>> On 7/24/24 12:45, Luis Machado wrote:
>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>> ...
>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>
>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>> as often as the 32-bit code.
>>>>>>>>
>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>
>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>
>>>>>>
>>>>>> Hi Luis,
>>>>>>
>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>
>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>
>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>
>>>>
>>>> Ok, I spent some more time debugging this issue this morning.
>>>>
>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>
>>>
>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>
>>
>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
> 
> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
> 32-bit arm.
> 
> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
> 
> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
> 
> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
> we're done setting both the address and the control register.
> 

I'm looking at that code, and it seems obvious to me that 
modify_user_hw_breakpoint is called both after setting the address 
register and after setting the control register.  Could you double-check 
your observation?

> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
> 
> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
> 
> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
> 
> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
> 
> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.

Confirmed, that also works for me.

> This is because inserting a hw break on an
> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
> 
> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
> 

IIUC, your hypothesis is that the kernel bug is that the check for 
address vs breakpoint length should only happen when writing the control 
register?

> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
> proceed to insert the hw break on the original requested unaligned thumb address.
> 
> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
> corner case might not be an issue.
> 

I've submitted a patch implementing that approach ( 
https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), 
basically doing the following ptrace calls:
...
	     1. address_reg = bpts[i].address & ~0x7U
	     2. control_reg = bpts[i].control
	     3. address_reg = bpts[i].address
...

[ Note that a fix for the kernel bug formulated above would mean that 
the address vs breakpoint length check in step 3 would stop happening, 
and we'd need to write the control register again in a step 4, to get 
that check back... ]

Thanks,
- Tom

> Thoughts?
> 
>>>> My theory at this point is that the following happens in the failing case:
>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>> - compat_arch_ptrace
>>>> - compat_ptrace_sethbpregs
>>>> - compat_ptrace_hbp_set
>>>> - ptrace_hbp_set_addr
>>>> - ptrace_hbp_get_initialised_bp
>>>> - ptrace_hbp_create
>>>> - /* Initialise fields to sane defaults
>>>>        (i.e. values that will pass validation).  */
>>>>     attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>
>>>
>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>
>>
>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>
>>>> - attr.bp_addr = 0x4004e2;
>>>> - modify_user_hw_breakpoint
>>>> - modify_user_hw_breakpoint_check
>>>> - hw_breakpoint_parse
>>>> - hw_breakpoint_arch_parse
>>>> - case is_compat_bp(bp)
>>>> - offset = 2;
>>>> - fallthrough to default
>>>> - return -EINVAL
>>>>
>>>> In short, we try to validate:
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>> and fail.
>>>>
>>>> By reversing the order, we validate:
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>> which both succeed.
>>>
>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>
>>
>> Well, because we reversed the order of the two ptrace calls.
>>
>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>
>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>
>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>
>>>>
>>>> So, my questions at this point are:
>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>     occur for native 32-bit arm?
>>>
>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>
>>
>> I also observed that they're very similar.
>>
>>>> - is this a kernel bug?
>>>
>>> Potentially, if it is assuming a length that is not correct.
>>>
>>>> - if this is a kernel bug, is there a workaround we can use?
>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>     Breakpoint Register Pair in the wrong order?
>>>
>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>> work around it.
>>>
>>
>> OK, I'm currently testing that approach.
>>
>> Thanks,
>> - Tom
>>
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>> continue^M
>>>>>>>>> Continuing.^M
>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>> ...
>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>> ...
>>>>>>>>>               if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>                   (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>                 perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>
>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>
>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>
>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>> ---
>>>>>>>>>      gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>      1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>        set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>      -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>> -     "hbreak"
>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>      -gdb_test "continue" \
>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>> +set addr 0x0
>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +set have_xfail 0
>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>> +    # breakpoint on.
>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +set re_xfail \
>>>>>>>>> +    [string_to_regexp \
>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>> +
>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>> +    } else {
>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>
>>>>>>>
>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>
>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>
>>>>>
>>>>
>>>
>>
>

next prev parent reply	other threads:[~2024-07-25 13:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-17 15:10 Tom de Vries
2024-07-17 15:14 ` Luis Machado
2024-07-23 10:02   ` Luis Machado
2024-07-24  5:25     ` Tom de Vries
2024-07-24  6:53       ` Luis Machado
2024-07-24  9:28         ` Tom de Vries
2024-07-24 10:45           ` Luis Machado
2024-07-24 11:56             ` Tom de Vries
2024-07-24 22:59               ` Luis Machado
2024-07-25 13:52                 ` Tom de Vries [this message]
2024-07-25 15:22                   ` Luis Machado
2024-07-26  6:28                     ` Tom de Vries
2024-07-26  6:30                       ` Luis Machado
2024-07-26 10:01                         ` Tom de Vries
2024-07-26 15:46                           ` Luis Machado

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe1a3309-b01c-493e-80cd-3eb9fdccff6c@suse.de \
    --to=tdevries@suse.de \
    --cc=gdb-patches@sourceware.org \
    --cc=luis.machado@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox