Mirror of the gdb-patches mailing list
From: Luis Machado <luis.machado@arm.com>
To: Tom de Vries <tdevries@suse.de>, gdb-patches@sourceware.org
Subject: Re: [RFC] [gdb/testsuite] Add xfail in gdb.base/hbreak.exp
Date: Fri, 26 Jul 2024 16:46:48 +0100
Message-ID: <8d4c3fa2-292b-499f-9f79-05407df4d7e5@arm.com>
In-Reply-To: <c51010fe-4e7d-40ee-8d2d-e494e211a8cf@suse.de>

On 7/26/24 11:01, Tom de Vries wrote:
> On 7/26/24 08:30, Luis Machado wrote:
>> On 7/26/24 07:28, Tom de Vries wrote:
>>> On 7/25/24 17:22, Luis Machado wrote:
>>>> On 7/25/24 14:52, Tom de Vries wrote:
>>>>> On 7/25/24 00:59, Luis Machado wrote:
>>>>>> On 7/24/24 12:56, Tom de Vries wrote:
>>>>>>> On 7/24/24 12:45, Luis Machado wrote:
>>>>>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>>>>>
>>>>>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Luis,
>>>>>>>>>>>
>>>>>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>>>>>
>>>>>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>>>>>
>>>>>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably amend this commit later on once we have more information though.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>>>>>
>>>>>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>>>>>
>>>>>>>>
>>>>>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>>>>>
>>>>>>>
>>>>>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).
>>>>>>
>>>>>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>>>>>> 32-bit arm.
>>>>>>
>>>>>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>>>>>
>>>>>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>>>>>
>>>>>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>>>>>> we're done setting both the address and the control register.
>>>>>>
>>>>>
>>>>> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
>>>>>
>>>>
>>>> You're right. It gets called twice, once for setting the address and once for setting the control register. I missed that when reading through it.
>>>>
>>>> So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.
>>>>
>>>>>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>>>>>
>>>>>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>>>>>
>>>>>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>>>>>
>>>>>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>>>>>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>>>>>
>>>>>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>>>>>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
>>>>>
>>>>> Confirmed, that also works for me.
>>>>>
>>>>>> This is because inserting a hw break on an
>>>>>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>>>>>
>>>>>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>>>>>
>>>>>
>>>>> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?
>>>>
>>>> I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.
>>>>
>>>> The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.
>>>>
>>>>>
>>>>>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>>>>>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>>>>>> proceed to insert the hw break on the original requested unaligned thumb address.
>>>>>>
>>>>>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw breaks at the same address, so that
>>>>>> corner case might not be an issue.
>>>>>>
>>>>>
>>>>> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
>>>>> ...
>>>>>            1. address_reg = bpts[i].address & ~0x7U
>>>>>            2. control_reg = bpts[i].control
>>>>>            3. address_reg = bpts[i].address
>>>>> ...
>>>>>
>>>>> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]
>>>>
>>>> That makes sense, as we need two ptrace calls for this operation.
>>>>
>>>
>>> OK, I'll send a v2.
>>>
>>> I also think that it's probably a good idea to do the first control_reg write with the enabled bit switched off.
>>>
>>> That way we're not actually enabling a hw breakpoint on the wrong address.
>>
>> Makes sense to me.
>>
> 
> Well, it turns out that that doesn't work, so I've left that part as is.
> 
> V2 submitted here ( https://sourceware.org/pipermail/gdb-patches/2024-July/210709.html ).
> 
>> I'm in the process of booting a 32-bit kernel to check what the situation is on 32-bit arm.
>>
> 
> Great.  Unfortunately I don't have a setup where I can try that.

Ok. I've confirmed the problem also exists on 32-bit arm: I see the same
ptrace error there.
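
For reference, the behaviour discussed in this thread can be modelled roughly as
below. This is only a toy sketch in Python, not kernel or gdb code: the Slot class,
the function names, the 0x3 alignment mask and the LEN_2/LEN_4 stand-ins are all
assumptions made for illustration. It only encodes what was described above: a
slot starts out with a 4-byte length, each of the two writes is validated
against whatever the slot currently holds, and an offset of 2 is only accepted
when the stored length is 2 bytes.

```python
import errno

LEN_2 = 2  # stand-in for ARM_BREAKPOINT_LEN_2 (not the real encoding)
LEN_4 = 4  # stand-in for ARM_BREAKPOINT_LEN_4 (not the real encoding)


class Slot:
    """Toy model of one hw breakpoint slot; hypothetical, not kernel code."""

    def __init__(self):
        self.address = 0
        self.length = LEN_4  # default length, as described in the thread

    @staticmethod
    def _check(address, length):
        offset = address & 0x3  # assumed alignment mask
        if offset == 0 or (offset == 2 and length == LEN_2):
            return 0
        return -errno.EINVAL

    def set_addr(self, address):
        # The address is validated against the length already in the slot.
        err = self._check(address, self.length)
        if err == 0:
            self.address = address
        return err

    def set_ctrl(self, length):
        # The new length is validated against the address already in the slot.
        err = self._check(self.address, length)
        if err == 0:
            self.length = length
        return err


def original_order(addr):
    """Address first, then control; stop at the first error."""
    slot = Slot()
    return slot.set_addr(addr) or slot.set_ctrl(LEN_2)


def reversed_order(addr):
    """Control first, then address."""
    slot = Slot()
    return slot.set_ctrl(LEN_2) or slot.set_addr(addr)


def workaround(addr):
    """Aligned address, then control, then the requested address."""
    slot = Slot()
    return (slot.set_addr(addr & ~0x7)
            or slot.set_ctrl(LEN_2)
            or slot.set_addr(addr))
```

With the address from the original report (0x4004e2, so offset == 2), the model
rejects the address-first order with EINVAL, while both the reversed order and
the three-step sequence go through. Whether the real kernel checks match this
model exactly is something I have not verified here.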


Thread overview: 15+ messages
2024-07-17 15:10 Tom de Vries
2024-07-17 15:14 ` Luis Machado
2024-07-23 10:02   ` Luis Machado
2024-07-24  5:25     ` Tom de Vries
2024-07-24  6:53       ` Luis Machado
2024-07-24  9:28         ` Tom de Vries
2024-07-24 10:45           ` Luis Machado
2024-07-24 11:56             ` Tom de Vries
2024-07-24 22:59               ` Luis Machado
2024-07-25 13:52                 ` Tom de Vries
2024-07-25 15:22                   ` Luis Machado
2024-07-26  6:28                     ` Tom de Vries
2024-07-26  6:30                       ` Luis Machado
2024-07-26 10:01                         ` Tom de Vries
2024-07-26 15:46                           ` Luis Machado [this message]
