From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v3 07/24] Documentation for memory tagging remote packets
To: David Spickett <david.spickett@linaro.org>
References: <20201109170435.15766-1-luis.machado@linaro.org>
 <20201109170435.15766-8-luis.machado@linaro.org> <83ft5i4fwl.fsf@gnu.org>
 <CACBW-2+t4DoOm95=MFp2-uV700HJArjvs_A4fREAeZPOus_D9g@mail.gmail.com>
 <CACBW-2+CLPKtzcuo3YB1L_7vcYrG-X-LHK3n6-M+Roj0iUGEag@mail.gmail.com>
 <08cb075e-018c-474e-abd4-b76ba1ed6a52@linaro.org>
 <CACBW-2+DrU-E2w-noTU+tUqfgmN-_0F2OqUumCWv+4pVt7PLdg@mail.gmail.com>
 <27e9cc8e-95e6-4ac3-bd05-0f3846c96916@linaro.org>
 <CACBW-2J1m9CUw=SrRigw=ahLrRgh1he-n1NQpbxgNxNizQHyzg@mail.gmail.com>
 <ccd88198-274a-7ec2-b156-ccf63f04b333@linaro.org>
 <CACBW-2+Mv-EVeq=N-JdLeoOdHr4qThK3FnqVvSXR4Ad+pzr8JQ@mail.gmail.com>
Message-ID: <7b5b77c2-1bc4-8810-93df-79343e67d1be@linaro.org>
Date: Tue, 17 Nov 2020 14:29:40 -0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <CACBW-2+Mv-EVeq=N-JdLeoOdHr4qThK3FnqVvSXR4Ad+pzr8JQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
From: Luis Machado via Gdb-patches <gdb-patches@sourceware.org>
Reply-To: Luis Machado <luis.machado@linaro.org>
Cc: gdb-patches@sourceware.org
Errors-To: gdb-patches-bounces@sourceware.org
Sender: "Gdb-patches" <gdb-patches-bounces@sourceware.org>

On 11/17/20 12:16 PM, David Spickett wrote:
>> Right now you can infer the technology and logical/allocation from the
>> type. There is no need for a two step process. I don't anticipate a need
>> for that in the future either, as long as we prevent the tag type ENUM
>> from having overlapping values.
> 
> Right, so to not overlap, a value of that tag type ENUM must be a
> combination of technology and tag...kind. (words are hard)
> E.g. MTE logical tag, <future tech> allocation tag , <other tech> strange tag

Why? It only needs to be a unique value that represents a unique 
combination. See below.

> 
> So if <future AArch64 tag technology> were to also have "logical" and
> "allocation" tags, they would be new type values, yes?
> 0 = MTE logical
> 1 = MTE allocation
> 2 = <future AArch64 tag technology> logical
> 3 = <future AArch64 tag technology> allocation
> 
> Because if they are still 0 and 1, they overlap and you can't tell MTE
> and <future AArch64 tag technology> apart.

I think there is some confusion here. Let me try to clarify.

Yes, we currently only have tag types 0 and 1, the logical and
allocation kinds. Those cover both AArch64's MTE and SPARC ADI. If we
add more tag technologies, we will also add new code/commands that will
pass the appropriate tag types down to the target/remote layer.

For example, if we end up supporting capability tags as a new type of
tag (say, type 2, given we already have 0 and 1), we will have to
modify the code to handle that, either through a new command or through
new code paths.

The target/remote layer will only see unique numbers 0, 1 and 2.

For AArch64, these numbers would mean the following:

0 - MTE logical
1 - MTE allocation
2 - Capability

For SPARC ADI, the following:

0 - ADI logical
1 - ADI allocation
2 - Unknown/Invalid
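
As a rough sketch of the two tables above (Python used only for
illustration; the table, the function and the architecture strings are
hypothetical names, not GDB identifiers, and type 2 is the "say, type 2"
capability example, not something that exists today), the receiving side
interprets the same number according to its own architecture:

```python
# Hypothetical sketch: none of these names come from GDB or gdbserver.
# The numeric values are the ones listed in this message; type 2
# (capability) is a "what if" example, not an existing tag type.
TAG_MEANINGS = {
    "aarch64": {0: "MTE logical", 1: "MTE allocation", 2: "Capability"},
    "sparc-adi": {0: "ADI logical", 1: "ADI allocation"},
}


def describe_tag_type(arch, tag_type):
    """Return what a numeric tag type means for the given architecture."""
    return TAG_MEANINGS.get(arch, {}).get(tag_type, "Unknown/Invalid")


if __name__ == "__main__":
    for arch in ("aarch64", "sparc-adi"):
        for tag_type in (0, 1, 2):
            print(arch, tag_type, describe_tag_type(arch, tag_type))
```

The numbers themselves never overlap within one architecture, which is
all the target/remote layer needs.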

> 
> I get that at the user interface layer you have some mapping back to a
> set of generic types, talking just about the
> packet contents.
> 
>> So this is wrong, as the remote/native side won't be able to tell both
>> definitions of type "1" apart.
> 
> Sorry that was a typo. Should be:
> 0 = mte logical, 1 = mte allocation, 1 = future logical, 2 = future
> allocation, 3 = future tag form (not logical/allocation) etc...
> (basically what I restated above)

I think the typo is still there.

You meant 0, 1, 2, 3 and 4, right?

But I see what you mean. Even though the tag type name is "tag_logical" 
right now, the implementation choice is to interpret that as MTE for 
AArch64 and ADI for SPARC. Same thing for "tag_allocation".

The rationale for having these two generic tag types ("tag_logical" and 
"tag_allocation") is that this technology may be common to multiple 
architectures (AArch64 and SPARC right now).

Personally, I don't think having 4 type enums (mte_logical,
mte_allocation, adi_logical and adi_allocation) would be better. That
would require extra code for a similar technology, just so we can pass
down, say, mte_logical instead of adi_logical, when those are in fact
the same kind of tag/technology.
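
To illustrate the extra code I mean, here is a minimal sketch (Python
for illustration only; every name, the architecture strings and the
numbering of the four-value enum are assumptions of mine, not GDB code).
With generic types the value passes straight through; with
per-technology types generic code needs an additional per-architecture
table:

```python
from enum import IntEnum


class TagType(IntEnum):
    """Generic tag types shared by all architectures (values 0 and 1)."""
    LOGICAL = 0
    ALLOCATION = 1


class PerTechTagType(IntEnum):
    """Alternative with one value per technology (hypothetical numbering)."""
    MTE_LOGICAL = 0
    MTE_ALLOCATION = 1
    ADI_LOGICAL = 2
    ADI_ALLOCATION = 3


# Extra mapping that generic code would need with the four-value enum.
PER_TECH = {
    ("aarch64", TagType.LOGICAL): PerTechTagType.MTE_LOGICAL,
    ("aarch64", TagType.ALLOCATION): PerTechTagType.MTE_ALLOCATION,
    ("sparc-adi", TagType.LOGICAL): PerTechTagType.ADI_LOGICAL,
    ("sparc-adi", TagType.ALLOCATION): PerTechTagType.ADI_ALLOCATION,
}


def wire_type_generic(arch, kind):
    """Generic scheme: the same value is sent for every architecture."""
    return int(kind)


def wire_type_per_tech(arch, kind):
    """Per-technology scheme: needs a lookup keyed on the architecture."""
    return int(PER_TECH[(arch, kind)])
```

Both schemes keep the values unique; the second one just adds a table
that has to grow with every new architecture.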

Does that make sense?

> 
>> If we want to address this and make it a bit more flexible, then we will
>> need to let architectures define their own tag type values. Then generic
>> code will have to go through an arch-specific hook to fetch the tag type.
> 
>> Would that address your concerns?
> 
> I think so. I thought of this working like the breakpoint types I
> referenced. Breakpoint type 1 for MIPS isn't in any way the same as
> type 1 for Arm etc.
> (ok they might be similar but the docs make it clear that the list of
> types is per architecture)
> 

I think the situation is similar here. I opted to group similar things 
together in the type names.

mte_allocation is similar to adi_allocation, but they're different in 
that mte_allocation has granule size 16, while adi_allocation has 
granule size 64.

mte_logical is exactly the same as adi_logical. They are both 4 bits in 
size.
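
As a small worked example of what the granule size changes (a sketch in
Python; the function and constant names are made up for illustration,
only the sizes 16 and 64 come from the paragraph above), the same memory
range covers a different number of allocation tags on each technology:

```python
# Granule sizes as stated above; the names are hypothetical.
MTE_GRANULE_SIZE = 16
ADI_GRANULE_SIZE = 64


def granules_in_range(address, length, granule_size):
    """Count the tag granules touched by [address, address + length)."""
    if length <= 0:
        return 0
    first = address // granule_size
    last = (address + length - 1) // granule_size
    return last - first + 1


if __name__ == "__main__":
    # A 32-byte range starting at 0x1008 is not granule-aligned.
    print(granules_in_range(0x1008, 32, MTE_GRANULE_SIZE))
    print(granules_in_range(0x1008, 32, ADI_GRANULE_SIZE))
```

So the allocation kind is the same concept in both cases, but the amount
of tag data for a given range differs.
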

>> As long as GDB sends a unique tag type identifier (non-overlapping
>> values), we will be able to tell them apart from the remote side.
> 
> Yes but depending on how those IDs are allocated we may only be able
> to tell logical vs allocation not MTE vs CHERI vs <future whatever>.
> 
> On Tue, 17 Nov 2020 at 14:44, Luis Machado <luis.machado@linaro.org> wrote:
>>
>> On 11/17/20 9:29 AM, David Spickett wrote:
>>>> Right. The type is really telling you what specific kind of tag you are
>>>> requesting, not the technology. So it may be perfectly valid to request
>>>> a MTE logical tag from the remote target, but the remote target doesn't
>>>> know how to reply to that at the moment (nor does it make much sense, IMO).
>>>>
>>>> The tag types don't overlap at the moment, given they are an ENUM in
>>>> generic code. So the server will tell them apart by their values.
>>>
>>> Ok but my concern here is that there are two aspects to type. (which I
>>> probably confused earlier tbf)
>>>
>>> 1. logical vs allocation
>>> 2. MTE vs <future tagging technology>
>>>
>>> So we have three scenarios:
>>> 1. Server uses type to decide between logical and allocation
>>
>> Right now 0 maps to logical and 1 maps to allocation for all
>> architectures. For AArch64 0 maps to MTE logical and 1 maps to MTE
>> allocation.
>>
>> But, if more tag types are supported in the future, we will need to map
>> types to logical, allocation or something else.
>>
>>> 2. Server uses type to decide between MTE and <future tagging technology>
>>
>> Right now 0 and 1 map to MTE for AArch64 and ADI for SPARC (not yet
>> supported). In the future, if there are more tagging technologies, we
>> will need to provide a mapping from tag types to tag technology.
>>
>>> 3. The combination of the two, should a system have both technologies
>>
>> In this case, we will use two mappings: type to tag type and type to tag
>> technology.
>>
>>>
>>> What I want to avoid is a two step process:
>>> 1. Somehow select tagging technology you want to interact with
>>> 2. You send memtag packets with the logical vs allocation "type"
>>>
>>> Which would be needed if the protocol "type" is just allocation vs
>>> logical. If the "type" includes the technology
>>> then we can do it in one step.
>>
>> Right now you can infer the technology and logical/allocation from the
>> type. There is no need for a two step process. I don't anticipate a need
>> for that in the future either, as long as we prevent the tag type ENUM
>> from having overlapping values.
>>
>>>
>>> So I prefer:
>>> 0 = mte logical, 1 = mte allocation, 1 = future logical, 2 = future
>>> allocation, 3 = future tag form (not logical/allocation) etc...
>>
>> So this is wrong, as the remote/native side won't be able to tell both
>> definitions of type "1" apart.
>>
>>> Over:
>>> 0 = logical, 1 = allocation, 2 = future term for other tag form etc...
>>
>> This is correct in my view. Everything is unique.
>>
>> The complicating factor here is that generic code/UI needs to use a
>> generic tag type so all architectures supporting memory tagging can use
>> the commands out of the box.
>>
>> For AArch64, tag_logical/tag_allocation means MTE logical/MTE
>> allocation. For SPARC ADI, this means whatever their tagging mechanism is.
>>
>> If we turn tag_logical into mte_logical, SPARC ADI won't be able to use
>> that. We will, again, need a mapping.
>>
>> If we want to address this and make it a bit more flexible, then we will
>> need to let architectures define their own tag type values. Then generic
>> code will have to go through an arch-specific hook to fetch the tag type.
>>
>> Would that address your concerns?
>>
>>>
>>> I see the need for the second form in gdb internally, but my concern
>>> is only with the protocol side.
>>
>> As long as GDB sends a unique tag type identifier (non-overlapping
>> values), we will be able to tell them apart from the remote side.
>>
>>>
>>>> I just want to keep that option open
>>>> if someone wants to do it, or if some other type of tag shows up that
>>>> would require such support.
>>>
>>> I hadn't thought of that. I agree that type should include that too.
>>>
>>> On Tue, 17 Nov 2020 at 12:01, Luis Machado <luis.machado@linaro.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 11/17/20 7:05 AM, David Spickett wrote:
>>>>>> Right now the design makes these types architecture-specific.
>>>>>
>>>>> This works too, in fact it matches the breakpoint types example better that way.
>>>>>
>>>>>> But there's one catch right now. The user-visible commands know about
>>>>>> two types of tags (logical and allocation). The native/remote side of
>>>>>> GDB only sees one type, the allocation one, as it doesn't make sense to
>>>>>> ask the native/remote target about logical tags.
>>>>>>
>>>>>> This is slightly messy and, in my opinion, should be an implementation
>>>>>> detail.
>>>>>
>>>>> Tell me if I have this right.
>>>>>
>>>>> In gdb in overall you have these two types but the server only uses
>>>>> one of them, the allocation tag type.
>>>>> So only the allocation tag type will ever go over the protocol. (for
>>>>> MTE at least)
>>>>
>>>> That's correct. Only GDB knows about logical tags. Those don't make
>>>> their way to the remote via the remote protocol.
>>>>
>>>>>
>>>>> Given that, if we assume that "mte allocation" type is 1. A future
>>>>> AArch64 kind of memory tagging could allocate 2 and on for its tag
>>>>> types.
>>>>> Something like:
>>>>> AArch64 Memory Tag types -
>>>>>      0 : MTE logical (which is internal only, reserved but documented as
>>>>> unused, or left out completely?)
>>>>>      1 : MTE allocation (the one we use at present)
>>>>>      2: <future tagging> logical tag (because maybe there is some server
>>>>> component for this kind of tagging extension?)
>>>>>      3: <future tagging> allocation tag
>>>>>
>>>>> The reason I want to clarify is that I understood the type to
>>>>> differentiate tagging technologies, not the kind of tag within them.
>>>>> (the type tells you MTE vs <future tag type> instead of allocation vs logical)
>>>>> The use case being what if you have MTE and <future tag type> active
>>>>> in the same target and I want to set an MTE allocation tag,
>>>>> how can the server tell them apart?
>>>>
>>>> Right. The type is really telling you what specific kind of tag you are
>>>> requesting, not the technology. So it may be perfectly valid to request
>>>> a MTE logical tag from the remote target, but the remote target doesn't
>>>> know how to reply to that at the moment (nor does it make much sense, IMO).
>>>>
>>>> The tag types don't overlap at the moment, given they are an ENUM in
>>>> generic code. So the server will tell them apart by their values.
>>>>
>>>>>
>>>>> If the type numbers overlap between tagging technologies, we can't
>>>>> tell them apart.
>>>>> However if they encode what extension they are for and the
>>>>> logical/allocation type (as in the example above) then we can.
>>>>>
>>>>> A lot of that is probably academic given there's one relevant type but
>>>>> we can at least document the intent of the field.
>>>>> E.g. "these types are global to AArch64 so new types should not
>>>>> overlap existing ones"
>>>>
>>>> I suppose. I chose to have generic ENUM's without specific references to
>>>> MTE for that reason. Architectures can use logical/allocation tags as
>>>> they see fit, but the ENUM values will not overlap.
>>>>
>>>> We need to support the UI as well, so there needs to be some generic
>>>> definitions so commands can query different tag types.
>>>>
>>>>>
>>>>>> Otherwise we'd need to standardize on particular tag type names across
>>>>>> different architectures, like "hw memory tag", "sw memory tag",
>>>>>> "capability tag" etc.
>>>>>
>>>>> Well I was thinking of type more as a single value like "mte". Anyway
>>>>> I'm fine with the integer route.
>>>>
>>>> Though we don't have a use for requesting logical tags from the remote
>>>> targets, it is possible to support that. Passing "mte" or any other
>>>> technology name would close that option.
>>>>
>>>> If, for example, we decide to have a dumb GDB client and a smart
>>>> GDBserver (unlikely at this point), then it would make sense to pass
>>>> down logical tag requests I think. I just want to keep that option open
>>>> if someone wants to do it, or if some other type of tag shows up that
>>>> would require such support.
>>>>
>>>>>
>>>>> On Mon, 16 Nov 2020 at 17:23, Luis Machado <luis.machado@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/16/20 1:04 PM, David Spickett wrote:
>>>>>>> Also with regard to the "type" field.
>>>>>>>
>>>>>>>> +@var{type} is the type of tag the request wants to fetch.  The typeis a signed
>>>>>>>> +integer.
>>>>>>>
>>>>>>> (typo aside) Is this field architecture specific and will there be a
>>>>>>> list of these type numbers documented anywhere? (or already is)
>>>>>>> For example would 1 on AArch64 be MTE, and on <other arch> be <other
>>>>>>> tag type>. Or would that <other tag type> be 2.
>>>>>>>
>>>>>>> My assumption has been that it is the latter and that a value means a
>>>>>>> kind of tagging extension. So for example 1=MTE rather than
>>>>>>> 1= mte logical and 2 = mte allocation. Correct me if I am wrong there.
>>>>>>
>>>>>> Right now the design makes these types architecture-specific. It would
>>>>>> be nice to have more documentation about them, for sure.
>>>>>>
>>>>>> But there's one catch right now. The user-visible commands know about
>>>>>> two types of tags (logical and allocation). The native/remote side of
>>>>>> GDB only sees one type, the allocation one, as it doesn't make sense to
>>>>>> ask the native/remote target about logical tags.
>>>>>>
>>>>>> This is slightly messy and, in my opinion, should be an implementation
>>>>>> detail.
>>>>>>
>>>>>> So, in summary... We have a couple generic tag types GDB knows about:
>>>>>> logical and allocation.
>>>>>>
>>>>>> Those types get translated to an arch/a target-specific type when they
>>>>>> cross the native/remote target boundary.
>>>>>>
>>>>>> In theory we could have generic tag types 1 and 2 in generic code, but
>>>>>> tag type 2 gets translated to type 1 in a remote packet.
>>>>>>
>>>>>> Maybe we could improve this a little.
>>>>>>
>>>>>>>
>>>>>>> A page like:
>>>>>>> https://sourceware.org/gdb/current/onlinedocs/gdb/ARM-Breakpoint-Kinds.html#ARM-Breakpoint-Kinds
>>>>>>>
>>>>>>> Or just a short note, given that there's only one type right now.
>>>>>>
>>>>>> Yes, that would be nice to expand for the tag types.
>>>>>>
>>>>>>>
>>>>>>> Also, I may have suggested the type be a string at some point. However
>>>>>>> based on examples like the link above
>>>>>>> I don't see much advantage to it apart from making packet dumps easier
>>>>>>> to read. Just wanted to close the loop on that
>>>>>>> if I didn't before.
>>>>>>
>>>>>> I don't have a strong preference here. I'm just forwarding the tag type
>>>>>> from generic code.
>>>>>>
>>>>>> If we want to pass strings, we will need a gdbarch hook that maps a type
>>>>>> to a string in the remote target layer.
>>>>>>
>>>>>> Otherwise we'd need to standardize on particular tag type names across
>>>>>> different architectures, like "hw memory tag", "sw memory tag",
>>>>>> "capability tag" etc.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 16 Nov 2020 at 15:44, David Spickett <david.spickett@linaro.org> wrote:
>>>>>>>>
>>>>>>>> Minor thing, there is a missing space here in "typeis".
>>>>>>>>
>>>>>>>>> +@var{type} is the type of tag the request wants to fetch.  The typeis a signed
>>>>>>>>> +integer.
>>>>>>>>
>>>>>>>> On Mon, 9 Nov 2020 at 17:08, Eli Zaretskii <eliz@gnu.org> wrote:
>>>>>>>>>
>>>>>>>>>> Date: Mon,  9 Nov 2020 14:04:18 -0300
>>>>>>>>>> From: Luis Machado via Gdb-patches <gdb-patches@sourceware.org>
>>>>>>>>>> Cc: david.spickett@linaro.org
>>>>>>>>>>
>>>>>>>>>> gdb/doc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>> YYYY-MM-DD  Luis Machado  <luis.machado@linaro.org>
>>>>>>>>>>
>>>>>>>>>>           * gdb.texinfo (General Query Packets): Document qMemTags and
>>>>>>>>>>           QMemTags.  Document the "memory-tagging" feature.
>>>>>>>>>> ---
>>>>>>>>>>      gdb/doc/gdb.texinfo | 96 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>      1 file changed, 96 insertions(+)
>>>>>>>>>
>>>>>>>>> OK for this part, thanks.