Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
       [not found] ` <92aea037-9c0e-438f-8a0a-fd52dd2df7bc@suse.com>
@ 2025-08-25  8:45   ` Sam James
  2025-08-25  9:26     ` Alexander Monakov
  2025-08-26  8:19     ` Gerlicher, Klaus
  0 siblings, 2 replies; 9+ messages in thread
From: Sam James @ 2025-08-25  8:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J. Lu, Alexander Monakov

Jan Beulich <jbeulich@suse.com> writes:

> On 25.08.2025 04:42, Jiang, Haochen wrote:
>> Hi all,
>> 
>> Recently I happened to have a look at the gdb code. At gdb/amd64-tdep.c
>> L1102 comment, it mentioned that:
>> 
>> /* WARNING: Keep onebyte_has_modrm, twobyte_has_modrm in sync with
>>    ../opcodes/i386-dis.c (until libopcodes exports them, or an alternative,
>>    at which point delete these in favor of libopcodes' versions).  */
>> 
>> This means the table content and usage should be the same as gas.
>> 
>> However, when we are using the table in disassembler at opcode/i386-dis.c
>> L9877, it is:
>> 
>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>   if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
>>     {
>>       if (!ins.rex2)
>>         {
>>           ins.codep++;
>>           if (!fetch_code (info, ins.codep + 1))
>>             goto fetch_error_out;
>>         }
>> 
>>       dp = &dis386_twobyte[*ins.codep];
>>       ins.need_modrm = twobyte_has_modrm[*ins.codep];
>>     }
>>   else
>>     {
>>       dp = &dis386[*ins.codep];
>>       ins.need_modrm = onebyte_has_modrm[*ins.codep];
>>     }
>> 
>> It will use the very first byte of the bytecode.
>> 
>> On the other hand, in gdb, let's take VEX prefix as example at
>> gdb/amd64-tdep.c L1349, the logic is:
>> 
>>   /* Skip REX/VEX instruction encoding prefixes.  */
>>   ...
>>   else if (vex2_prefix_p (*insn))
>>     {
>>       details->enc_prefix_offset = insn - start;
>>       insn += 2;
>>     }
>>   else if (vex3_prefix_p (*insn))
>>     {
>>       details->enc_prefix_offset = insn - start;
>>       insn += 3;
>>     }
>>   ...
>>   if (prefix != nullptr && rex2_prefix_p (*prefix))
>>     {
>>       ...
>>     }
>>   else if (prefix != nullptr && vex2_prefix_p (*prefix))
>>     {
>>       need_modrm = twobyte_has_modrm[*insn];
>>       details->opcode_len = 2;
>>     }
>>   else if (prefix != nullptr && vex3_prefix_p (*prefix))
>>     {
>>       need_modrm = twobyte_has_modrm[*insn];
>>       ...
>> }
>> ...
>> 
>> It will skip the VEX prefix and use twobyte_has_modrm table instead of
>> onebyte_has_modrm[0xc4/c5] in disassembler. The table usage are totally
>> different although the table itself is the same. It will cause the need_modrm
>> value different eventually. For example, opcode for VPBLENDW under 128 bit
>> is "VEX.128.66.0F3A.WIG 0E /r ib". The need_modrm would be false in gdb
>> since twobyte_has_modrm[0x0e] is false.
>> 
>> Does anyone know the reason on that? It is weird to me.
>
> Same here; see https://sourceware.org/pipermail/gdb-patches/2019-February/155347.html.
> That patch might require re-basing and some work to be up-to-date again, but
> fundamentally it still looks applicable. I don't really understand why stuff
> like this isn't allowed in. Pedro's desire for a testcase is understandable,
> but shouldn't block such a patch (there was a 2nd one s well) for this many
> years.

I didn't realise a patch was rotting for this. There's Alexander's
PR28999 (and a few other either dupes or very-related bugs) too.

While I can understand wanting a testcase, tdep is really in a sorry
state for x86 anyway, and this clearly makes it better. Perhaps with
Haochen's interest, we can finally get it in. But I don't see any
specific x86 maintainers for gdb.

sam

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-25  8:45   ` Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb? Sam James
@ 2025-08-25  9:26     ` Alexander Monakov
  2025-08-25 14:33       ` Tom de Vries
  2025-08-26  8:19     ` Gerlicher, Klaus
  1 sibling, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2025-08-25  9:26 UTC (permalink / raw)
  To: Sam James
  Cc: Jan Beulich, Jiang, Haochen, Binutils, gdb-patches, H.J. Lu,
	Tom de Vries

Hi,

On Mon, 25 Aug 2025, Sam James wrote:

> Jan Beulich <jbeulich@suse.com> writes:
> 
> > On 25.08.2025 04:42, Jiang, Haochen wrote:
> >> Hi all,
> >> 
> >> Recently I happened to have a look at the gdb code. At gdb/amd64-tdep.c
> >> L1102 comment, it mentioned that:
> >> 
> >> /* WARNING: Keep onebyte_has_modrm, twobyte_has_modrm in sync with
> >>    ../opcodes/i386-dis.c (until libopcodes exports them, or an alternative,
> >>    at which point delete these in favor of libopcodes' versions).  */
> >> 
> >> This means the table content and usage should be the same as gas.
> >> 
> >> However, when we are using the table in disassembler at opcode/i386-dis.c
> >> L9877, it is:
> >> 
> >>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>   if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
> >>     {
> >>       if (!ins.rex2)
> >>         {
> >>           ins.codep++;
> >>           if (!fetch_code (info, ins.codep + 1))
> >>             goto fetch_error_out;
> >>         }
> >> 
> >>       dp = &dis386_twobyte[*ins.codep];
> >>       ins.need_modrm = twobyte_has_modrm[*ins.codep];
> >>     }
> >>   else
> >>     {
> >>       dp = &dis386[*ins.codep];
> >>       ins.need_modrm = onebyte_has_modrm[*ins.codep];
> >>     }
> >> 
> >> It will use the very first byte of the bytecode.
> >> 
> >> On the other hand, in gdb, let's take VEX prefix as example at
> >> gdb/amd64-tdep.c L1349, the logic is:
> >> 
> >>   /* Skip REX/VEX instruction encoding prefixes.  */
> >>   ...
> >>   else if (vex2_prefix_p (*insn))
> >>     {
> >>       details->enc_prefix_offset = insn - start;
> >>       insn += 2;
> >>     }
> >>   else if (vex3_prefix_p (*insn))
> >>     {
> >>       details->enc_prefix_offset = insn - start;
> >>       insn += 3;
> >>     }
> >>   ...
> >>   if (prefix != nullptr && rex2_prefix_p (*prefix))
> >>     {
> >>       ...
> >>     }
> >>   else if (prefix != nullptr && vex2_prefix_p (*prefix))
> >>     {
> >>       need_modrm = twobyte_has_modrm[*insn];
> >>       details->opcode_len = 2;
> >>     }
> >>   else if (prefix != nullptr && vex3_prefix_p (*prefix))
> >>     {
> >>       need_modrm = twobyte_has_modrm[*insn];
> >>       ...
> >> }
> >> ...
> >> 
> >> It will skip the VEX prefix and use twobyte_has_modrm table instead of
> >> onebyte_has_modrm[0xc4/c5] in disassembler. The table usage are totally
> >> different although the table itself is the same. It will cause the need_modrm
> >> value different eventually. For example, opcode for VPBLENDW under 128 bit
> >> is "VEX.128.66.0F3A.WIG 0E /r ib". The need_modrm would be false in gdb
> >> since twobyte_has_modrm[0x0e] is false.
> >> 
> >> Does anyone know the reason on that? It is weird to me.
> >
> > Same here; see https://sourceware.org/pipermail/gdb-patches/2019-February/155347.html.
> > That patch might require re-basing and some work to be up-to-date again, but
> > fundamentally it still looks applicable. I don't really understand why stuff
> > like this isn't allowed in. Pedro's desire for a testcase is understandable,
> > but shouldn't block such a patch (there was a 2nd one s well) for this many
> > years.
> 
> I didn't realise a patch was rotting for this. There's Alexander's
> PR28999 (and a few other either dupes or very-related bugs) too.
> 
> While I can understand wanting a testcase, tdep is really in a sorry
> state for x86 anyway, and this clearly makes it better. Perhaps with
> Haochen's interest, we can finally get it in. But I don't see any
> specific x86 maintainers for gdb.

I hope the bug is fixed by a more comprehensive patchset from Tom, which has
already landed:
https://inbox.sourceware.org/gdb-patches/e5282a4b-5d9f-4891-b9b8-45ded54ec6ee@suse.de/

Haochen's question still stands, I guess.

Alexander

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-25  9:26     ` Alexander Monakov
@ 2025-08-25 14:33       ` Tom de Vries
  2025-08-26  6:02         ` Jiang, Haochen
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2025-08-25 14:33 UTC (permalink / raw)
  To: Alexander Monakov, Sam James
  Cc: Jan Beulich, Jiang, Haochen, Binutils, gdb-patches, H.J. Lu

On 8/25/25 11:26, Alexander Monakov wrote:
> Hi,
> 
> On Mon, 25 Aug 2025, Sam James wrote:
> 
>> Jan Beulich <jbeulich@suse.com> writes:
>>
>>> On 25.08.2025 04:42, Jiang, Haochen wrote:
>>>> Hi all,
>>>>
>>>> Recently I happened to have a look at the gdb code. At gdb/amd64-tdep.c
>>>> L1102 comment, it mentioned that:
>>>>
>>>> /* WARNING: Keep onebyte_has_modrm, twobyte_has_modrm in sync with
>>>>     ../opcodes/i386-dis.c (until libopcodes exports them, or an alternative,
>>>>     at which point delete these in favor of libopcodes' versions).  */
>>>>
>>>> This means the table content and usage should be the same as gas.
>>>>
>>>> However, when we are using the table in disassembler at opcode/i386-dis.c
>>>> L9877, it is:
>>>>
>>>>    /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>>    if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
>>>>      {
>>>>        if (!ins.rex2)
>>>>          {
>>>>            ins.codep++;
>>>>            if (!fetch_code (info, ins.codep + 1))
>>>>              goto fetch_error_out;
>>>>          }
>>>>
>>>>        dp = &dis386_twobyte[*ins.codep];
>>>>        ins.need_modrm = twobyte_has_modrm[*ins.codep];
>>>>      }
>>>>    else
>>>>      {
>>>>        dp = &dis386[*ins.codep];
>>>>        ins.need_modrm = onebyte_has_modrm[*ins.codep];
>>>>      }
>>>>
>>>> It will use the very first byte of the bytecode.
>>>>
>>>> On the other hand, in gdb, let's take VEX prefix as example at
>>>> gdb/amd64-tdep.c L1349, the logic is:
>>>>
>>>>    /* Skip REX/VEX instruction encoding prefixes.  */
>>>>    ...
>>>>    else if (vex2_prefix_p (*insn))
>>>>      {
>>>>        details->enc_prefix_offset = insn - start;
>>>>        insn += 2;
>>>>      }
>>>>    else if (vex3_prefix_p (*insn))
>>>>      {
>>>>        details->enc_prefix_offset = insn - start;
>>>>        insn += 3;
>>>>      }
>>>>    ...
>>>>    if (prefix != nullptr && rex2_prefix_p (*prefix))
>>>>      {
>>>>        ...
>>>>      }
>>>>    else if (prefix != nullptr && vex2_prefix_p (*prefix))
>>>>      {
>>>>        need_modrm = twobyte_has_modrm[*insn];
>>>>        details->opcode_len = 2;
>>>>      }
>>>>    else if (prefix != nullptr && vex3_prefix_p (*prefix))
>>>>      {
>>>>        need_modrm = twobyte_has_modrm[*insn];
>>>>        ...
>>>> }
>>>> ...
>>>>
>>>> It will skip the VEX prefix and use twobyte_has_modrm table instead of
>>>> onebyte_has_modrm[0xc4/c5] in disassembler. The table usage are totally
>>>> different although the table itself is the same. It will cause the need_modrm
>>>> value different eventually. For example, opcode for VPBLENDW under 128 bit
>>>> is "VEX.128.66.0F3A.WIG 0E /r ib". The need_modrm would be false in gdb
>>>> since twobyte_has_modrm[0x0e] is false.
>>>>
>>>> Does anyone know the reason on that? It is weird to me.
>>>
>>> Same here; see https://sourceware.org/pipermail/gdb-patches/2019-February/155347.html.
>>> That patch might require re-basing and some work to be up-to-date again, but
>>> fundamentally it still looks applicable. I don't really understand why stuff
>>> like this isn't allowed in. Pedro's desire for a testcase is understandable,
>>> but shouldn't block such a patch (there was a 2nd one s well) for this many
>>> years.
>>
>> I didn't realise a patch was rotting for this. There's Alexander's
>> PR28999 (and a few other either dupes or very-related bugs) too.
>>
>> While I can understand wanting a testcase, tdep is really in a sorry
>> state for x86 anyway, and this clearly makes it better. Perhaps with
>> Haochen's interest, we can finally get it in. But I don't see any
>> specific x86 maintainers for gdb.
> 
> I hope the bug is fixed by a more comprehensive patchset from Tom, which has
> already landed:
> https://inbox.sourceware.org/gdb-patches/e5282a4b-5d9f-4891-b9b8-45ded54ec6ee@suse.de/
> 
> Haochen's question still stands, I guess.

Hi Alexander,

thanks for cc-ing me on this thread.

Indeed the PR you filed has been fixed.

I grepped a bit in the gas testsuite for VPBLENDW and constructed a gdb 
unit test that passes with workaround but fails without:
...
diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
index d5ea4aff4cf..251e6932c64 100644
--- a/gdb/amd64-tdep.c
+++ b/gdb/amd64-tdep.c
@@ -3755,6 +3755,32 @@ test_amd64_get_insn_details (void)
      = { 0x62, 0xf1, 0x7c, 0x48, 0x28, 0x81, 0x00, 0xfc, 0xff, 0xff };
    fixup_riprel (details, insn.data (), ECX_REG_NUM);
    SELF_CHECK (insn == updated_insn);
+
+  /* INSN: vpblendw $0x7,%xmm4,%xmm6,%xmm2, vex3 prefix.  */
+  insn = { 0xc4, 0xe3, 0x49, 0x0e, 0xd4, 0x07 };
+  amd64_get_insn_details (insn.data (), &details);
+  SELF_CHECK (details.opcode_len == 3);
+  SELF_CHECK (details.enc_prefix_offset == 0);
+  SELF_CHECK (details.opcode_offset == 3);
+  SELF_CHECK (details.modrm_offset == -1);
+
+  /* INSN: vpblendw $0x7,0xff(%rip),%ymm6,%ymm2.  */
+  insn = { 0xc4, 0xe3, 0x4d, 0x0e, 0x15, 0xff, 0x00, 0x00, 0x00, 0x07 };
+  amd64_get_insn_details (insn.data (), &details);
+
+  if (1 && details.modrm_offset == -1)
+    {
+      /* Workaround.  */
+      details.modrm_offset = 4;
+    }
+
+  SELF_CHECK (details.modrm_offset != -1);
+
+  /* INSN: vpblendw $0x7,0xff(%ecx),%ymm6,%ymm2.  */
+  fixup_riprel (details, insn.data (), ECX_REG_NUM);
+  updated_insn
+    = { 0xc4, 0xe3, 0x4d, 0x0e, 0x91, 0xff, 0x00, 0x00, 0x00, 0x07 };
+  SELF_CHECK (insn == updated_insn);
  }

  static void
...
because of an incorrect modrm_offset value.

I'll try to fix this.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-25 14:33       ` Tom de Vries
@ 2025-08-26  6:02         ` Jiang, Haochen
  0 siblings, 0 replies; 9+ messages in thread
From: Jiang, Haochen @ 2025-08-26  6:02 UTC (permalink / raw)
  To: Tom de Vries, Alexander Monakov, Sam James
  Cc: Beulich, Jan, Binutils, gdb-patches, H.J. Lu

> From: Tom de Vries <tdevries@suse.de>
> Sent: Monday, August 25, 2025 10:34 PM
> 
> On 8/25/25 11:26, Alexander Monakov wrote:
> >
> > On Mon, 25 Aug 2025, Sam James wrote:
> >
> >> Jan Beulich <jbeulich@suse.com> writes:
> >>
> >>> On 25.08.2025 04:42, Jiang, Haochen wrote:
> >>>> Does anyone know the reason on that? It is weird to me.
> >>>
> >>> Same here; see https://sourceware.org/pipermail/gdb-patches/2019-
> February/155347.html.
> >>> That patch might require re-basing and some work to be up-to-date again,
> >>> but fundamentally it still looks applicable. I don't really understand why stuff
> >>> like this isn't allowed in. Pedro's desire for a testcase is understandable,
> >>> but shouldn't block such a patch (there was a 2nd one s well) for this
> >>> many years.
> >>
> >> I didn't realise a patch was rotting for this. There's Alexander's
> >> PR28999 (and a few other either dupes or very-related bugs) too.
> >>
> >> While I can understand wanting a testcase, tdep is really in a sorry
> >> state for x86 anyway, and this clearly makes it better. Perhaps with
> >> Haochen's interest, we can finally get it in. But I don't see any
> >> specific x86 maintainers for gdb.
> >
> > I hope the bug is fixed by a more comprehensive patchset from Tom, which
> > has already landed:
> > https://inbox.sourceware.org/gdb-patches/e5282a4b-5d9f-4891-b9b8-
> 45ded54ec6ee@suse.de/
> >
> > Haochen's question still stands, I guess.

I happened to have a look at gdb code since someone encountered similar
rip relative address issue and asking me why. And we finally found that the
problem is on the usage of that table.

I still believe we need to do something in the table handling to eliminate the
gap between disassembler and gdb since it is said to be the table is the same,
indicating the usage should also be the same.

Jan's patch in 2019 is doing that (while we need to work around VZEROALL and
VZEROUPPER). And I knew that a testcase to cover everything is tricky since ...

> 
> I grepped a bit in the gas testsuite for VPBLENDW and constructed a gdb
> unit test that passes with workaround but fails without:

... I suppose VPBLENDW might not be the only inst meeting the issue.

We need to either work on the similar way Jan's patch does and polish them
or totally separate the usage of the table if gdb would like to still keep the
current logic (which I do not prefer since it is make things over complicated).

Thx,
Haochen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-25  8:45   ` Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb? Sam James
  2025-08-25  9:26     ` Alexander Monakov
@ 2025-08-26  8:19     ` Gerlicher, Klaus
  2025-08-26  9:31       ` Tom de Vries
  1 sibling, 1 reply; 9+ messages in thread
From: Gerlicher, Klaus @ 2025-08-26  8:19 UTC (permalink / raw)
  To: Sam James, Beulich, Jan, Tom de Vries
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J.Lu, Alexander Monakov

> -----Original Message-----
> From: Sam James <sam@gentoo.org>
> Sent: Monday, August 25, 2025 10:46 AM
> To: Beulich, Jan <JBeulich@suse.com>
> Cc: Jiang, Haochen <haochen.jiang@intel.com>; Binutils
> <binutils@sourceware.org>; gdb-patches@sourceware.org; H.J.Lu
> <hjl.tools@gmail.com>; Alexander Monakov <amonakov@ispras.ru>
> Subject: Re: Inconsistent usage on onebyte_modrm and twobyte_modrm
> table in x86 disassembler and gdb?
> 
> Jan Beulich <jbeulich@suse.com> writes:
> 
> > On 25.08.2025 04:42, Jiang, Haochen wrote:
> >> Hi all,
> >>
> >> Recently I happened to have a look at the gdb code. At gdb/amd64-tdep.c
> >> L1102 comment, it mentioned that:
> >>
> >> /* WARNING: Keep onebyte_has_modrm, twobyte_has_modrm in sync
> with
> >>    ../opcodes/i386-dis.c (until libopcodes exports them, or an alternative,
> >>    at which point delete these in favor of libopcodes' versions).  */
> >>
> >> This means the table content and usage should be the same as gas.
> >>
> >> However, when we are using the table in disassembler at opcode/i386-dis.c
> >> L9877, it is:
> >>
> >>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>   if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
> >>     {
> >>       if (!ins.rex2)
> >>         {
> >>           ins.codep++;
> >>           if (!fetch_code (info, ins.codep + 1))
> >>             goto fetch_error_out;
> >>         }
> >>
> >>       dp = &dis386_twobyte[*ins.codep];
> >>       ins.need_modrm = twobyte_has_modrm[*ins.codep];
> >>     }
> >>   else
> >>     {
> >>       dp = &dis386[*ins.codep];
> >>       ins.need_modrm = onebyte_has_modrm[*ins.codep];
> >>     }
> >>
> >> It will use the very first byte of the bytecode.
> >>
> >> On the other hand, in gdb, let's take VEX prefix as example at
> >> gdb/amd64-tdep.c L1349, the logic is:
> >>
> >>   /* Skip REX/VEX instruction encoding prefixes.  */
> >>   ...
> >>   else if (vex2_prefix_p (*insn))
> >>     {
> >>       details->enc_prefix_offset = insn - start;
> >>       insn += 2;
> >>     }
> >>   else if (vex3_prefix_p (*insn))
> >>     {
> >>       details->enc_prefix_offset = insn - start;
> >>       insn += 3;
> >>     }
> >>   ...
> >>   if (prefix != nullptr && rex2_prefix_p (*prefix))
> >>     {
> >>       ...
> >>     }
> >>   else if (prefix != nullptr && vex2_prefix_p (*prefix))
> >>     {
> >>       need_modrm = twobyte_has_modrm[*insn];
> >>       details->opcode_len = 2;
> >>     }
> >>   else if (prefix != nullptr && vex3_prefix_p (*prefix))
> >>     {
> >>       need_modrm = twobyte_has_modrm[*insn];
> >>       ...
> >> }
> >> ...
> >>
> >> It will skip the VEX prefix and use twobyte_has_modrm table instead of
> >> onebyte_has_modrm[0xc4/c5] in disassembler. The table usage are totally
> >> different although the table itself is the same. It will cause the need_modrm
> >> value different eventually. For example, opcode for VPBLENDW under 128
> bit
> >> is "VEX.128.66.0F3A.WIG 0E /r ib". The need_modrm would be false in gdb
> >> since twobyte_has_modrm[0x0e] is false.
> >>
> >> Does anyone know the reason on that? It is weird to me.
> >
> > Same here; see https://sourceware.org/pipermail/gdb-patches/2019-
> February/155347.html.
> > That patch might require re-basing and some work to be up-to-date again,
> but
> > fundamentally it still looks applicable. I don't really understand why stuff
> > like this isn't allowed in. Pedro's desire for a testcase is understandable,
> > but shouldn't block such a patch (there was a 2nd one s well) for this many
> > years.
> 
> I didn't realise a patch was rotting for this. There's Alexander's
> PR28999 (and a few other either dupes or very-related bugs) too.
> 
> While I can understand wanting a testcase, tdep is really in a sorry
> state for x86 anyway, and this clearly makes it better. Perhaps with
> Haochen's interest, we can finally get it in. But I don't see any
> specific x86 maintainers for gdb.
> 
> sam

Hi all,

FWIW I'll attach my fix for the (V)PBLENDW. It has a testcase and a unittest case. 

I believe with this there are no other instructions that would not be handled correctly.

VEX and EVEX-encoded instructions generally require a ModR/M byte, with the
notable exception of vzeroall and vzeroupper (opcode 0x77), which do not
use ModR/M.

This change sets need_modrm = 1 for VEX instructions, and adds an exception
for instructions where *insn == 0x77, following Intel's SDM.

EVEX has no exceptions and thus always sets need_modrm to 1.

Additionally, the legacy twobyte_has_modrm table cannot be used for VEX and
EVEX instructions, as these encodings have different requirements and
exceptions. The logic is now explicit for VEX/EVEX handling.

Add vpblendw to selftest amd64_insn_decode. Note there's no EVEX testcase
added here even though decoding logic for EVEX has also changed.

The Intel SDM says the following:

  1. Intel(r) 64 and IA-32 Architectures Software Developer's Manual

  Section 2.2.1.2 - Instruction Prefixes

  "The VEX prefix is a multi-byte prefix that replaces several legacy prefixes
  and opcode bytes. The VEX prefix is not an opcode; it is a prefix that
  modifies the instruction that follows."

  Section 2.2.1.3 - Opcode Bytes

  "The opcode byte(s) follow any instruction prefixes (including VEX). The
  opcode specifies the operation to be performed."

  Section 2.2.2 - Instruction Format

  "If a VEX prefix is present, it is processed as a single prefix, and the
  opcode bytes follow immediately after the VEX prefix."

  Source: Intel(r) SDM Vol. 2A, Section 2.2.1.2 and 2.2.2 (See Vol. 2A,
  PDF pages 2-4, 2-5, and 2-7)

  2. ModRM Byte Requirement

  Intel(r) SDM Vol. 2A, Table 2-2 - VEX Prefix Encoding

  "Most VEX-encoded instructions require a ModRM byte, except for a few
  instructions such as VZEROALL and VZEROUPPER."

  Source: Intel(r) SDM Vol. 2A, Table 2-2 (See Vol. 2A, PDF page 2-13)
---
 gdb/amd64-tdep.c                              | 24 ++++++++++++++-----
 gdb/testsuite/gdb.arch/amd64-disp-step-avx.S  | 19 +++++++++++++++
 .../gdb.arch/amd64-disp-step-avx.exp          | 20 ++++++++++++++++
 3 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
index 1adf81602b0..a273cb9594c 100755
--- a/gdb/amd64-tdep.c
+++ b/gdb/amd64-tdep.c
@@ -1686,13 +1686,12 @@ amd64_get_insn_details (gdb_byte *insn, struct amd64_insn *details)
     }
   else if (prefix != nullptr && vex2_prefix_p (*prefix))
     {
-      need_modrm = twobyte_has_modrm[*insn];
+      /* All VEX2 instructions need ModR/M, except vzeroupper/vzeroall.  */
+      need_modrm = *insn != 0x77 ? 1 : 0;
       details->opcode_len = 2;
     }
   else if (prefix != nullptr && vex3_prefix_p (*prefix))
     {
-      need_modrm = twobyte_has_modrm[*insn];
-
       gdb_byte m = prefix[1] & 0x1f;
       if (m == 0)
 	{
@@ -1701,12 +1700,16 @@ amd64_get_insn_details (gdb_byte *insn, struct amd64_insn *details)
 	}
       else if (m == 1)
 	{
-	  /* Escape 0x0f.  */
+	  /* Escape 0x0f.  All VEX3 instructions in this map need ModR/M,
+	     except vzeroupper/vzeroall.  */
+	  need_modrm = *insn != 0x77 ? 1 : 0;
 	  details->opcode_len = 2;
 	}
       else if (m == 2 || m == 3)
 	{
-	  /* Escape 0x0f 0x38 or 0x0f 0x3a.  */
+	  /* Escape 0x0f 0x38 or 0x0f 0x3a.  All VEX3 instructions in
+	     this map need ModR/M.  */
+      	  need_modrm = 1;
 	  details->opcode_len = 3;
 	}
       else if (m == 7)
@@ -1722,7 +1725,8 @@ amd64_get_insn_details (gdb_byte *insn, struct amd64_insn *details)
     }
   else if (prefix != nullptr && evex_prefix_p (*prefix))
     {
-      need_modrm = twobyte_has_modrm[*insn];
+      /* All EVEX instructions need ModR/M.  */
+      need_modrm = 1;
 
       gdb_byte m = prefix[1] & 0x7;
       if (m == 1)
@@ -4254,6 +4258,14 @@ test_amd64_get_insn_details (void)
     = { 0x62, 0xf1, 0x7c, 0x48, 0x28, 0x81, 0x00, 0xfc, 0xff, 0xff };
   fixup_riprel (details, insn.data (), ECX_REG_NUM);
   SELF_CHECK (insn == updated_insn);
+
+  /* INSN: vpblendw -0x400(%rip),%zmm0, evex prefix.  */
+  insn = { 0xc4, 0xe3, 0x45, 0x0e, 0x25, 0x3a, 0x0f, 0x00, 0x00, 0xff };
+  amd64_get_insn_details (insn.data (), &details);
+  SELF_CHECK (details.opcode_len == 3);
+  SELF_CHECK (details.enc_prefix_offset == 0);
+  SELF_CHECK (details.opcode_offset == 3);
+  SELF_CHECK (details.modrm_offset == 4);
 }
 
 static void
diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step-avx.S b/gdb/testsuite/gdb.arch/amd64-disp-step-avx.S
index 9b6b2b1f3ed..d83eb14f777 100644
--- a/gdb/testsuite/gdb.arch/amd64-disp-step-avx.S
+++ b/gdb/testsuite/gdb.arch/amd64-disp-step-avx.S
@@ -43,6 +43,18 @@ test_rip_vex3:
 test_rip_vex3_end:
 	nop
 
+
+/* Test a VEX3-encoded RIP-relative VBLENDW instruction.  */
+	vmovsd ro_var_vpblendw(%rip),%xmm0
+
+	.global test_rip_vex3_vblendw
+test_rip_vex3_vblendw:
+	vpblendw  $0x03, var128_vblendw(%rip), %ymm0, %ymm1
+	.global test_rip_vex3_vblendw
+test_rip_vex3_vblendw_end:
+	vmovdqu %xmm1, var128_vblendw(%rip)
+	nop
+
 	/* skip over test data */
 	jmp done
 
@@ -52,6 +64,10 @@ ro_var:
 	.8byte 0x1122334455667788
 	.8byte 0x8877665544332211
 
+ro_var_vpblendw:
+	.8byte 0x1122334455667788
+	.8byte 0x8877665544332211
+
 /***********************************************/
 
 /* All done.  */
@@ -67,4 +83,7 @@ done:
 var128:
 	.8byte 0xaa55aa55aa55aa55
 	.8byte 0x55aa55aa55aa55aa
+var128_vblendw:
+	.8byte 0xaa55aa55aa55aa55
+	.8byte 0x55aa55aa55aa55aa
 	.section	.note.GNU-stack,"",@progbits
diff --git a/gdb/testsuite/gdb.arch/amd64-disp-step-avx.exp b/gdb/testsuite/gdb.arch/amd64-disp-step-avx.exp
index 4ae6461fb0d..4bd9f93d8fa 100644
--- a/gdb/testsuite/gdb.arch/amd64-disp-step-avx.exp
+++ b/gdb/testsuite/gdb.arch/amd64-disp-step-avx.exp
@@ -140,5 +140,25 @@ with_test_prefix "vex3" {
 	"var128 has expected value after"
 }
 
+# Test a VEX3-encoded RIP-relative instruction that was falsely
+# identified as not having a ModRM byte.
+with_test_prefix "vex3_vblendw" {
+    # This case writes to the 'var128' variable.  Confirm the
+    # variable's value is what we believe it is before the AVX
+    # instruction runs.
+    gdb_test "p /x (unsigned long long \[2\]) var128_vblendw" \
+	" = \\{0xaa55aa55aa55aa55, 0x55aa55aa55aa55aa\\}" \
+	"var128_vblendw has expected value before"
+
+    # Run the AVX instruction.
+    disp_step_func "test_rip_vex3_vblendw"
+
+    # Confirm the that the store of the vpblendw result changed
+    # ymm1 register
+    gdb_test "p /x \$xmm1.uint128" \
+	" = 0x11223344aa55aa55" \
+	"ymm1 has expected value after"
+}
+
 # Done, run program to exit.
 gdb_continue_to_end "amd64-disp-step-avx"
--


Thanks
Klaus
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-26  8:19     ` Gerlicher, Klaus
@ 2025-08-26  9:31       ` Tom de Vries
  2025-08-27 10:32         ` Schimpe, Christina
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2025-08-26  9:31 UTC (permalink / raw)
  To: Gerlicher, Klaus, Sam James, Beulich, Jan
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J.Lu, Alexander Monakov

On 8/26/25 10:19, Gerlicher, Klaus wrote:
> FWIW I'll attach my fix for the (V)PBLENDW. It has a testcase and a unittest case.

Hi Klaus,

thanks for working on this.

The patch LGTM, other than this:
...
$ git show --pretty=%s --check
Fix vpblendw

gdb/amd64-tdep.c:1409: space before tab in indent.
+      	  need_modrm = 1;
...

I've tested both the unit test, and the updated test-case.

I've also tested the unit test I wrote, which uses vex3 instead of evex:
...
+
+  /* INSN: vpblendw $0x7,%xmm4,%xmm6,%xmm2, vex3 prefix.  */
+  insn = { 0xc4, 0xe3, 0x49, 0x0e, 0xd4, 0x07 };
+  amd64_get_insn_details (insn.data (), &details);
+  SELF_CHECK (details.opcode_len == 3);
+  SELF_CHECK (details.enc_prefix_offset == 0);
+  SELF_CHECK (details.opcode_offset == 3);
+  SELF_CHECK (details.modrm_offset == 4);
+
+  /* INSN: vpblendw $0x7,0xff(%rip),%ymm6,%ymm2, vex3 prefix.  */
+  insn = { 0xc4, 0xe3, 0x4d, 0x0e, 0x15, 0xff, 0x00, 0x00, 0x00, 0x07 };
+  amd64_get_insn_details (insn.data (), &details);
+  SELF_CHECK (details.opcode_len == 3);
+  SELF_CHECK (details.enc_prefix_offset == 0);
+  SELF_CHECK (details.opcode_offset == 3);
+  SELF_CHECK (details.modrm_offset == 4);
+
+  /* INSN: vpblendw $0x7,0xff(%ecx),%ymm6,%ymm2, vex3 prefix.  */
+  fixup_riprel (details, insn.data (), ECX_REG_NUM);
+  updated_insn
+    = { 0xc4, 0xe3, 0x4d, 0x0e, 0x91, 0xff, 0x00, 0x00, 0x00, 0x07 };
+  SELF_CHECK (insn == updated_insn);
...

You could add this as well, but it's not required.

[ Note that you could drop the test-case update and use fixup_riprel on 
the unit test you added instead, like I did here and is done elsewhere 
in the unit test. ]

Please commit this with some appropriate $subject.

Approved-By: Tom de Vries <tdevries@suse.de>

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-26  9:31       ` Tom de Vries
@ 2025-08-27 10:32         ` Schimpe, Christina
  2025-08-27 12:20           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Schimpe, Christina @ 2025-08-27 10:32 UTC (permalink / raw)
  To: Tom de Vries, Gerlicher, Klaus, Sam James, Beulich, Jan
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J.Lu, Alexander Monakov

Hi all, 

Thanks a lot for working on this. I have followed this thread and also discussed this with Klaus a bit.
I still wonder, would it make sense to align the logic in x86 disassembler and gdb ?

Kind Regards,
Christina

> -----Original Message-----
> From: Tom de Vries <tdevries@suse.de>
> Sent: Tuesday, August 26, 2025 11:32 AM
> To: Gerlicher, Klaus <klaus.gerlicher@intel.com>; Sam James <sam@gentoo.org>;
> Beulich, Jan <JBeulich@suse.com>
> Cc: Jiang, Haochen <haochen.jiang@intel.com>; Binutils
> <binutils@sourceware.org>; gdb-patches@sourceware.org; H.J.Lu
> <hjl.tools@gmail.com>; Alexander Monakov <amonakov@ispras.ru>
> Subject: Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table
> in x86 disassembler and gdb?
> 
> On 8/26/25 10:19, Gerlicher, Klaus wrote:
> > FWIW I'll attach my fix for the (V)PBLENDW. It has a testcase and a unittest
> case.
> 
> Hi Klaus,
> 
> thanks for working on this.
> 
> The patch LGTM, other than this:
> ...
> $ git show --pretty=%s --check
> Fix vpblendw
> 
> gdb/amd64-tdep.c:1409: space before tab in indent.
> +      	  need_modrm = 1;
> ...
> 
> I've tested both the unit test, and the updated test-case.
> 
> I've also tested the unit test I wrote, which uses vex3 instead of evex:
> ...
> +
> +  /* INSN: vpblendw $0x7,%xmm4,%xmm6,%xmm2, vex3 prefix.  */  insn = {
> + 0xc4, 0xe3, 0x49, 0x0e, 0xd4, 0x07 };  amd64_get_insn_details
> + (insn.data (), &details);  SELF_CHECK (details.opcode_len == 3);
> + SELF_CHECK (details.enc_prefix_offset == 0);  SELF_CHECK
> + (details.opcode_offset == 3);  SELF_CHECK (details.modrm_offset == 4);
> +
> +  /* INSN: vpblendw $0x7,0xff(%rip),%ymm6,%ymm2, vex3 prefix.  */  insn
> + = { 0xc4, 0xe3, 0x4d, 0x0e, 0x15, 0xff, 0x00, 0x00, 0x00, 0x07 };
> + amd64_get_insn_details (insn.data (), &details);  SELF_CHECK
> + (details.opcode_len == 3);  SELF_CHECK (details.enc_prefix_offset ==
> + 0);  SELF_CHECK (details.opcode_offset == 3);  SELF_CHECK
> + (details.modrm_offset == 4);
> +
> +  /* INSN: vpblendw $0x7,0xff(%ecx),%ymm6,%ymm2, vex3 prefix.  */
> + fixup_riprel (details, insn.data (), ECX_REG_NUM);  updated_insn
> +    = { 0xc4, 0xe3, 0x4d, 0x0e, 0x91, 0xff, 0x00, 0x00, 0x00, 0x07 };
> + SELF_CHECK (insn == updated_insn);
> ...
> 
> You could add this as well, but it's not required.
> 
> [ Note that you could drop the test-case update and use fixup_riprel on the unit
> test you added instead, like I did here and is done elsewhere in the unit test. ]
> 
> Please commit this with some appropriate $subject.
> 
> Approved-By: Tom de Vries <tdevries@suse.de>
> 
> Thanks,
> - Tom
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-27 10:32         ` Schimpe, Christina
@ 2025-08-27 12:20           ` Jan Beulich
  2025-08-27 15:23             ` Schimpe, Christina
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2025-08-27 12:20 UTC (permalink / raw)
  To: Schimpe, Christina
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J.Lu, Alexander Monakov,
	Tom de Vries, Gerlicher, Klaus, Sam James

On 27.08.2025 12:32, Schimpe, Christina wrote:
> Thanks a lot for working on this. I have followed this thread and also discussed this with Klaus a bit.
> I still wonder, would it make sense to align the logic in x86 disassembler and gdb ?

Well, fundamentally a function like amd64_get_insn_details() should live
in the disassembler (library). Just that, from all I know, the disassembler
isn't set up at all to hand out that kind of information. It would take
quite a bit of re-org to allow that to effectively be a byproduct (as it
ought to be, imo).

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb?
  2025-08-27 12:20           ` Jan Beulich
@ 2025-08-27 15:23             ` Schimpe, Christina
  0 siblings, 0 replies; 9+ messages in thread
From: Schimpe, Christina @ 2025-08-27 15:23 UTC (permalink / raw)
  To: Beulich, Jan
  Cc: Jiang, Haochen, Binutils, gdb-patches, H.J.Lu, Alexander Monakov,
	Tom de Vries, Gerlicher, Klaus, Sam James

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, August 27, 2025 2:20 PM
> To: Schimpe, Christina <christina.schimpe@intel.com>
> Cc: Jiang, Haochen <haochen.jiang@intel.com>; Binutils
> <binutils@sourceware.org>; gdb-patches@sourceware.org; H.J.Lu
> <hjl.tools@gmail.com>; Alexander Monakov <amonakov@ispras.ru>; Tom de
> Vries <tdevries@suse.de>; Gerlicher, Klaus <klaus.gerlicher@intel.com>; Sam
> James <sam@gentoo.org>
> Subject: Re: Inconsistent usage on onebyte_modrm and twobyte_modrm table
> in x86 disassembler and gdb?
> 
> On 27.08.2025 12:32, Schimpe, Christina wrote:
> > Thanks a lot for working on this. I have followed this thread and also discussed
> this with Klaus a bit.
> > I still wonder, would it make sense to align the logic in x86 disassembler and
> gdb ?
> 
> Well, fundamentally a function like amd64_get_insn_details() should live in the
> disassembler (library). Just that, from all I know, the disassembler isn't set up at
> all to hand out that kind of information. It would take quite a bit of re-org to
> allow that to effectively be a byproduct (as it ought to be, imo).

I agree, that sounds like the optimal solution and I absolutely understand your concern here.

Thanks for the clarification,

Christina
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-08-27 15:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <SJ5PPF77D28E3C228C975969E2B0BDB2853EC3EA@SJ5PPF77D28E3C2.namprd11.prod.outlook.com>
     [not found] ` <92aea037-9c0e-438f-8a0a-fd52dd2df7bc@suse.com>
2025-08-25  8:45   ` Inconsistent usage on onebyte_modrm and twobyte_modrm table in x86 disassembler and gdb? Sam James
2025-08-25  9:26     ` Alexander Monakov
2025-08-25 14:33       ` Tom de Vries
2025-08-26  6:02         ` Jiang, Haochen
2025-08-26  8:19     ` Gerlicher, Klaus
2025-08-26  9:31       ` Tom de Vries
2025-08-27 10:32         ` Schimpe, Christina
2025-08-27 12:20           ` Jan Beulich
2025-08-27 15:23             ` Schimpe, Christina

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox