Re: [PATCH] Allow non-ASCII characters in Rust identifiers

From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
To: Tom Tromey <tom@tromey.com>
Cc: Tom Tromey <tom@tromey.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
Date: Sun, 03 Apr 2022 18:34:11 +0100
Message-ID: <875ynq8418.fsf@redhat.com>
In-Reply-To: <87mth26rgo.fsf@tromey.com>

Tom Tromey <tom@tromey.com> writes:

> Andrew> I'm seeing this test fail.
>
> Andrew>  $ rustc --version
> Andrew>  rustc 1.59.0 (9d1b2106e 2022-02-23)
>
> I installed this version with "rustup toolchain install 1.59.0" and set
> it to be my default.
>
> Andrew> I've tested with gdb commit a723766c0e2 and 5187219460c.
>
> I tried 552f1157c6262, a recent-ish git master.
> It works fine for me.
>
> Andrew> Do these pass for you?  Any suggestions for where to start looking?
>
> I wonder if this line in the .exp isn't having the desired effect:
>
>     setenv LC_ALL C.UTF-8
>
> Is this happening interactively or in some kind of automation
> environment?  Are the correct locales installed?  Do other
> LC_ALL-setting tests fail?

This is when I run under dejagnu.  If I run the test manually, and copy
the commands from the .exp file by hand, pasting them into my GDB
session, it all appears to work fine.

I'm not sure how I'd check if the correct locales are installed (I mean,
I'm not sure what I'd be looking for), but I guess as it passes when run
manually, then I'm probably OK.
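
For what it's worth, one rough way to check whether a locale is usable on a
given machine is to ask the C library to switch to it.  The sketch below is
only an illustration: the helper name locale_available is made up, and it
says nothing about what environment dejagnu actually passes to GDB.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts NAME as an LC_CTYPE locale."""
    # Remember the current setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # setlocale raises locale.Error for unknown or uninstalled locales.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8"):
        print(candidate, locale_available(candidate))
```

A False result for C.UTF-8 would suggest the locale is missing; a True
result only shows it is installed, not that the test harness ends up using it.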

Looking for scripts that set or mention LC_ALL, I found these:

  gdb.base/utf8-identifiers.exp
  gdb.python/py-source-styling.exp
  gdb.ada/non-ascii-utf-8.exp
  gdb.ada/non-ascii-latin-3.exp
  gdb.ada/non-ascii-latin-1.exp

These all run fine, except for 3 failures in
gdb.ada/non-ascii-utf-8.exp, which look suspiciously similar:

  print VAR_ð<U+0090><U+0090><U+0081>
  No definition of "var_ð<U+0090><U+0090><U+0081>" in current context.
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print VAR_ð<U+0090><U+0090><U+0081>
  print var_ð<U+0090><U+0090>©
  No definition of "var_ð<U+0090><U+0090>©" in current context.
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print var_ð<U+0090><U+0090>©
  ... snip ...
  break FUNC_ð<U+0090><U+0090><U+0081>
  Function "FUNC_ð<U+0090><U+0090><U+0081>" not defined.
  Make breakpoint pending on future shared library load? (y or [n]) n
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: setting breakpoint at FUNC_ð<U+0090><U+0090><U+0081>

>
> Andrew>  print "ð<U+009D><U+0095>¯"
> Andrew>  $1 = "ð\302\235\302\225¯"
>
> One thing I'd suggest is checking by hand if either the 'print' line or
> the '$1 = ' line has the correct byte values for the UTF-8 encoded form
> of the character in question.

So, this is weird.  When I look at the .exp file, I see the bytes of the
Unicode character as 0xf0 0x9d 0x95 0xaf, which looks correct:

  https://www.fileformat.info/info/unicode/char/1d56f/index.htm

But, when I look at the gdb.log file, I see the following bytes 0xc3
0xb0 0xc2 0x9d 0xc2 0x95 0xc2 0xaf.

Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
all the subsequent bytes get a 0xc2 byte before them.
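
As a quick experiment (a guess at the mechanism, not a confirmed
diagnosis), that byte pattern is what you get if each byte of the UTF-8
sequence for U+1D56F is treated as a separate Latin-1 character and then
encoded as UTF-8 again.  The variable names here are just for illustration:

```python
# UTF-8 bytes of U+1D56F, the character linked above.
original = "\U0001D56F".encode("utf-8")

# Misread every byte as one Latin-1 character, then re-encode as UTF-8.
mangled = original.decode("latin-1").encode("utf-8")

# Undoing the two steps in reverse order recovers the original bytes.
recovered = mangled.decode("utf-8").encode("latin-1")

print(original.hex(" "))
print(mangled.hex(" "))
print(recovered == original)
```

The mangled bytes match the sequence seen in gdb.log, which would be
consistent with something in the pipeline treating the input as a
single-byte encoding rather than UTF-8.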

Does any of this give any clues to what might be happening?

Thanks,
Andrew

Thread overview: 7+ messages
2022-01-26 23:15 Tom Tromey
2022-02-06 20:23 ` Tom Tromey
2022-04-03 16:17   ` Andrew Burgess via Gdb-patches
2022-04-03 16:51     ` Tom Tromey
2022-04-03 17:34       ` Andrew Burgess via Gdb-patches [this message]
2022-04-04  9:10         ` Andrew Burgess via Gdb-patches
2022-04-04  9:48           ` Tom Tromey
