From: Andrew Burgess via Gdb-patches
Reply-To: Andrew Burgess
To: Tom Tromey
Cc: Tom Tromey, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
Date: Mon, 04 Apr 2022 10:10:18 +0100
Message-ID: <87zgl16wp1.fsf@redhat.com>
In-Reply-To: <875ynq8418.fsf@redhat.com>
References: <20220126231501.1031201-1-tom@tromey.com>
 <87y22nwxqb.fsf@tromey.com> <87ee2e87l6.fsf@redhat.com>
 <87mth26rgo.fsf@tromey.com> <875ynq8418.fsf@redhat.com>

Andrew Burgess writes:

> Tom Tromey writes:
>
>> Andrew> I'm seeing this test fail.
>>
>> Andrew> $ rustc --version
>> Andrew> rustc 1.59.0 (9d1b2106e 2022-02-23)
>>
>> I installed this version with "rustup toolchain install 1.59.0" and set
>> it to be my default.
>>
>> Andrew> I've tested with gdb commit a723766c0e2 and 5187219460c.
>>
>> I tried 552f1157c6262, a recent-ish git master.
>> It works fine for me.
>>
>> Andrew> Do these pass for you?  Any suggestions for where to start looking?
>>
>> I wonder if this line in the .exp isn't having the desired effect:
>>
>>     setenv LC_ALL C.UTF-8
>>
>> Is this happening interactively or in some kind of automation
>> environment?  Are the correct locales installed?  Do other
>> LC_ALL-setting tests fail?
>
> This is when I run under DejaGnu.
> If I run the test manually, copying the commands from the .exp file by
> hand and pasting them into my GDB session, it all appears to work fine.
>
> I'm not sure how I'd check if the correct locales are installed (I mean,
> I'm not sure what I'd be looking for), but I guess that since it passes
> when run manually, I'm probably OK.
>
> Looking for scripts that set or mention LC_ALL, I found these:
>
>   gdb.base/utf8-identifiers.exp
>   gdb.python/py-source-styling.exp
>   gdb.ada/non-ascii-utf-8.exp
>   gdb.ada/non-ascii-latin-3.exp
>   gdb.ada/non-ascii-latin-1.exp
>
> These all run fine, except for three failures in
> gdb.ada/non-ascii-utf-8.exp, which look suspiciously similar:
>
>   print VAR_ð
>   No definition of "var_ð" in current context.
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print VAR_ð
>   print var_ð©
>   No definition of "var_ð©" in current context.
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print var_ð©
>   ... snip ...
>   break FUNC_ð
>   Function "FUNC_ð" not defined.
>   Make breakpoint pending on future shared library load? (y or [n]) n
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: setting breakpoint at FUNC_ð
>
>> Andrew> print "ð¯"
>> Andrew> $1 = "ð\302\235\302\225¯"
>>
>> One thing I'd suggest is checking by hand if either the 'print' line or
>> the '$1 = ' line has the correct byte values for the UTF-8 encoded form
>> of the character in question.
>
> So, this is weird.  When I look at the .exp file, I see the bytes of the
> Unicode character as 0xf0 0x9d 0x95 0xaf, which looks correct:
>
>   https://www.fileformat.info/info/unicode/char/1d56f/index.htm
>
> But when I look at the gdb.log file, I see the following bytes: 0xc3
> 0xb0 0xc2 0x9d 0xc2 0x95 0xc2 0xaf.
>
> Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
> all the subsequent bytes get a 0xc2 byte before them.
>
> Does any of this give any clues to what might be happening?
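The byte pattern described above (0xf0 becoming 0xc3 0xb0, and 0xc2 inserted before every later byte) is exactly what results when UTF-8 bytes are misread as ISO-8859-1 (Latin-1), one character per byte, and the result is then encoded as UTF-8 again. The Python sketch below reproduces it. It assumes, without confirmation from this thread, that this is what happens inside tclsh; `double_encode` and `octal_escapes` are hypothetical helper names.

```python
# Sketch only: reproduces the corruption seen in gdb.log, assuming the
# script's UTF-8 bytes are misread as Latin-1 and then re-encoded as UTF-8.

CHAR = "\U0001D56F"  # U+1D56F, the character from gdb.rust/unicode.exp


def double_encode(text: str) -> bytes:
    """Hypothetical helper: UTF-8 bytes misread as Latin-1, re-encoded."""
    raw = text.encode("utf-8")       # f0 9d 95 af
    misread = raw.decode("latin-1")  # U+00F0 U+009D U+0095 U+00AF
    return misread.encode("utf-8")   # each byte >= 0x80 now takes two bytes


def octal_escapes(data: bytes) -> str:
    """Render bytes the way GDB showed the non-printable ones: \\ooo."""
    return "".join(f"\\{b:03o}" for b in data)


if __name__ == "__main__":
    print(CHAR.encode("utf-8").hex(" "))            # f0 9d 95 af
    print(double_encode(CHAR).hex(" "))             # c3 b0 c2 9d c2 95 c2 af
    print(octal_escapes(double_encode(CHAR)[2:6]))  # \302\235\302\225
```

If that assumption holds, U+009D and U+0095 are C1 control characters, which would explain why they vanish from the pasted 'print' line and show up as the octal escapes \302\235\302\225 in GDB's '$1 = ' output.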
So I put this into a text file 'unicode.tcl':

  puts "print 𝕯"

(just in case that gets mangled in transit, it's the same Unicode
character, U+1D56F, that is used in the gdb.rust/unicode.exp test), and
then I did:

  $ tclsh unicode.tcl

I get the same corrupted bytes as I see from the test script (c3 b0 c2
9d c2 95 c2 af), so the problem appears to be with my build of Tcl.

I'm currently running Tcl 8.6.  I wonder if you could compare this with
the behaviour of your tclsh.

Thanks,
Andrew
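Comparing two tclsh builds comes down to looking at the raw bytes each one writes. The Python sketch below runs the unicode.tcl script from this message and labels the output as correct UTF-8, the double-encoded pattern reported above, or something else. It assumes `tclsh` is on PATH and `unicode.tcl` is in the current directory; `classify` and `run_tclsh` are hypothetical names.

```python
# Sketch only: run tclsh on unicode.tcl and label what it printed.
# Assumes tclsh is on PATH and unicode.tcl is in the current directory.
import os
import shutil
import subprocess

# What `puts "print <U+1D56F>"` should emit: correct UTF-8 plus newline.
GOOD = "print \U0001D56F\n".encode("utf-8")
# The same line after a Latin-1 misread and UTF-8 re-encoding.
DOUBLE = GOOD.decode("latin-1").encode("utf-8")


def classify(output: bytes) -> str:
    """Hypothetical helper: label the raw bytes captured from tclsh."""
    if output == GOOD:
        return "correct UTF-8"
    if output == DOUBLE:
        return "double-encoded (Latin-1 misread)"
    return "unexpected: " + output.hex(" ")


def run_tclsh(script: str = "unicode.tcl") -> bytes:
    """Run tclsh with the caller's environment and return raw stdout."""
    result = subprocess.run(["tclsh", script], capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    if shutil.which("tclsh") and os.path.exists("unicode.tcl"):
        print(classify(run_tclsh()))
    else:
        print("tclsh or unicode.tcl not found; nothing to compare")
```

Because the result may depend on locale variables such as LANG and LC_ALL, it could be worth running it once as-is and once under LC_ALL=C.UTF-8, which is what the .exp file sets.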