From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tom Tromey
Date: Wed, 02 Apr 2025 17:45:01 -0600
Subject: [PATCH v2 02/28] Change ada_decode to preserve upper-case in some situations
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20250402-search-in-psyms-v2-2-ea91704487cb@tromey.com>
References: <20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com>
In-Reply-To: <20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com>
To: gdb-patches@sourceware.org
Cc: Tom Tromey
X-Mailer: b4 0.14.2
List-Id: Gdb-patches mailing list
Errors-To:
 gdb-patches-bounces~public-inbox=simark.ca@sourceware.org

This patch is needed to avoid regressions later in the series.

The issue here is that ada_decode, when called with wide=false, would
act as though the input needed verbatim quoting.  That would happen
because the 'W' character would be passed through; and then a later
loop would reject the result due to that character.

Similarly, with operators=false the upper-case-checking loop would be
skipped, but then some names that did need verbatim quoting would pass
through.

Furthermore, I noticed that there isn't a need to distinguish between
the "wide" and "operators" cases -- all callers pass identical values
to both.

This patch cleans up the above, consolidating the parameters and
changing how upper-case detection is handled, so that both the
operator and wide cases pass through without issue.

I've added new unit tests for this.
---
 gdb/ada-lang.c                  | 83 +++++++++++++++++++++++++++++------------
 gdb/ada-lang.h                  | 15 +++-----
 gdb/dwarf2/cooked-index-shard.c |  2 +-
 gdb/symtab.h                    |  2 +-
 4 files changed, 68 insertions(+), 34 deletions(-)

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index a55ee12ce70d02082e64d85634b87dd27f5a0670..4bb6a808fd8c1a7f8e4b2344fdf935f94c602ed1 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -1308,7 +1308,7 @@ convert_from_hex_encoded (std::string &out, const char *str, int n)
 /* See ada-lang.h. */
 
 std::string
-ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
+ada_decode (const char *encoded, bool wrap, bool translate)
 {
   int i;
   int len0;
@@ -1403,7 +1403,7 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
   while (i < len0)
     {
       /* Is this a symbol function? */
-      if (operators && at_start_name && encoded[i] == 'O')
+      if (at_start_name && encoded[i] == 'O')
         {
           int k;
 
@@ -1414,7 +1414,10 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
                             op_len - 1) == 0)
                   && !isalnum (encoded[i + op_len]))
                 {
-                  decoded.append (ada_opname_table[k].decoded);
+                  if (translate)
+                    decoded.append (ada_opname_table[k].decoded);
+                  else
+                    decoded.append (ada_opname_table[k].encoded);
                   at_start_name = 0;
                   i += op_len;
                   break;
@@ -1502,28 +1505,59 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
           i++;
         }
 
-      if (wide && i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
+      /* Handle wide characters while respecting the arguments to the
+         function: we may want to copy them verbatim, but in this case
+         we do not want to register that we've copied an upper-case
+         character. */
+      if (i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 1], 2))
+          if (translate)
             {
-              i += 3;
+              if (convert_from_hex_encoded (decoded, &encoded[i + 1], 2))
+                {
+                  i += 3;
+                  continue;
+                }
+            }
+          else
+            {
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
-      else if (wide && i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
+      else if (i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 1], 4))
+          if (translate)
+            {
+              if (convert_from_hex_encoded (decoded, &encoded[i + 1], 4))
+                {
+                  i += 5;
+                  continue;
+                }
+            }
+          else
             {
-              i += 5;
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
-      else if (wide && i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
+      else if (i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
                && isxdigit (encoded[i + 2]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 2], 8))
+          if (translate)
             {
-              i += 10;
+              if (convert_from_hex_encoded (decoded, &encoded[i + 2], 8))
+                {
+                  i += 10;
+                  continue;
+                }
+            }
+          else
+            {
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
@@ -1550,6 +1584,12 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
           at_start_name = 1;
           i += 2;
         }
+      else if (isupper (encoded[i]) || encoded[i] == ' ')
+        {
+          /* Decoded names should never contain any uppercase
+             character. */
+          goto Suppress;
+        }
       else
         {
           /* It's a character part of the decoded name, so just copy it
@@ -1559,16 +1599,6 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
         }
     }
 
-  /* Decoded names should never contain any uppercase character.
-     Double-check this, and abort the decoding if we find one. */
-
-  if (operators)
-    {
-      for (i = 0; i < decoded.length(); ++i)
-        if (isupper (decoded[i]) || decoded[i] == ' ')
-          goto Suppress;
-    }
-
   /* If the compiler added a suffix, append it now. */
   if (suffix >= 0)
     decoded = decoded + "[" + &encoded[suffix] + "]";
@@ -1594,6 +1624,13 @@ ada_decode_tests ()
   /* This isn't valid, but used to cause a crash. PR gdb/30639. The
      result does not really matter very much. */
   SELF_CHECK (ada_decode ("44") == "44");
+
+  /* Check that the settings used by the DWARF reader have the desired
+     effect. */
+  SELF_CHECK (ada_decode ("symada__cS", false, false) == "");
+  SELF_CHECK (ada_decode ("pkg__Oxor", false, false) == "pkg.Oxor");
+  SELF_CHECK (ada_decode ("pack__func_W017b", false, false)
+              == "pack.func_W017b");
 }
 #endif
 
@@ -13311,7 +13348,7 @@ ada_lookup_name_info::ada_lookup_name_info (const lookup_name_info &lookup_name)
       else
         m_standard_p = false;
 
-      m_decoded_name = ada_decode (m_encoded_name.c_str (), true, false, false);
+      m_decoded_name = ada_decode (m_encoded_name.c_str (), true, false);
 
       /* If the name contains a ".", then the user is entering a fully
          qualified entity name, and the match must not be done in wild
diff --git a/gdb/ada-lang.h b/gdb/ada-lang.h
index 3582082a1a1b702595b803072ff9c345b7f3e0f7..a96a1f6e01737b03c6e6dea5024fbdd253647201 100644
--- a/gdb/ada-lang.h
+++ b/gdb/ada-lang.h
@@ -218,16 +218,13 @@ extern const char *ada_decode_symbol (const struct general_symbol_info *);
    simply wrapped in <...>. If WRAP is false, then the empty string
    will be returned.
 
-   When OPERATORS is false, operator names will not be decoded. By
-   default, they are decoded, e.g., 'Oadd' will be transformed to
-   '"+"'.
-
-   When WIDE is false, wide characters will be left as-is. By
-   default, they converted from their hex encoding to the host
-   charset. */
+   TRANSLATE has two effects. When true (the default), operator names
+   and wide characters will be decoded. E.g., 'Oadd' will be
+   transformed to '"+"', and wide characters converted from their hex
+   encoding to the host charset. When false, these will be left
+   alone. */
 extern std::string ada_decode (const char *name, bool wrap = true,
-                               bool operators = true,
-                               bool wide = true);
+                               bool translate = true);
 
 extern std::vector ada_lookup_symbol_list
   (const char *, const struct block *, domain_search_flags);
diff --git a/gdb/dwarf2/cooked-index-shard.c b/gdb/dwarf2/cooked-index-shard.c
index 683feb2ce9615be23a39f3934e922b53574fa5ab..29a8aea513786e4c1c1ed77dee8610fc329d1c8a 100644
--- a/gdb/dwarf2/cooked-index-shard.c
+++ b/gdb/dwarf2/cooked-index-shard.c
@@ -108,7 +108,7 @@ cooked_index_shard::handle_gnat_encoded_entry
      characters are left as-is. This is done to make name matching a
      bit simpler; and for wide characters, it means the choice of Ada
      source charset does not affect the indexer directly. */
-  std::string canonical = ada_decode (entry->name, false, false, false);
+  std::string canonical = ada_decode (entry->name, false, false);
   if (canonical.empty ())
     {
       entry->canonical = entry->name;
diff --git a/gdb/symtab.h b/gdb/symtab.h
index 7927380fca3f115fd43ecdaf683ecc07a0ff22e0..83913b1806f4a5fe39987978bb7059efc606a594 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -145,7 +145,7 @@ class ada_lookup_name_info final
   std::string m_encoded_name;
 
   /* The decoded lookup name. This is formed by calling ada_decode
-     with both 'operators' and 'wide' set to false. */
+     with 'translate' set to false. */
   std::string m_decoded_name;
 
   /* Whether the user-provided lookup name was Ada encoded. If so,

-- 
2.46.1
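
For readers who want to see the effect of the consolidated TRANSLATE
flag in isolation, here is a minimal standalone sketch.  It is not the
GDB implementation: toy_decode, toy_ops and toy_append_utf8 are
hypothetical names, only "__", two operator names and the 'W' + four
hex digit form are handled, a UTF-8 host charset is assumed, and an
empty string is returned on rejection (as ada_decode does when WRAP is
false).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Toy operator table (hypothetical); GDB's ada_opname_table is larger.
struct toy_op { const char *encoded; const char *decoded; };
static const toy_op toy_ops[] = { { "Oadd", "\"+\"" }, { "Oxor", "\"xor\"" } };

// Append a BMP code point as UTF-8.  Assumption: the host charset is UTF-8.
static void
toy_append_utf8 (std::string &out, unsigned long cp)
{
  if (cp < 0x80)
    out.push_back (char (cp));
  else if (cp < 0x800)
    {
      out.push_back (char (0xC0 | (cp >> 6)));
      out.push_back (char (0x80 | (cp & 0x3F)));
    }
  else
    {
      out.push_back (char (0xE0 | (cp >> 12)));
      out.push_back (char (0x80 | ((cp >> 6) & 0x3F)));
      out.push_back (char (0x80 | (cp & 0x3F)));
    }
}

// Simplified stand-in for ada_decode (name, /*wrap=*/false, translate).
std::string
toy_decode (const std::string &encoded, bool translate)
{
  std::string decoded;
  bool at_start_name = true;
  const std::size_t len = encoded.size ();
  std::size_t i = 0;

  while (i < len)
    {
      // Operator names are recognized in both modes; TRANSLATE only
      // chooses between the decoded and the encoded spelling.
      if (at_start_name && encoded[i] == 'O')
        {
          bool matched = false;
          for (const toy_op &op : toy_ops)
            {
              std::size_t op_len = std::strlen (op.encoded);
              if (encoded.compare (i, op_len, op.encoded) == 0
                  && (i + op_len == len
                      || !std::isalnum ((unsigned char) encoded[i + op_len])))
                {
                  decoded += translate ? op.decoded : op.encoded;
                  i += op_len;
                  matched = true;
                  break;
                }
            }
          if (matched)
            {
              at_start_name = false;
              continue;
            }
        }
      at_start_name = false;

      // Wide-character marker: copy the 'W' verbatim when not
      // translating, so it never reaches the upper-case check below.
      if (encoded[i] == 'W' && i + 1 < len
          && std::isxdigit ((unsigned char) encoded[i + 1]))
        {
          if (!translate)
            {
              decoded.push_back (encoded[i]);
              ++i;
              continue;
            }
          bool ok = i + 4 < len;
          unsigned long cp = 0;
          for (std::size_t k = 1; ok && k <= 4; ++k)
            {
              unsigned char c = encoded[i + k];
              if (!std::isxdigit (c))
                ok = false;
              else
                cp = cp * 16 + (std::isdigit (c) ? c - '0'
                                : std::tolower (c) - 'a' + 10);
            }
          if (ok)
            {
              toy_append_utf8 (decoded, cp);
              i += 5;
              continue;
            }
        }

      if (encoded.compare (i, 2, "__") == 0)
        {
          decoded.push_back ('.');
          at_start_name = true;
          i += 2;
        }
      else if (std::isupper ((unsigned char) encoded[i]) || encoded[i] == ' ')
        return "";  // Any other upper-case character rejects the name.
      else
        decoded.push_back (encoded[i++]);
    }
  return decoded;
}
```

With this sketch, toy_decode ("pkg__Oxor", false) gives "pkg.Oxor" and
toy_decode ("pack__func_W017b", false) gives "pack.func_W017b", the
same strings the new self-tests in the patch expect, while
toy_decode ("symada__cS", false) is rejected because of the trailing
'S'.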