From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-bounces+public-inbox=simark.ca@sourceware.org>
Received: from simark.ca
	by simark.ca with LMTP
	id wDUkCa0cXWUxCwwAWB0awg
	(envelope-from <gdb-patches-bounces+public-inbox=simark.ca@sourceware.org>)
	for <public-inbox@simark.ca>; Tue, 21 Nov 2023 16:10:05 -0500
Authentication-Results: simark.ca;
	dkim=pass (2048-bit key; secure) header.d=adacore.com header.i=@adacore.com header.a=rsa-sha256 header.s=google header.b=dYh+AUxa;
	dkim-atps=neutral
Received: by simark.ca (Postfix, from userid 112)
	id 1F96D1E11B; Tue, 21 Nov 2023 16:10:05 -0500 (EST)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256)
	(No client certificate requested)
	by simark.ca (Postfix) with ESMTPS id E9BD21E0D2
	for <public-inbox@simark.ca>; Tue, 21 Nov 2023 16:10:02 -0500 (EST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 2F96D3858C2D
	for <public-inbox@simark.ca>; Tue, 21 Nov 2023 21:10:02 +0000 (GMT)
Received: from mail-io1-xd2f.google.com (mail-io1-xd2f.google.com
 [IPv6:2607:f8b0:4864:20::d2f])
 by sourceware.org (Postfix) with ESMTPS id 033023858D39
 for <gdb-patches@sourceware.org>; Tue, 21 Nov 2023 21:09:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 033023858D39
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=adacore.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 033023858D39
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=2607:f8b0:4864:20::d2f
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700600971; cv=none;
 b=TvhdcK71K6CSz7WmgtHD+vpKonCE7msmugHMKcXVXCi2q3Gp4/rFXpTaAehZdDHlhjamS6ti7ZvgO8IC6yA6UxoNSw2u1xqZwNseJRi1y7LcTAZHNS/14LA3rchK6vtQyPdyHGUqnedq2lXh8kNiPrTirWSgRK7ZEzznzdFSRtM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1700600971; c=relaxed/simple;
 bh=G+BJXHgeyG7hLb/eSC10TeaqeK/m+B6iY/uawWRzUL8=;
 h=DKIM-Signature:From:Date:Subject:MIME-Version:Message-Id:To;
 b=c2IW2I8aZoot53XFIMQeE2xYca2UV2ogcYZqEUGsTrfqZRM32qMh8Y4sWe/DsrgCNjCIWphwrXJ6D9Gg9PE5Qabqvrfy2jCoeao+ddD5+Lz3d4r06livPgOaffLCTtTIVg8Z/pPEoi78vp88if9gmpJ8zL3Bfdvvg8xdwuJ/Svk=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-io1-xd2f.google.com with SMTP id
 ca18e2360f4ac-7a6774da682so266675039f.3
 for <gdb-patches@sourceware.org>; Tue, 21 Nov 2023 13:09:28 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=adacore.com; s=google; t=1700600967; x=1701205767; darn=sourceware.org;
 h=to:in-reply-to:references:message-id:content-transfer-encoding
 :mime-version:subject:date:from:from:to:cc:subject:date:message-id
 :reply-to; bh=YIIYfitP+FKd4RJmFqzPOps0OOST2ciPgr74lNw2xgc=;
 b=dYh+AUxakQS1eKxRazP1XqU1ID9vk1/Ro2Jsyj3cA1fToeYlb0yPzLZBINlJ6oY9cd
 tAdREFIhzlxQkFHZyT1+eGcJRqS6oiIlqVfqbbXNZweQa5m0b9ojW6/CzKw0RwMvuUGb
 OQADyest3Uf+lxoeBihNLO50iJvd+j3uxtLugQXNIVMovosmrDG+1J/iDwcADQlmYNTV
 TSz+t8a2sVya8pXPXRWWX1yaFaJ3AcfUDswn6KgUYEebllGNJIXpd3a/l0y+ebIAOgUF
 I9WLUffLVbg9sbaF6CMjzutHs+PGtDL+I9vXlnr1KHCbviBKhTruby52w7qxsxDSctqC
 fppw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1700600967; x=1701205767;
 h=to:in-reply-to:references:message-id:content-transfer-encoding
 :mime-version:subject:date:from:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=YIIYfitP+FKd4RJmFqzPOps0OOST2ciPgr74lNw2xgc=;
 b=eUxm54xL7+S8ssCzie2/7/7VAcKKQ29rX9YlkhbBUAQx9UCb9NOIOgRLsK5I2nsMcH
 qCUj5Ft2OKPvv+r2IfbATwYFNVFZYW4iuLSjLFCmzIiQZ20OhVhUMGwoNOiICHeULbx7
 0n3fcHFPTebPLnwk+ZD+QUmcnDmkjQE93InO9X2C5MhvBoobZribVlIjLCTHKrLVz1l8
 6oi1rdNSCeOyy2xGvsy0hgEGdKLh4LVPYFCYhSXwBmTxP3A1V0pELUzLMIWfY5f5/j6N
 OxsysuabJfpqwlKV3NmJs0YP9F2yJhZwmZbnEzHDQDuzDLBOAh597kd8mLrPlIHTpwAf
 GxTw==
X-Gm-Message-State: AOJu0YwfP4oiFUg//k8KPwy1Mj4Aasx+vsVApizO9UX9fz3UAfnR434P
 X6MsMsTsgAJY1AV4wMwjDokNwQJpoDF0S+moGZcrKg==
X-Google-Smtp-Source: AGHT+IHscYKJnV77+y/G71FRD4Kpr6uGhJI2M06rWlbycZrxoz5Vd2o8N7ZgGykdAzN7ALE2+r0wqw==
X-Received: by 2002:a05:6602:35e:b0:7a9:b1ad:dddd with SMTP id
 w30-20020a056602035e00b007a9b1adddddmr119535iou.12.1700600967366; 
 Tue, 21 Nov 2023 13:09:27 -0800 (PST)
Received: from localhost.localdomain (97-122-77-73.hlrn.qwest.net.
 [97.122.77.73]) by smtp.gmail.com with ESMTPSA id
 l14-20020a6b750e000000b0079f7734a77esm3050242ioh.35.2023.11.21.13.09.26
 for <gdb-patches@sourceware.org>
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 21 Nov 2023 13:09:27 -0800 (PST)
From: Tom Tromey <tromey@adacore.com>
Date: Tue, 21 Nov 2023 14:09:26 -0700
Subject: [PATCH 2/4] Always use expand_symtabs_matching in ada-lang.c
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20231121-ada-lookup-perf-v1-2-1efd2d1dbf65@adacore.com>
References: <20231121-ada-lookup-perf-v1-0-1efd2d1dbf65@adacore.com>
In-Reply-To: <20231121-ada-lookup-perf-v1-0-1efd2d1dbf65@adacore.com>
To: gdb-patches@sourceware.org
X-Mailer: b4 0.12.4
X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
Errors-To: gdb-patches-bounces+public-inbox=simark.ca@sourceware.org

The previous patch fixed the immediate performance problem with Ada
name matching, by having a subset of matches call
expand_symtabs_matching rather than expand_matching_symbols.  However,
it seemed to me that expand_matching_symbols should not be needed at
all.

To achieve this, this patch changes ada_lookup_name_info::split_name
to use the decoded name, rather than the encoded name.  In order to
make this work correctly, a new decoded form is used: one that does
not decode operators (the cooked index already requests this) and
also does not decode wide characters.  The latter change is done so
that changes to the Ada source charset don't affect the DWARF index.

With this in place, we can change ada-lang.c to always use
expand_symtabs_matching rather than expand_matching_symbols.
---
 gdb/ada-lang.c            | 119 ++++++----------------------------------------
 gdb/ada-lang.h            |  14 ++++--
 gdb/dwarf2/cooked-index.c |   6 ++-
 gdb/symtab.h              |  13 +++--
 4 files changed, 40 insertions(+), 112 deletions(-)
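
As a rough illustration of the decoded form described in the commit
message, here is a minimal, self-contained sketch.  It is not gdb
code: toy_decode and toy_split_dot are hypothetical stand-ins for
ada_decode (with 'operators' and 'wide' both false) and for
split_name with the dot style.  The sketch assumes that the only
decoding step is turning the "__" separator into "."; the real
ada_decode handles many more cases.  The sample names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

/* Hypothetical stand-in for ada_decode (NAME, ..., false, false):
   only the "__" package separator is turned into ".".  Operator
   names (e.g. "Oadd") and wide-character escapes (e.g. "W00e9")
   are copied through unchanged.  */
std::string
toy_decode (std::string_view encoded)
{
  std::string out;
  for (std::size_t i = 0; i < encoded.size (); ++i)
    {
      if (encoded[i] == '_' && i + 1 < encoded.size ()
	  && encoded[i + 1] == '_')
	{
	  out += '.';
	  ++i;			/* Skip the second underscore.  */
	}
      else
	out += encoded[i];
    }
  return out;
}

/* Hypothetical stand-in for a dot-style split: break DECODED into
   its components at each '.'.  */
std::vector<std::string>
toy_split_dot (const std::string &decoded)
{
  std::vector<std::string> parts;
  std::size_t start = 0;
  for (;;)
    {
      std::size_t dot = decoded.find ('.', start);
      if (dot == std::string::npos)
	{
	  parts.push_back (decoded.substr (start));
	  return parts;
	}
      parts.push_back (decoded.substr (start, dot - start));
      start = dot + 1;
    }
}

/* Small usage example: the operator name stays encoded, so the
   index side and the lookup side see the same components.  */
void
toy_self_check ()
{
  std::vector<std::string> parts = toy_split_dot (toy_decode ("pck__Oadd"));
  assert (parts.size () == 2);
  assert (parts[0] == "pck");
  assert (parts[1] == "Oadd");
}
```

The point of the sketch is only that both sides split the same
partially decoded string, so neither operator spelling nor the
source charset enters the comparison.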

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 8c5ab93f3ca..edd68cd2c32 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -1308,7 +1308,7 @@ convert_from_hex_encoded (std::string &out, const char *str, int n)
 /* See ada-lang.h.  */
 
 std::string
-ada_decode (const char *encoded, bool wrap, bool operators)
+ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
 {
   int i;
   int len0;
@@ -1502,7 +1502,7 @@ ada_decode (const char *encoded, bool wrap, bool operators)
 	    i++;
 	}
 
-      if (i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
+      if (wide && i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
 	{
 	  if (convert_from_hex_encoded (decoded, &encoded[i + 1], 2))
 	    {
@@ -1510,7 +1510,7 @@ ada_decode (const char *encoded, bool wrap, bool operators)
 	      continue;
 	    }
 	}
-      else if (i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
+      else if (wide && i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
 	{
 	  if (convert_from_hex_encoded (decoded, &encoded[i + 1], 4))
 	    {
@@ -1518,7 +1518,7 @@ ada_decode (const char *encoded, bool wrap, bool operators)
 	      continue;
 	    }
 	}
-      else if (i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
+      else if (wide && i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
 	       && isxdigit (encoded[i + 2]))
 	{
 	  if (convert_from_hex_encoded (decoded, &encoded[i + 2], 8))
@@ -5465,91 +5465,6 @@ ada_add_block_renamings (std::vector<struct block_symbol> &result,
   return result.size () != defns_mark;
 }
 
-/* Implements compare_names, but only applying the comparision using
-   the given CASING.  */
-
-static int
-compare_names_with_case (const char *string1, const char *string2,
-			 enum case_sensitivity casing)
-{
-  while (*string1 != '\0' && *string2 != '\0')
-    {
-      char c1, c2;
-
-      if (isspace (*string1) || isspace (*string2))
-	return strcmp_iw_ordered (string1, string2);
-
-      if (casing == case_sensitive_off)
-	{
-	  c1 = tolower (*string1);
-	  c2 = tolower (*string2);
-	}
-      else
-	{
-	  c1 = *string1;
-	  c2 = *string2;
-	}
-      if (c1 != c2)
-	break;
-
-      string1 += 1;
-      string2 += 1;
-    }
-
-  switch (*string1)
-    {
-    case '(':
-      return strcmp_iw_ordered (string1, string2);
-    case '_':
-      if (*string2 == '\0')
-	{
-	  if (is_name_suffix (string1))
-	    return 0;
-	  else
-	    return 1;
-	}
-      /* FALLTHROUGH */
-    default:
-      if (*string2 == '(')
-	return strcmp_iw_ordered (string1, string2);
-      else
-	{
-	  if (casing == case_sensitive_off)
-	    return tolower (*string1) - tolower (*string2);
-	  else
-	    return *string1 - *string2;
-	}
-    }
-}
-
-/* Compare STRING1 to STRING2, with results as for strcmp.
-   Compatible with strcmp_iw_ordered in that...
-
-       strcmp_iw_ordered (STRING1, STRING2) <= 0
-
-   ... implies...
-
-       compare_names (STRING1, STRING2) <= 0
-
-   (they may differ as to what symbols compare equal).  */
-
-static int
-compare_names (const char *string1, const char *string2)
-{
-  int result;
-
-  /* Similar to what strcmp_iw_ordered does, we need to perform
-     a case-insensitive comparison first, and only resort to
-     a second, case-sensitive, comparison if the first one was
-     not sufficient to differentiate the two strings.  */
-
-  result = compare_names_with_case (string1, string2, case_sensitive_off);
-  if (result == 0)
-    result = compare_names_with_case (string1, string2, case_sensitive_on);
-
-  return result;
-}
-
 /* Convenience function to get at the Ada encoded lookup name for
    LOOKUP_NAME, as a C string.  */
 
@@ -5559,29 +5474,24 @@ ada_lookup_name (const lookup_name_info &lookup_name)
   return lookup_name.ada ().lookup_name ().c_str ();
 }
 
-/* A helper for add_nonlocal_symbols.  Call expand_matching_symbols
+/* A helper for add_nonlocal_symbols.  Expand all necessary symtabs
    for OBJFILE, then walk the objfile's symtabs and update the
    results.  */
 
 static void
 map_matching_symbols (struct objfile *objfile,
 		      const lookup_name_info &lookup_name,
-		      bool is_wild_match,
 		      domain_enum domain,
 		      int global,
 		      match_data &data)
 {
   data.objfile = objfile;
-  if (is_wild_match || lookup_name.ada ().standard_p ())
-    objfile->expand_matching_symbols (lookup_name, domain, global,
-				      is_wild_match ? nullptr : compare_names);
-  else
-    objfile->expand_symtabs_matching (nullptr, &lookup_name,
-				      nullptr, nullptr,
-				      global
-				      ? SEARCH_GLOBAL_BLOCK
-				      : SEARCH_STATIC_BLOCK,
-				      domain, ALL_DOMAIN);
+  objfile->expand_symtabs_matching (nullptr, &lookup_name,
+				    nullptr, nullptr,
+				    global
+				    ? SEARCH_GLOBAL_BLOCK
+				    : SEARCH_STATIC_BLOCK,
+				    domain, ALL_DOMAIN);
 
   const int block_kind = global ? GLOBAL_BLOCK : STATIC_BLOCK;
   for (compunit_symtab *symtab : objfile->compunits ())
@@ -5610,8 +5520,7 @@ add_nonlocal_symbols (std::vector<struct block_symbol> &result,
 
   for (objfile *objfile : current_program_space->objfiles ())
     {
-      map_matching_symbols (objfile, lookup_name, is_wild_match, domain,
-			    global, data);
+      map_matching_symbols (objfile, lookup_name, domain, global, data);
 
       for (compunit_symtab *cu : objfile->compunits ())
 	{
@@ -5631,7 +5540,7 @@ add_nonlocal_symbols (std::vector<struct block_symbol> &result,
       lookup_name_info name1 (bracket_name, symbol_name_match_type::FULL);
 
       for (objfile *objfile : current_program_space->objfiles ())
-	map_matching_symbols (objfile, name1, false, domain, global, data);
+	map_matching_symbols (objfile, name1, domain, global, data);
     }
 }
 
@@ -13297,6 +13206,8 @@ ada_lookup_name_info::ada_lookup_name_info (const lookup_name_info &lookup_name)
       else
 	m_standard_p = false;
 
+      m_decoded_name = ada_decode (m_encoded_name.c_str (), true, false, false);
+
       /* If the name contains a ".", then the user is entering a fully
 	 qualified entity name, and the match must not be done in wild
 	 mode.  Similarly, if the user wants to complete what looks
diff --git a/gdb/ada-lang.h b/gdb/ada-lang.h
index 9eb9326a86c..14a0be4c037 100644
--- a/gdb/ada-lang.h
+++ b/gdb/ada-lang.h
@@ -216,10 +216,18 @@ extern const char *ada_decode_symbol (const struct general_symbol_info *);
    the name does not appear to be GNAT-encoded, then the result
    depends on WRAP.  If WRAP is true (the default), then the result is
    simply wrapped in <...>.  If WRAP is false, then the empty string
-   will be returned.  Also, when OPERATORS is false, operator names
-   will not be decoded.  */
+   will be returned.
+
+   When OPERATORS is false, operator names will not be decoded.  By
+   default, they are decoded, e.g., 'Oadd' will be transformed to
+   '"+"'.
+
+   When WIDE is false, wide characters will be left as-is.  By
+   default, they are converted from their hex encoding to the host
+   charset.  */
 extern std::string ada_decode (const char *name, bool wrap = true,
-			       bool operators = true);
+			       bool operators = true,
+			       bool wide = true);
 
 extern std::vector<struct block_symbol> ada_lookup_symbol_list
      (const char *, const struct block *, domain_enum);
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index 10631dccecf..ba77f9cb373 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -263,7 +263,11 @@ gdb::unique_xmalloc_ptr<char>
 cooked_index_shard::handle_gnat_encoded_entry (cooked_index_entry *entry,
 					       htab_t gnat_entries)
 {
-  std::string canonical = ada_decode (entry->name, false, false);
+  /* We decode Ada names in a particular way: operators and wide
+     characters are left as-is.  This is done to make name matching a
+     bit simpler; and for wide characters, it means the choice of Ada
+     source charset does not affect the indexer directly.  */
+  std::string canonical = ada_decode (entry->name, false, false, false);
   if (canonical.empty ())
     return {};
   std::vector<std::string_view> names = split_name (canonical.c_str (),
diff --git a/gdb/symtab.h b/gdb/symtab.h
index ec2ac4942d3..ff201e3ff26 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -128,21 +128,26 @@ class ada_lookup_name_info final
      peculiarities.  */
   std::vector<std::string_view> split_name () const
   {
-    if (m_verbatim_p || m_standard_p)
+    if (m_verbatim_p)
       {
+	/* For verbatim matches, just return the encoded name
+	   as-is.  */
 	std::vector<std::string_view> result;
-	if (m_standard_p)
-	  result.emplace_back ("standard");
 	result.emplace_back (m_encoded_name);
 	return result;
       }
-    return ::split_name (m_encoded_name.c_str (), split_style::UNDERSCORE);
+    /* Otherwise, split the decoded name for matching.  */
+    return ::split_name (m_decoded_name.c_str (), split_style::DOT_STYLE);
   }
 
 private:
   /* The Ada-encoded lookup name.  */
   std::string m_encoded_name;
 
+  /* The decoded lookup name.  This is formed by calling ada_decode
+     with both 'operators' and 'wide' set to false.  */
+  std::string m_decoded_name;
+
   /* Whether the user-provided lookup name was Ada encoded.  If so,
      then return encoded names in the 'matches' method's 'completion
      match result' output.  */

-- 
2.41.0