From: Simon Marchi via Gdb
Reply-To: Simon Marchi
To: Lluís Batlle i Rossell, Guinevere Larsen
Cc: gdb@sourceware.org
Subject: Re: Loading some symbols, when, and index-cache
Date: Mon, 7 Apr 2025 12:01:38 -0400
Message-ID: <2caec2e4-6baf-45ca-875c-8a8a6b6bfe42@simark.ca>
References: <6214910b-519f-420f-8999-b05e5410ad5a@redhat.com>
List-Id: Gdb mailing list

On 2025-04-07 08:21, Lluís Batlle i Rossell via Gdb wrote:
> On Mon, Apr 07, 2025 at 09:05:10AM -0300, Guinevere Larsen via Gdb wrote:
>> $ time ./gdb gdb --batch -ex "complete list def" #Basically the same as you
>> did
>> ./gdb gdb --batch -ex "complete list def" 42.40s user 1.19s system 120% cpu
>> 36.243 total
>> $ time ./gdb gdb --batch --readnow -ex "complete list def"
>> ./gdb gdb --batch --readnow -ex "complete list def" 58.32s user 1.79s
>> system 100% cpu 1:00.08 total
>>
>> So even with the slower expansion, GDB is still faster than if we had all
>> symbols being read at the start, and this isn't even taking into account the
>> memory usage.
>
> There are two points important:
>
> The cache should allow having a big file on disk that it's just read into
> memory with zero work and then all symbols are ready to search. But
> apparently this happens only with the "minimal" set of symbols, which is
> far from enough for a tab completion.
>
> And 2nd, at gdb ELF file loading, thread workers are launched to read the
> symbols from the CUs. Again, these seem to load only the "minimal" set of
> symbols. In your example of 1 minute that load of CUs runs single thread.
>
> I can't even tell that the code behaves correctly. One could say that the
> initial load of CUs multithread should load ALL symbols, and same about
> caching. Or at least that could be an option. Otherwise, the things they
> are meant to run faster become quite limited, while the really-slow
> usual completions go single-thread uncached, taking for you 1 minute.
>
> Thanks,
> Lluís.

Hi Lluís,

I think you misunderstand the role of what's called the index-cache
(meaning it's perhaps not documented properly).  I will summarize the
process that GDB goes through from startup to being able to use a
symbol.  In this scenario, no index is present in either the binary
file or the index cache.

 1. You do "gdb myprogram".
 2. GDB reads the ELF symbols into what it calls "minimal symbols"
    internally.
 3. In parallel, GDB demangles the Ada/C++/whatever symbol names.
 4. GDB notices there is some DWARF info, so it opens it and lists the
    compilation units by hopping from header to header.
 5. In parallel (the background workers you referred to), GDB scans
    each compilation unit to create an in-memory index.  This index
    is essentially a mapping from each name to the compilation units
    that contain that name.  We only need the names of variables,
    functions and types that the user could possibly refer to in an
    expression.  While traversing the debug info to create that index,
    GDB skips most of the debug info, making it quite fast.
 6. You type "print foo" in the CLI, or another expression containing a
    symbol name.
 7. The expression parser asks the DWARF subsystem: expand all
    compilation units that contain a variable or function named "foo".
    The index is looked up to identify the candidate compilation units.
 8. Serially, the DWARF subsystem fully reads those compilation units
    to create some "symtab"s, which are a detailed internal
    representation of everything that could be found in the debug info.
 9. The core is then able to look up "foo" in the symtabs, find the
    relevant symbol, and continue the expression evaluation.

To avoid doing the work in step #5, it is possible to ask compilers to
generate a name index that resembles the name -> compilation unit index
that GDB would produce.  Alternatively, GDB is able to add that index
to a binary that doesn't have it (see the gdb-add-index command).

The index cache is a third way to access that pre-computed index, which
requires no user intervention (other than toggling the index-cache on).
If the index-cache is enabled, and no pre-computed index is already
present, GDB will save the index it generates in ~/.cache/gdb, allowing
it to read it back later.  It's exactly the same data that
gdb-add-index would add to the binary.
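
As a rough illustration of steps 5 to 9 above, here is a toy model in
Python.  It is only a sketch and not GDB's actual code: the class, the
function names and the sample compilation units are all made up for the
example.  It shows a cheap name -> compilation unit index, lazy
expansion of only the units that the index points at, and why
completing a common prefix ends up expanding many units.

```python
from collections import defaultdict

# Hypothetical sample data: compilation unit -> names it defines.
COMPILATION_UNITS = {
    "main.c": ["main", "default_handler"],
    "util.c": ["define_macro", "foo"],
    "net.c": ["connect_peer", "deflate_buffer"],
    "log.c": ["log_message"],
}


def build_index(units):
    """Step 5 (toy version): map each name to the CUs that contain it."""
    index = defaultdict(set)
    for cu, names in units.items():
        for name in names:
            index[name].add(cu)
    return index


class Debugger:
    """Toy stand-in for the index / symtab split described above."""

    def __init__(self, units):
        self.units = units
        self.index = build_index(units)
        self.symtabs = {}  # CU name -> fully expanded symbols

    def expand(self, cu):
        """Step 8: stand-in for the expensive full read of one CU."""
        if cu not in self.symtabs:
            self.symtabs[cu] = {
                name: f"<symbol {name} in {cu}>" for name in self.units[cu]
            }

    def lookup(self, name):
        """Steps 7 to 9: expand the candidate CUs, then search symtabs."""
        for cu in sorted(self.index.get(name, ())):
            self.expand(cu)
        for symtab in self.symtabs.values():
            if name in symtab:
                return symtab[name]
        return None

    def complete(self, prefix):
        """Completion: every CU with a matching name gets expanded first."""
        for name, cus in self.index.items():
            if name.startswith(prefix):
                for cu in sorted(cus):
                    self.expand(cu)
        return sorted(
            name
            for symtab in self.symtabs.values()
            for name in symtab
            if name.startswith(prefix)
        )


if __name__ == "__main__":
    dbg = Debugger(COMPILATION_UNITS)
    print(dbg.lookup("foo"))     # expands only util.c
    print(dbg.complete("def"))   # expands main.c and net.c as well
    print(sorted(dbg.symtabs))   # log.c was never expanded
```

In this model, looking up a single name touches one compilation unit,
while completing "def" forces three of the four units to be expanded,
which is the cost that shows up with a short, common prefix.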

The index cache was added because historically, using an index was
_much_ faster than having GDB generate its in-memory index (GDB used
what's called internally "partial symbols"; it no longer uses them for
DWARF).  So it was useful, because loading a binary the second time was
much faster than the first time.  Nowadays, with the new-ish
parallelized scanner I described in step #5, the time difference
between using a pre-computed index and generating it on the fly is not
that big.

The -readnow option that Guinevere talked about skips the in-memory
index generation (or skips reading the pre-computed index if there is
one) and goes directly to expanding all compilation units into symtabs
right away.  I would typically not recommend using this day-to-day,
other than maybe if you need to work around an indexer bug.

When you type "list def" and completion is very slow, it's likely that
a lot of compilation units have a symbol that starts with "def", so a
lot of compilation units get expanded into symtabs.  Only after that
happens can the core of GDB generate the completion list by searching
the symtabs.  That's not optimal, and there's certainly room for
improvement here (I too get frustrated by very slow tab-completion, to
the point where I often avoid it).

I must point out that a pending patch series changes a bit how symbols
are looked up and symtabs expanded, which will probably make what I
described somewhat outdated:

[PATCH v2 00/28] Search symbols via quick API
https://inbox.sourceware.org/gdb-patches/20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com/T/#m737dd42bf8767f5719ffac7eb147977c4b0f2829

Simon