Date: Thu, 26 Jun 2025 15:32:50 +0300
Message-Id: <8634bmd6od.fsf@gnu.org>
From: Eli Zaretskii
To: Pedro Alves
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
In-Reply-To: <2ea323f9-d606-4bc9-b4db-995f980d85c7@palves.net> (message from Pedro Alves on Thu, 26 Jun 2025 11:35:37 +0100)
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI

> Date: Thu, 26 Jun 2025 11:35:37 +0100
> Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
> From: Pedro Alves
>
> As the blog explains,
>
> "historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue
> to do so today. Even though it produces the wrong values for multi-codepoint graphemes, at least the
> wrong values are often consistently wrong across terminal emulators."
>
> So what can go wrong is grapheme clustering. Both the terminal and the app must agree on
> clustering for that to work properly. We could implement that protocol in GDB today, as there's
> a way to query if the terminal supports it. But I'm not proposing that.
>
> So if we ignore multi-codepoint graphemes, then wcwidth always matches what the
> terminals do.

Using wcwidth should probably work in many cases, but we still assume
that the terminal uses a font whose glyphs' width agrees with wcwidth.
That is not a given.

> > For example, the following string of
> > characters:
> >
> >   U+1F468 U+200D U+1F469 U+200D U+1F466
> >
> > will be displayed by capable terminals as a single glyph (again, try
> > in Emacs to see), but wcswidth will probably return 6 for it.
>
> Note that for the use case we're talking about, the character used as current-line indicator,
> it's always going to be " ", with spaces or tabs or endline surrounding it,
> there is no risk of the character we choose bumping into other characters we don't control
> and forming a cluster sequence. If itself is not a cluster, and is a single
> (maybe wide) character, then wcwidth returns the right thing.

Don't we allow the users to specify their own characters or strings
for this? If not, all we have to do is verify that the characters we
pick for these purposes don't have these problems and always (or at
least in most cases) agree with wcwidth.

> For the wcwidth returns -1 scenario, I see two choices:
>
> - assume width of one cell, and print marker from style anyhow, or,
>
> - fallback to hardcoded "*" character, like today's output.

This is a subset of a more general issue with wcwidth: it cannot be
relied upon to know about arbitrary characters unless the locale's
codeset is UTF-8, because wcwidth usually only knows about characters
supported by the locale's codeset.

The best solution to overcome this is to have our own database of
character-width values. (We actually only need the characters whose
width is different from 1, i.e. zero-width and double-width
characters.) That's what Emacs does. The downside is that we'd need
to maintain this database as characters are added to Unicode (we could
have a script that generates the database automatically from a
relevant Unicode data file).

Another alternative is to allow the definition of the width as part of
defining the character(s) used for each indicator. Then we won't need
wcwidth/wcswidth at all.