Date: Tue, 1 Jul 2025 00:51:16 +0100
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI
To: Eli Zaretskii
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
From: Pedro Alves
References: <20250509-emoji-check-mark-v1-0-63b6c52411f3@adacore.com>
 <20250509-emoji-check-mark-v1-2-63b6c52411f3@adacore.com>
 <87zfffnqrj.fsf@redhat.com> <87wmag63jg.fsf@tromey.com>
 <86ikm0y1hi.fsf@gnu.org> <86o6ubcapd.fsf@gnu.org>
 <2ea323f9-d606-4bc9-b4db-995f980d85c7@palves.net> <8634bmd6od.fsf@gnu.org>
In-Reply-To: <8634bmd6od.fsf@gnu.org>
List-Id: Gdb-patches mailing list

Hi Eli,

On 2025-06-26 13:32, Eli Zaretskii wrote:
>> Date: Thu, 26 Jun 2025 11:35:37 +0100
>> Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
>> From: Pedro Alves
>>
>> As the blog explains,
>>
>>  "historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue
>>  to do so today.
>>  Even though it produces the wrong values for multi-codepoint graphemes, at least the
>>  wrong values are often consistently wrong across terminal emulators."
>>
>> So what can go wrong is grapheme clustering.  Both the terminal and the app must agree on
>> clustering for that to work properly.  We could implement that protocol in GDB today, as there's
>> a way to query if the terminal supports it.  But I'm not proposing that.
>>
>> So if we ignore multi-codepoint graphemes, then wcwidth always matches what the
>> terminals do.
>
> Using wcwidth should probably work in many cases, but we still assume
> that the terminal uses a font whose glyphs' width agrees with wcwidth.
> That is not a given.

What I understood from the blog post, and from reading a number of bug reports and discussions
in several different terminal projects, is that terminals lay out the glyphs in a grid, and each
glyph occupies one or more slots in the grid.  The font itself shouldn't really matter when
computing the number of grid slots; that is an independent decision.

Of course, there may well be broken terminals out there.  But what should matter is whether the
characters/emojis we pick by default work well.  If the user picks a character or emoji in their custom
style that doesn't work in their broken terminal, then, well, they can just not use that particular
character/emoji in their custom style.

>
>>> For example, the following string of
>>> characters:
>>>
>>>   U+1F468 U+200D U+1F469 U+200D U+1F466
>>>
>>> will be displayed by capable terminals as a single glyph (again, try
>>> in Emacs to see), but wcswidth will probably return 6 for it.
>>
>> Note that for the use case we're talking about, the character used as current-line indicator,
>> it's always going to be " ", with spaces or tabs or endline surrounding it,
>> there is no risk of the character we choose bumping into other characters we don't control
>> and forming a cluster sequence.
>> If the character itself is not a cluster, and is a single
>> (maybe wide) character, then wcwidth returns the right thing.
>
> Don't we allow the users to specify their own characters or strings
> for this?  If not, all we have to do is verify that the characters we
> pick for these purposes don't have the problems and always (or at
> least in most cases) agree with wcwidth.

Exactly, that's what I meant.  We do allow users to specify their own characters (that's what the
patch this discussion is under does), but if their terminal is broken somehow, then it's going to be
on the user to not use custom characters that don't work on their terminal.

I think that we should use wcwidth/wcswidth like every other terminal application, as explained on
that blog, and document that, for the characters used in styles that may affect table alignment,
cluster sequences etc. may not work properly.  We don't need to worry about the width of characters
used in places that are not tables, like the error or warning emojis.

>
>> For the wcwidth returns -1 scenario, I see two choices:
>>
>>  - assume width of one cell, and print marker from style anyhow, or,
>>
>>  - fallback to hardcoded "*" character, like today's output.
>
> This is a subset of a more general issue with wcwidth: it cannot be
> relied upon to know about arbitrary characters, unless the locale's
> codeset is UTF-8.  Because usually wcwidth only knows about characters
> supported by the locale's codeset.

I guess Windows is the main concern here, though I was reading this:

  https://github.com/alf-p-steinbach/C---how-to---make-non-English-text-work-in-Windows/blob/main/how-to-use-utf8-in-windows.md

... and it looks like there's a way nowadays to make the application use the UTF-8 codepage, via a
manifest.

>
> The best solution to overcome this is to have our own database of
> character-width values.  (We actually only need the characters whose
> width is different from 1, i.e. zero-width and double-width
> characters.)
> That's what Emacs does.  The downside is that we'd need
> to maintain this database as characters are added to Unicode (we could
> have a script that generates the database automatically from a
> relevant Unicode data file).
>
> Another alternative is to allow the definition of the width as part of
> defining the character(s) used for each indicator.  Then we won't need
> wcwidth/wcswidth at all.
>

Or something in between -- have a wrapper for wcwidth that only hardcodes the width of the few
emojis GDB uses by default?

I haven't looked at the gnulib wrapper that Tromey mentioned, though; maybe it already does
something like what you mention for all characters.

Pedro Alves
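
For illustration only, the "something in between" idea could look roughly like the sketch below.
Everything in it is an assumption made for the example: the names marker_width and pick_marker,
and the contents of the table, are hypothetical and come neither from the patch nor from gnulib.
It hardcodes the width of a couple of single-codepoint emojis, defers to wcwidth for everything
else, and falls back to "*" when the width is unknown (the second of the two choices discussed
above for the "wcwidth returns -1" case).

```cpp
#include <wchar.h>   // wcwidth is POSIX, not ISO C++
#include <string>

// Hypothetical table of characters whose width we hardcode, so the
// answer does not depend on the locale's codeset.  The entries are
// examples only; both are single code points with East Asian Width "W".
struct known_width
{
  wchar_t cp;
  int cells;
};

static const known_width known_widths[] = {
  { 0x2705, 2 },   // U+2705 WHITE HEAVY CHECK MARK
  { 0x274C, 2 },   // U+274C CROSS MARK
};

// Return the number of terminal cells WC occupies, or -1 if unknown.
// Hardcoded entries win; anything else is left to the C library.
int
marker_width (wchar_t wc)
{
  for (const known_width &kw : known_widths)
    if (kw.cp == wc)
      return kw.cells;
  return wcwidth (wc);
}

// Return the marker to print: the styled character if its width is
// known, otherwise the plain "*" fallback.
std::wstring
pick_marker (wchar_t styled)
{
  if (marker_width (styled) < 0)
    return L"*";
  return std::wstring (1, styled);
}
```

A caller laying out a table would pad using marker_width's result.  Multi-codepoint cluster
sequences are deliberately out of scope here, in line with the discussion above.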