From: Pedro Alves <pedro@palves.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI
Date: Thu, 26 Jun 2025 11:35:37 +0100
Message-ID: <2ea323f9-d606-4bc9-b4db-995f980d85c7@palves.net>
In-Reply-To: <86o6ubcapd.fsf@gnu.org>
On 2025-06-26 06:51, Eli Zaretskii wrote:
>> Date: Wed, 25 Jun 2025 20:11:07 +0100
>> Cc: aburgess@redhat.com, gdb-patches@sourceware.org
>> From: Pedro Alves <pedro@palves.net>
>>
>>>> I don't think there's any way to figure out what the display width of a
>>>> string might be. At least, not unless gdb adds a dependency on
>>>> something like libicu.
>>>
>>> I don't think even libicu will be enough, because the terminal
>>> emulators vary in this respect.
>>>
>>> There's some recent protocol to help with this, but I don't think it's
>>> supported widely enough for GDB to rely on it.
>>
>> I'm curious what protocol this is, and what it addresses.
>
> That's "Mode 2027". For the details, see
>
> https://github.com/contour-terminal/terminal-unicode-core
> and
> https://mitchellh.com/writing/grapheme-clusters-in-terminals
>
Thank you, that's quite a good read.
As the blog explains,
"historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue
to do so today. Even though it produces the wrong values for multi-codepoint graphemes, at least the
wrong values are often consistently wrong across terminal emulators."
So what can go wrong is grapheme clustering. Both the terminal and the app must agree on
clustering for that to work properly. We could implement that protocol in GDB today, as there's
a way to query if the terminal supports it. But I'm not proposing that.
So if we ignore multi-codepoint graphemes, then wcwidth always matches what the
terminals do.
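
For reference, here is a minimal sketch of what the query side could look like, assuming the DECRQM/DECRPM exchange described in the terminal-unicode-core link above (request "CSI ? 2027 $ p", reply "CSI ? 2027 ; Ps $ y"). The helper names are made up for illustration, and this is not proposed GDB code:

```cpp
#include <cstdio>

// Hypothetical names throughout; this is an illustration, not GDB code.

// DECRQM request an application would write to the terminal.
const char mode_2027_query[] = "\033[?2027$p";

// Parse a DECRPM reply of the form ESC [ ? 2027 ; Ps $ y.
// Returns Ps (0 = mode not recognized, 1 = set, 2 = reset,
// 3 = permanently set, 4 = permanently reset), or -1 if the
// reply is malformed or is about a different mode.
int parse_mode_2027_reply(const char *reply)
{
  int ps = -1;
  char dollar = 0, final_byte = 0;
  if (std::sscanf(reply, "\033[?2027;%d%c%c", &ps, &dollar, &final_byte) != 3)
    return -1;
  if (dollar != '$' || final_byte != 'y' || ps < 0 || ps > 4)
    return -1;
  return ps;
}

// True if the terminal knows about the mode at all.
bool mode_2027_recognized(int ps)
{
  return ps >= 1 && ps <= 4;
}
```

A real implementation would also need a read timeout, since a terminal that does not implement DECRQM may not reply at all.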
>> If we assume monospace font in the terminal, what goes wrong?
>
> Monospace font is only relevant for single-character strings. Once
> you try to display a string that has more than one Unicode character,
> a terminal may or may not combine those characters into one or more
> font glyphs (it's a many-to-many relation, so you could have M
> characters displayed as N glyphs, where M could be less than, equal,
> or greater than N, depending on the characters in the string and the
> capabilities of the font used by the terminal). If that happens,
> wcswidth will usually produce an incorrect result, because it doesn't
> (and cannot) know about these features of the terminal, and basically
> just sums the results of wcwidth of each character in the string.
>
>> Is there a downside to using the number of columns wcwidth says the character occupies,
>> and default to 1 if wcwidth returns -1?
>
> See above. This discussion started in the context of showing Emoji,
> where the case of combining several characters into one or more glyphs
> happens very frequently. For example, the sequence U+2713 followed by
> U+FE0F VARIATION SELECTOR-16 should display a single glyph on capable
> terminals (try it in a GUI session in Emacs to see that), although
> it's two characters. In some cases, wcswidth will do the job, because
> control characters like U+FE0F have zero width in its database, but
> that is not universally so.
OK, but we can just avoid picking emojis that are encoded with a variation selector
for the defaults where the glyphs would be used in a context where alignment
matters, and document that picking such an emoji for a custom style may break alignment.
> For example, the following string of
> characters:
>
> U+1F468 U+200D U+1F469 U+200D U+1F466
>
> will be displayed by capable terminals as a single glyph (again, try
> in Emacs to see), but wcswidth will probably return 6 for it.
>
Note that for the use case we're talking about, the character used as current-line indicator,
it's always going to be " <marker> ", with spaces, tabs, or end of line surrounding it, so
there is no risk of the character we choose bumping into other characters we don't control
and forming a cluster sequence. If <marker> itself is not a cluster, and is a single
(maybe wide) character, then wcwidth returns the right thing.
IOW, if we do use wcwidth, then it's going to be right for non-cluster multi-cell
characters, which is a large pool of reasonable emojis that we can choose from for the
use case at hand.
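
To make the single-character case concrete, a rough sketch of the column computation for " <marker> " might look like this. The function name is hypothetical and this is not the patch's code. Note that wcwidth depends on the current LC_CTYPE locale, so a real program would have called setlocale first:

```cpp
#include <wchar.h>

// Hypothetical helper for illustration; not GDB code.
// Number of terminal cells taken by " <marker> ": one space, the
// marker's own width as reported by wcwidth, and one more space.
// Returns -1 if wcwidth cannot tell the width of MARKER in the
// current locale.
int marker_field_columns(wchar_t marker)
{
  int w = wcwidth(marker);
  if (w < 0)
    return -1;
  return 1 + w + 1;
}
```

For a plain "*" this gives 3 cells. For a character that wcwidth reports as two cells wide it gives 4, which is the case the current code would get wrong.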
I'd say that we could even consider letting the user pick a multi-character sequence for the
marker, and we would use wcswidth instead of wcwidth. Then, if GDB and the terminal disagree on
grapheme width, well, the downside is that the user gets their tables misaligned, but that is no
worse than what happens if we _don't_ consider wcwidth at all. But not going there, and only
allowing single-character emojis, and using wcwidth, is already going to do the right thing and
be quite useful.
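
As a sketch of what the multi-character variant would compute, the loop below spells out the per-character sum that wcswidth is described as doing earlier in this thread. The function name is made up, and the point is only to show where GDB and a clustering terminal could disagree:

```cpp
#include <wchar.h>

// Hypothetical helper for illustration; not GDB code.
// Sum the wcwidth of each character in S, the way wcswidth is
// described as working.  Returns -1 if any character has no known
// width in the current locale.  Nothing here knows about grapheme
// clusters, so a sequence the terminal renders as one glyph is
// still counted character by character.
int summed_wcwidth(const wchar_t *s)
{
  int total = 0;
  for (; *s != L'\0'; ++s)
    {
      int w = wcwidth(*s);
      if (w < 0)
        return -1;
      total += w;
    }
  return total;
}
```

In practice one would just call wcswidth (s, wcslen (s)). The explicit loop is only there to show that each character is measured on its own.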
> And even for single-character strings wcwidth will sometimes produce
> incorrect results, because the fonts used by modern terminal emulators
> for some exotic characters (like Emoji) are not monospaced.
>
As long as we don't pick such a character as the default one, it's going to
be on the user not to pick one.
The blog post you pointed at makes me believe that we _should_ indeed
use wcwidth. If we work under the single-character constraint, then I don't
really see a downside to using wcwidth.
For the scenario where wcwidth returns -1, I see two choices:
- assume a width of one cell, and print the marker from the style anyhow, or,
- fall back to the hardcoded "*" character, like today's output.
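
The two choices could be sketched roughly as below. All names are hypothetical, and which policy to use is exactly the open question, so the sketch takes it as a parameter rather than picking one:

```cpp
#include <wchar.h>

// Hypothetical types and names for illustration; not GDB code.
enum class unknown_width_policy
{
  assume_one_cell,    // first choice: print the styled marker anyhow
  fall_back_to_star   // second choice: print "*" like today's output
};

struct marker_choice
{
  wchar_t ch;    // character to print
  int columns;   // terminal cells it is assumed to occupy
};

// Decide what to print for the marker STYLED and how many cells to
// reserve for it, applying POLICY when wcwidth returns -1.
marker_choice choose_marker(wchar_t styled, unknown_width_policy policy)
{
  int w = wcwidth(styled);
  if (w >= 0)
    return { styled, w };
  if (policy == unknown_width_policy::assume_one_cell)
    return { styled, 1 };
  return { L'*', 1 };
}
```

Either way the table layout code always gets a usable column count, and the only difference is which character ends up on screen.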
Pedro Alves