From: Pedro Alves <pedro@palves.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI
Date: Tue, 1 Jul 2025 00:51:16 +0100
Message-ID: <abd8f565-7151-443a-a6fe-0ca5feb27525@palves.net>
In-Reply-To: <8634bmd6od.fsf@gnu.org>
Hi Eli,
On 2025-06-26 13:32, Eli Zaretskii wrote:
>> Date: Thu, 26 Jun 2025 11:35:37 +0100
>> Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
>> From: Pedro Alves <pedro@palves.net>
>>
>> As the blog explains,
>>
>> "historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue
>> to do so today. Even though it produces the wrong values for multi-codepoint graphemes, at least the
>> wrong values are often consistently wrong across terminal emulators."
>>
>> So what can go wrong is grapheme clustering. Both the terminal and the app must agree on
>> clustering for that to work properly. We could implement that protocol in GDB today, as there's
>> a way to query if the terminal supports it. But I'm not proposing that.
>>
>> So if we ignore multi-codepoint graphemes, then wcwidth always matches what the
>> terminals do.
>
> Using wcwidth should probably work in many cases, but we still assume
> that the terminal uses a font whose glyphs' width agrees with wcwidth.
> That is not a given.
What I understood from the blog post, and from reading a number of bug reports and discussions
in several different terminal projects, is that terminals lay glyphs out in a grid, and each glyph
occupies one or more slots in the grid. The font itself shouldn't really matter when computing the
number of grid slots; that's an independent decision. Of course, there may well be broken terminals
out there. But what should matter is whether the characters/emojis we pick by default work well.
If the user picks a character or emoji in their custom style that doesn't work in their broken
terminal, then, well, they can just not use that particular character/emoji in their custom style.
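To illustrate the grid model: an application converts its byte string to wide characters and asks
wcswidth how many cells the result needs. The font never enters that computation. This is a minimal
sketch assuming a POSIX system; string_columns is a hypothetical helper, not existing GDB code.

```c
#define _XOPEN_SOURCE 700   /* for wcswidth() in <wchar.h> */
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of terminal grid cells needed to display
   the multibyte string S in the current locale, or -1 if S is not a
   valid multibyte string or contains a non-printable character.  */
int
string_columns (const char *s)
{
  /* First pass: count the wide characters (excluding the NUL).  */
  size_t n = mbstowcs (NULL, s, 0);
  if (n == (size_t) -1)
    return -1;

  wchar_t *ws = malloc ((n + 1) * sizeof (wchar_t));
  if (ws == NULL)
    return -1;
  mbstowcs (ws, s, n + 1);

  /* Sum of the per-character cell widths, or -1.  */
  int cols = wcswidth (ws, n);
  free (ws);
  return cols;
}
```

The caller is expected to have run setlocale (LC_ALL, "") so that the user's UTF-8 locale is in
effect. Note that wcswidth simply sums per-code-point widths, which is why a multi-code-point
cluster such as the ZWJ family sequence Eli quotes gets over-counted.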
>
>>> For example, the following string of
>>> characters:
>>>
>>> U+1F468 U+200D U+1F469 U+200D U+1F466
>>>
>>> will be displayed by capable terminals as a single glyph (again, try
>>> in Emacs to see), but wcswidth will probably return 6 for it.
>>
>> Note that for the use case we're talking about, the character used as current-line indicator,
>> it's always going to be " <marker> ", with spaces or tabs or endline surrounding it,
>> there is no risk of the character we choose bumping into other characters we don't control
>> and forming a cluster sequence. If <marker> itself is not a cluster, and is a single
>> (maybe wide) character, then wcwidth returns the right thing.
>
> Don't we allow the users to specify their own characters or strings
> for this? If not, all we have to do is verify that the characters we
> pick for these purposes don't have the problems and always (or at
> least in most cases) agree with wcwidth.
Exactly, that's what I meant. We do allow users to specify their own
characters (that's what the patch this discussion is under does), but if their terminal
is broken somehow, then it's going to be on the user not to use custom characters that
don't work on their terminal. I think that we should use wcwidth/wcswidth like every
other terminal application, as explained in that blog post, and document that, for characters
used in styles that may affect table alignment, cluster sequences and the like may not work
properly. We don't need to worry about the width of characters used in places that are not
tables, like the error or warning emojis.
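As a rough sketch of what using wcswidth for table alignment could look like, the fragment below
pads the current-row marker column by display cells rather than by characters. When the width is
unknown, it takes the "fall back to a hard-coded *" option raised in this thread. choose_marker
and struct marker_choice are hypothetical names, not existing GDB code.

```c
#define _XOPEN_SOURCE 700   /* for wcswidth() in <wchar.h> */
#include <wchar.h>

/* Hypothetical result type: the marker to print and how many spaces
   must follow it to fill the marker column.  */
struct marker_choice
{
  const wchar_t *text;
  int padding;
};

/* Pick the current-row marker for a column WIDTH cells wide (WIDTH is
   assumed to be at least 1).  If wcswidth cannot measure MARKER (it
   returns -1, e.g. for a non-printable character) or the marker does
   not fit, fall back to the hard-coded "*".  */
struct marker_choice
choose_marker (const wchar_t *marker, int width)
{
  struct marker_choice c = { marker, 0 };
  int w = wcswidth (marker, wcslen (marker));

  if (w < 0 || w > width)
    {
      c.text = L"*";
      w = 1;
    }
  c.padding = width - w;
  return c;
}
```

Counting cells is what keeps the table aligned: an emoji for which wcwidth returns 2 gets one
fewer padding space than "*" does.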
>
>> For the wcwidth returns -1 scenario, I see two choices:
>>
>> - assume width of one cell, and print marker from style anyhow, or,
>>
>> - fallback to hardcoded "*" character, like today's output.
>
> This is a subset of a more general issue with wcwidth: it cannot be
> relied upon to know about arbitrary characters, unless the locale's
> codeset is UTF-8. Because usually wcwidth only knows about characters
> supported by the locale's codeset.
I guess Windows is the main concern here, though I was reading this:
https://github.com/alf-p-steinbach/C---how-to---make-non-English-text-work-in-Windows/blob/main/how-to-use-utf8-in-windows.md
... and it looks like there's a way nowadays to make the application use the UTF-8 codepage,
via a manifest.
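On POSIX hosts, the condition Eli describes can at least be detected at run time:
nl_langinfo (CODESET) reports the locale's codeset, so a helper could decline to trust wcwidth
for non-ASCII markers unless it is UTF-8. This is a minimal sketch; locale_is_utf8 is a
hypothetical name, and it does not cover Windows, where the manifest approach above would apply.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: true if the current LC_CTYPE codeset is UTF-8, in which
   case wcwidth can be expected to know about arbitrary characters.
   Codeset spellings vary between systems, so accept common variants.  */
bool
locale_is_utf8 (void)
{
  const char *cs = nl_langinfo (CODESET);

  return (strcmp (cs, "UTF-8") == 0
	  || strcmp (cs, "utf-8") == 0
	  || strcmp (cs, "utf8") == 0);
}
```

A program that never calls setlocale runs in the "C" locale, whose codeset is ASCII, so this
returns false until the user's locale is activated.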
>
> The best solution to overcome this is to have our own database of
> character-width values. (We actually only need the characters whose
> width is different from 1, i.e. zero-width and double-width
> characters.)  That's what Emacs does.  The downside is that we'd need
> to maintain this database as characters are added to Unicode (we could
> have a script that generates the database automatically from a
> relevant Unicode data file).
>
> Another alternative is to allow the definition of the width as part of
> defining the character(s) used for each indicator. Then we won't need
> wcwidth/wcswidth at all.
>
Or something in between: a wrapper for wcwidth that only hardcodes
the width of the few emojis GDB uses by default? I haven't looked at
the gnulib wrapper that Tromey mentioned, though; maybe it already does
something like what you describe, for all characters.
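A minimal sketch of that in-between wrapper follows. It assumes POSIX wcwidth and a 32-bit
wchar_t, which holds on GNU/Linux but not on Windows. The function name and the table entries
are hypothetical and illustrative only, not the emojis GDB actually uses. A sorted range table
of only the non-1-width exceptions is also one shape Eli's full database could take if it were
generated from Unicode's EastAsianWidth.txt.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() in <wchar.h> */
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical override table: code point ranges whose width we do not
   want to leave to the C library.  Kept sorted so it can be binary
   searched.  The entries are illustrative, not GDB's actual defaults.  */
struct width_range
{
  uint32_t first, last;
  int width;
};

static const struct width_range width_overrides[] =
{
  { 0x200B, 0x200D, 0 },	/* ZERO WIDTH SPACE .. ZERO WIDTH JOINER */
  { 0x2705, 0x2705, 2 },	/* WHITE HEAVY CHECK MARK */
  { 0x274C, 0x274C, 2 },	/* CROSS MARK */
  { 0x1F600, 0x1F64F, 2 },	/* Emoticons block */
};

/* Return the display width of code point C: from the override table if
   listed there, otherwise whatever the C library's wcwidth says.  */
int
marker_wcwidth (uint32_t c)
{
  size_t lo = 0;
  size_t hi = sizeof width_overrides / sizeof width_overrides[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (c < width_overrides[mid].first)
	hi = mid;
      else if (c > width_overrides[mid].last)
	lo = mid + 1;
      else
	return width_overrides[mid].width;
    }
  return wcwidth ((wchar_t) c);
}
```

Listing only the exceptions keeps the table small, and the binary search keeps lookups cheap if
the table later grows into a generated database.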
Pedro Alves