From: Eli Zaretskii <eliz@gnu.org>
To: Pedro Alves <pedro@palves.net>
Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
Subject: Re: [PATCH 2/2] Allow check-mark to be changed for CLI
Date: Thu, 26 Jun 2025 15:32:50 +0300
Message-ID: <8634bmd6od.fsf@gnu.org>
In-Reply-To: <2ea323f9-d606-4bc9-b4db-995f980d85c7@palves.net> (message from Pedro Alves on Thu, 26 Jun 2025 11:35:37 +0100)

> Date: Thu, 26 Jun 2025 11:35:37 +0100
> Cc: tromey@adacore.com, aburgess@redhat.com, gdb-patches@sourceware.org
> From: Pedro Alves <pedro@palves.net>
> 
> As the blog explains, 
> 
>  "historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue
>   to do so today. Even though it produces the wrong values for multi-codepoint graphemes, at least the
>   wrong values are often consistently wrong across terminal emulators."
> 
> So what can go wrong is grapheme clustering.  Both the terminal and the app must agree on
> clustering for that to work properly.  We could implement that protocol in GDB today, as there's
> a way to query if the terminal supports it.  But I'm not proposing that.
> 
> So if we ignore multi-codepoint graphemes, then wcwidth always matches what the
> terminals do.

Using wcwidth should probably work in many cases, but we still assume
that the terminal uses a font whose glyphs' width agrees with wcwidth.
That is not a given.

> > For example, the following string of
> > characters:
> > 
> >   U+1F468 U+200D U+1F469 U+200D U+1F466
> > 
> > will be displayed by capable terminals as a single glyph (again, try
> > in Emacs to see), but wcswidth will probably return 6 for it.
> 
> Note that for the use case we're talking about, the character used as current-line indicator,
> it's always going to be " <marker> ", with spaces or tabs or endline surrounding it,
> there is no risk of the character we choose bumping into other characters we don't control
> and forming a cluster sequence.  If <marker> itself is not a cluster, and is a single
> (maybe wide) character, then wcwidth returns the right thing.

Don't we allow the users to specify their own characters or strings
for this?  If not, all we have to do is verify that the characters we
pick for these purposes don't have these problems, and that their
displayed width always (or at least in most cases) agrees with wcwidth.

> For the wcwidth returns -1 scenario, I see two choices: 
> 
>  - assume width of one cell, and print marker from style anyhow, or,
> 
>  - fallback to hardcoded "*" character, like today's output.

This is a subset of a more general issue with wcwidth: it cannot be
relied upon to know about arbitrary characters unless the locale's
codeset is UTF-8, because wcwidth usually knows only about the
characters supported by the locale's codeset.
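
To make the second of the two choices above concrete, here is a
minimal sketch of falling back to "*" when wcwidth cannot report a
usable width in the current locale.  The names marker_choice and
choose_marker are hypothetical, not anything in the patch, and
treating a zero width like -1 is my assumption:

```c
#define _XOPEN_SOURCE 700   /* makes wcwidth() visible on glibc */
#include <wchar.h>

/* Hypothetical result type: which character to print as the marker,
   and how many terminal cells we expect it to occupy.  */
struct marker_choice
{
  wchar_t ch;
  int cells;
};

/* Use STYLED if wcwidth reports a positive width for it in the
   current locale; otherwise fall back to the hardcoded '*'.  */
struct marker_choice
choose_marker (wchar_t styled)
{
  struct marker_choice result;
  int w = wcwidth (styled);

  if (w > 0)
    {
      result.ch = styled;
      result.cells = w;
    }
  else
    {
      /* -1 means wcwidth does not consider the character printable
         in this locale; 0 means zero-width, which is useless as a
         visible marker (assumption: treat both the same way).  */
      result.ch = L'*';
      result.cells = 1;
    }
  return result;
}
```

Whether a given non-ASCII marker survives this check depends on the
locale in effect when wcwidth is called, which is exactly the problem
described above.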

The best solution to overcome this is to have our own database of
character-width values.  (We actually only need the characters whose
width is different from 1, i.e. zero-width and double-width
characters.)  That's what Emacs does.  The downside is that we'd need
to maintain this database as characters are added to Unicode (we could
have a script that generates the database automatically from a
relevant Unicode data file).
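
As a rough illustration of what such a database could look like (this
is only a sketch, not how Emacs stores its data, and the ranges below
are a small hand-picked subset rather than generated output), a sorted
table of exceptions with a default width of 1 would be enough.  The
names width_range, width_table, char_width and string_width are made
up for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* One run of code points that share a width other than 1.  */
struct width_range
{
  uint32_t first, last;
  int width;
};

/* Illustrative subset only.  Must stay sorted by code point and
   non-overlapping; a real table would be generated by a script.  */
static const struct width_range width_table[] = {
  { 0x0300, 0x036F, 0 },    /* combining diacritical marks */
  { 0x1100, 0x115F, 2 },    /* Hangul Jamo leading consonants */
  { 0x200B, 0x200F, 0 },    /* zero-width space/joiners, LRM, RLM */
  { 0x4E00, 0x9FFF, 2 },    /* CJK unified ideographs */
  { 0xFF01, 0xFF60, 2 },    /* fullwidth forms */
  { 0x1F442, 0x1F4FC, 2 },  /* part of the pictographs block */
};

/* Binary search; anything not listed is assumed to be 1 cell wide.  */
int
char_width (uint32_t cp)
{
  size_t lo = 0;
  size_t hi = sizeof width_table / sizeof width_table[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (cp < width_table[mid].first)
        hi = mid;
      else if (cp > width_table[mid].last)
        lo = mid + 1;
      else
        return width_table[mid].width;
    }
  return 1;
}

/* Sum of per-code-point widths, with no grapheme clustering.  */
int
string_width (const uint32_t *cps, size_t n)
{
  int total = 0;

  for (size_t i = 0; i < n; i++)
    total += char_width (cps[i]);
  return total;
}
```

With this table, U+2713 comes out as 1 cell, and the sequence
U+1F468 U+200D U+1F469 U+200D U+1F466 quoted earlier sums to 6, so a
width database by itself does nothing for multi-codepoint clusters; it
only removes the dependency on the locale.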

Another alternative is to allow the definition of the width as part of
defining the character(s) used for each indicator.  Then we won't need
wcwidth/wcswidth at all.
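
A sketch of that alternative, assuming a hypothetical struct that
carries the width alongside the text (none of these names exist in
GDB; how the user would actually specify the width is left open):

```c
#include <stdio.h>

/* Hypothetical: an indicator whose width is supplied with it,
   instead of being computed by wcwidth/wcswidth.  */
struct indicator
{
  const char *text;   /* bytes to emit, in the terminal's encoding */
  int width;          /* cells the user says it occupies */
};

/* Write IND into BUF, padded with spaces on the right to fill
   COLUMN_WIDTH cells.  Returns the number of bytes written, not
   counting the terminating NUL, or -1 if BUF is too small.  */
int
format_indicator (char *buf, size_t size,
                  const struct indicator *ind, int column_width)
{
  int pad = column_width - ind->width;
  int n;

  if (pad < 0)
    pad = 0;

  /* "%*s" with an empty string emits exactly PAD spaces.  */
  n = snprintf (buf, size, "%s%*s", ind->text, pad, "");
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The padding is computed purely from the declared width, so the result
is only as good as what the user told us about their terminal and
font.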
