[PATCH v3] [gdb/tui] Handle unicode chars in prompt

Mirror of the gdb-patches mailing list

* [PATCH v3] [gdb/tui] Handle unicode chars in prompt
@ 2025-07-06  7:40 Tom de Vries
  2025-07-06  7:47 ` Tom de Vries
From: Tom de Vries @ 2025-07-06  7:40 UTC
  To: gdb-patches

Let's try to set the prompt using a unicode character, say '❯', aka U+276F
(heavy right-pointing angle quotation mark ornament).

This works fine on an xterm with CLI (with X marking the position of the
blinking cursor):
...
$ gdb -q -ex "set prompt GDB❯ "
GDB❯ X
...
but with TUI:
...
$ gdb -q -tui -ex "set prompt GDB❯ "
...
we get instead:
...
GDB  GDB  X
...

We can use the test-case gdb.tui/unicode-prompt.exp to get more details, using
tuiterm.

With Term::dump_screen we have:
...
   16 (gdb) set prompt GDB❯
   17 GDB❯ GDB❯ GDB❯ set prompt (gdb)
   18 (gdb)
...
and with Term::dump_screen_with_attrs (summarizing using attribute sets <attrs1>
and <attrs2>):
...
   16 (gdb) set prompt GDB❯
   17 GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> set prompt (gdb)
   18 (gdb)
...
where:
...
<attrs1> == <reverse:1><invisible:1><blinking:1><intensity:bold>
<attrs2> == <reverse:0><invisible:0><blinking:0><intensity:normal>
...

This explains why we didn't see the unicode char on xterm: it's hidden
because the invisible attribute is set.

So, there seem to be two problems:
- the attributes are incorrect, and
- the prompt is repeated a couple of times.

In TUI, the prompt is written by tui_puts_internal, which passes the string
to waddch one byte at a time.  That apparently breaks multi-byte char support.

Fix this by detecting multi-byte chars in tui_puts_internal, and printing them using
waddnstr.

Tested on x86_64-linux and x86_64-freebsd.

Reported-By: wuzy01@qq.com

PR tui/28800
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28800
---
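As a rough illustration of the fix described in the commit message above, the
following sketch splits a string into the units that would be written out:
one unit per single-byte char, and one unit per complete multi-byte char.  It
is not the patch's code.  The patch detects multi-byte chars with
wchar_iterator and host_charset (); this sketch instead assumes the input is
UTF-8 and decodes the lead byte by hand.  The helper names utf8_seq_len and
split_output_units are hypothetical and do not exist in GDB.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of GDB): byte length of the UTF-8 sequence
// announced by lead byte B, or 1 if B is not a valid multi-byte lead byte.
static std::size_t
utf8_seq_len (unsigned char b)
{
  if (b < 0x80)
    return 1;			// ASCII.
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  if ((b & 0xF8) == 0xF0)
    return 4;
  return 1;			// Stray continuation or invalid byte.
}

// Hypothetical helper (not part of GDB): split S into output units.  A
// complete multi-byte char becomes one unit (the case the patch hands to
// waddnstr); anything else, including invalid or truncated sequences,
// falls back to one byte per unit (the existing single-byte path).
static std::vector<std::string>
split_output_units (const std::string &s)
{
  std::vector<std::string> units;
  std::size_t i = 0;

  while (i < s.size ())
    {
      std::size_t len = utf8_seq_len (static_cast<unsigned char> (s[i]));

      // Truncated sequence at end of string: emit the lead byte alone.
      if (i + len > s.size ())
	len = 1;

      // Every trailing byte must look like 10xxxxxx, else fall back.
      for (std::size_t k = 1; k < len; k++)
	if ((static_cast<unsigned char> (s[i + k]) & 0xC0) != 0x80)
	  {
	    len = 1;
	    break;
	  }

      units.push_back (s.substr (i, len));
      i += len;
    }

  return units;
}
```

For the prompt "GDB❯ " (where '❯' is the three bytes E2 9D AF in UTF-8), this
yields five units: "G", "D", "B", "❯" and " ".  Falling back to one byte per
unit for invalid or incomplete input mirrors the patch's choice to let the
single-byte char code handle those cases.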
 gdb/charset.c                            |  20 +++++
 gdb/charset.h                            |   7 ++
 gdb/testsuite/gdb.tui/unicode-prompt.exp |  71 ++++++++++++++++
 gdb/tui/tui-io.c                         | 100 +++++++++++++++++++----
 4 files changed, 183 insertions(+), 15 deletions(-)
 create mode 100644 gdb/testsuite/gdb.tui/unicode-prompt.exp

diff --git a/gdb/charset.c b/gdb/charset.c
index 259362563b2..9c0df83c0d3 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -690,6 +690,26 @@ wchar_iterator::iterate (enum wchar_iterate_result *out_result,
   return -1;
 }
 
+/* See charset.h.  */
+
+void
+wchar_iterator::skip (size_t len)
+{
+  m_input += len;
+
+  gdb_assert (len <= m_bytes);
+  m_bytes -= len;
+}
+
+/* See charset.h.  */
+
+void
+wchar_iterator::reset (const gdb_byte *input, size_t bytes)
+{
+  m_input = input;
+  m_bytes = bytes;
+}
+
 struct charset_vector
 {
   ~charset_vector ()
diff --git a/gdb/charset.h b/gdb/charset.h
index a0f109da5ee..4d68e61f35a 100644
--- a/gdb/charset.h
+++ b/gdb/charset.h
@@ -126,6 +126,13 @@ class wchar_iterator
   int iterate (enum wchar_iterate_result *out_result, gdb_wchar_t **out_chars,
 	       const gdb_byte **ptr, size_t *len);
 
+  /* Increase the input buffer pointer by LEN bytes.  */
+  void skip (size_t len);
+
+  /* Reset the input buffer pointer to INPUT and the number of bytes in the
+     input buffer to BYTES.  */
+  void reset (const gdb_byte *input, size_t bytes);
+
  private:
 
   /* The underlying iconv descriptor.  */
diff --git a/gdb/testsuite/gdb.tui/unicode-prompt.exp b/gdb/testsuite/gdb.tui/unicode-prompt.exp
new file mode 100644
index 00000000000..ac2b6202c04
--- /dev/null
+++ b/gdb/testsuite/gdb.tui/unicode-prompt.exp
@@ -0,0 +1,71 @@
+# Copyright 2025 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+require allow_tui_tests
+
+tuiterm_env
+
+save_vars { env(LC_ALL) } {
+    # Override "C" settings from default_gdb_init.
+    setenv LC_ALL "C.UTF-8"
+
+    Term::clean_restart 24 80
+}
+
+if {![Term::enter_tui]} {
+    unsupported "TUI not supported"
+    return
+}
+
+set unicode_char "\u276F"
+set unicode_char_unsupported "❯"
+
+set color_on [string cat {\033} "\[" "31m"]
+set color_off [string cat {\033} "\[" "0m"]
+
+set prompt "GDB$color_on$unicode_char$color_off "
+set prompt_no_color "GDB$unicode_char "
+set prompt_no_color_re [string_to_regexp $prompt_no_color]
+
+if { [ishost *-*-*bsd*] } {
+    set issue_eol "\r\n"
+} else {
+    set issue_eol "\n"
+}
+
+# Set new prompt.
+send_gdb "set prompt $prompt$issue_eol"
+# Set old prompt back.
+send_gdb "set prompt (gdb) $issue_eol"
+
+gdb_assert \
+    { [Term::wait_for "^${prompt_no_color_re}set prompt $gdb_prompt "] } \
+    "prompt with unicode char"
+
+set line [Term::get_line_with_attrs [expr $Term::_cur_row - 1]]
+verbose -log "line with attrs: '$line'"
+
+set prompt_with_attrs_re "GDB<fg:red>$unicode_char<fg:default> "
+set prompt_unsupported_re "GDB$unicode_char_unsupported "
+
+set test "colored unicode char"
+if { [regexp "^${prompt_with_attrs_re}set prompt .*$" $line] } {
+    pass $test
+} elseif { [regexp "^${prompt_unsupported_re}set prompt .*$" $line] } {
+    # I get this on freebsd: no color, and unicode char not recognized.
+    unsupported $test
+} else {
+    fail $test
+}
diff --git a/gdb/tui/tui-io.c b/gdb/tui/tui-io.c
index 1b4cc82cce8..5c39e42362a 100644
--- a/gdb/tui/tui-io.c
+++ b/gdb/tui/tui-io.c
@@ -45,6 +45,7 @@
 #include "gdbsupport/unordered_map.h"
 #include "pager.h"
 #include "gdbsupport/gdb-checked-static-cast.h"
+#include "charset.h"
 
 /* This redefines CTRL if it is not already defined, so it must come
    after terminal state related include files like <term.h> and
@@ -539,30 +540,99 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
   char c;
   int prev_col = 0;
   bool saw_nl = false;
+  size_t skip = 0;
+  wchar_iterator it ((gdb_byte *)string, strlen (string), host_charset (), 1);
 
-  while ((c = *string++) != 0)
+  while (true)
     {
-      if (c == '\1' || c == '\2')
-	{
-	  /* Ignore these, they are readline escape-marking
-	     sequences.  */
-	  continue;
-	}
+      bool handled = false;
+
+      /* Get iterator in sync with string.  */
+      it.skip (skip);
+      skip = 0;
+
+      /* Detect and handle multibyte chars.  */
+      {
+	enum wchar_iterate_result res2;
+	gdb_wchar_t *dummy1;
+	const gdb_byte *dummy2;
+	size_t len;
+	int res = it.iterate (&res2, &dummy1, &dummy2, &len);
+	if (res < 0)
+	  {
+	    /* End of string.  */
+	    gdb_assert (res2 == wchar_iterate_eof);
+	    break;
+	  }
+
+	if (res == 0)
+	  {
+	    if (res2 == wchar_iterate_invalid)
+	      {
+		/* Let single-byte char code handle it.  */
+		gdb_assert (len == 1);
+	      }
+	    else if (res2 == wchar_iterate_incomplete)
+	      {
+		/* Iterator has been setup to return end-of-string on next
+		   call to iterate.  Make that an advance-by-one instead, and
+		   let single-byte char code handle it.  */
+		it.reset ((gdb_byte *)(string + 1), strlen (string + 1));
+	      }
+	    else
+	      gdb_assert_not_reached ("");
+	  }
+	else
+	  {
+	    /* res > 0.  */
+	    gdb_assert (res2 == wchar_iterate_ok);
+	    if (len > 1)
+	      {
+		/* Multi-byte char.  Handle it.  */
+		waddnstr (w, string, len);
+		string += len;
+		handled = true;
+	      }
+	    else
+	      {
+		/* Single-byte char.  Let single-byte char code handle it.  */
+		gdb_assert (len == 1);
+	      }
+	  }
+      }
 
-      if (c == '\033')
+      if (!handled)
 	{
-	  size_t bytes_read = apply_ansi_escape (w, string - 1);
-	  if (bytes_read > 0)
+	  c = *string++;
+	  if (c == '\0')
+	    {
+	      /* End of string.  */
+	      break;
+	    }
+
+	  if (c == '\1' || c == '\2')
 	    {
-	      string = string + bytes_read - 1;
+	      /* Ignore these, they are readline escape-marking
+		 sequences.  */
 	      continue;
 	    }
-	}
 
-      if (c == '\n')
-	saw_nl = true;
+	  if (c == '\033')
+	    {
+	      size_t bytes_read = apply_ansi_escape (w, string - 1);
+	      if (bytes_read > 0)
+		{
+		  skip = bytes_read - 1;
+		  string += skip;
+		  continue;
+		}
+	    }
+
+	  if (c == '\n')
+	    saw_nl = true;
 
-      do_tui_putc (w, c);
+	  do_tui_putc (w, c);
+	}
 
       if (height != nullptr)
 	{

base-commit: 87f5e2edca1412326ae40489e2780821093481cb
-- 
2.43.0



* Re: [PATCH v3] [gdb/tui] Handle unicode chars in prompt
  2025-07-06  7:40 [PATCH v3] [gdb/tui] Handle unicode chars in prompt Tom de Vries
@ 2025-07-06  7:47 ` Tom de Vries
  2025-07-18 12:35   ` Tom de Vries
From: Tom de Vries @ 2025-07-06  7:47 UTC
  To: gdb-patches

This v3 is roughly the same as what I've posted here:
https://sourceware.org/pipermail/gdb-patches/2023-June/200296.html

Changes compared to that post:
- rebased on current trunk
- updated test-case for freebsd
- updated test-case copyright year to 2025
- fixed formatting issue in tui-io.c

Thanks,
- Tom




* Re: [PATCH v3] [gdb/tui] Handle unicode chars in prompt
  2025-07-06  7:47 ` Tom de Vries
@ 2025-07-18 12:35   ` Tom de Vries
From: Tom de Vries @ 2025-07-18 12:35 UTC
  To: gdb-patches

On 7/6/25 09:47, Tom de Vries wrote:
> 
> This v3 is roughly the same as what I've posted here:
> https://sourceware.org/pipermail/gdb-patches/2023-June/200296.html
> 
> Changes compared to that post:
> - rebased on current trunk
> - updated test-case for freebsd
> - updated test-case copyright year to 2025
> - fixed formatting issue in tui-io.c

I've submitted a v4.

Changes:
- dropped freebsd-specific changes.  I'm about to submit a series that
   makes those changes superfluous.
- added missing "require {have_host_locale C.UTF-8}"

Thanks,
- Tom
