Mirror of the gdb mailing list
* Thread names and non-ASCII characters
@ 2019-12-19 15:17 Eli Zaretskii
  2019-12-19 17:22 ` Tom Tromey
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2019-12-19 15:17 UTC (permalink / raw)
  To: gdb

Can someone tell what GDB assumes to be the character encoding used by
thread names we get from the system APIs (such as pthread_getname_np)?
It sounds like we assume the host character set, since the functions
used to display the thread name don't perform any encoding conversion.
Is my understanding correct?

I'm asking because Windows 10 introduces a new API for setting and
getting a thread's name, but this API wants a UTF-16 encoded string,
so if we want to use it, we need to decide from/to what encoding to
convert to/from UTF-16.  The current code in windows-nat.c that
processes the special MSVC exception used on older platforms to set
thread names for debugging purposes simply copies the name as an array
of 'char', so it, too, implicitly assumes the host encoding
(a.k.a. "system codepage" in Windows parlance).

Am I missing something?



* Re: Thread names and non-ASCII characters
  2019-12-19 15:17 Thread names and non-ASCII characters Eli Zaretskii
@ 2019-12-19 17:22 ` Tom Tromey
  2019-12-19 17:55   ` Paul Koning
  2019-12-19 18:59   ` Eli Zaretskii
  0 siblings, 2 replies; 6+ messages in thread
From: Tom Tromey @ 2019-12-19 17:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb

>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

Eli> Can someone tell what GDB assumes to be the character encoding used by
Eli> thread names we get from the system APIs (such as pthread_getname_np)?
Eli> It sounds like we assume the host character set, since the functions
Eli> used to display the thread name don't perform any encoding conversion.
Eli> Is my understanding correct?

Yes, I believe so.

Eli> I'm asking because Windows 10 introduces a new API for setting and
Eli> getting a thread's name, but this API wants a UTF-16 encoded string,
Eli> so if we want to use it, we need to decide from/to what encoding to
Eli> convert to/from UTF-16.

Converting to the host charset is probably the thing to do.
If the host charset is decided incorrectly, then enhancing charset.c to
choose a better one would help in other places as well.

convert_between_encodings can be used to do the translation.

Tom
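
Inside GDB the translation would go through convert_between_encodings,
as suggested above.  Purely as an illustration of what that step has to
do, here is a standalone sketch that turns a NUL-terminated UTF-16
string into UTF-8.  It assumes the host charset is UTF-8, which need
not be true (on Windows it is often the system codepage), and the
function name utf16_to_utf8 is hypothetical, not a GDB or Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GDB: convert the NUL-terminated
   UTF-16 string IN (native-endian code units) to UTF-8 in OUT.
   Writes at most OUT_SIZE bytes including the terminating NUL and
   never emits a partial multi-byte sequence.  Returns the number of
   bytes written, not counting the NUL.  Assumes a UTF-8 host charset.  */
static size_t
utf16_to_utf8 (const uint16_t *in, char *out, size_t out_size)
{
  size_t n = 0;

  assert (in != NULL && out != NULL && out_size > 0);

  while (*in != 0)
    {
      uint32_t cp = *in++;
      size_t len;

      if (cp >= 0xD800 && cp <= 0xDBFF && *in >= 0xDC00 && *in <= 0xDFFF)
        /* Valid surrogate pair: combine into a single code point.  */
        cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (*in++ - 0xDC00);
      else if (cp >= 0xD800 && cp <= 0xDFFF)
        /* Unpaired surrogate: substitute U+FFFD.  */
        cp = 0xFFFD;

      len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (n + len >= out_size)
        break;  /* No room for this character plus the NUL.  */

      if (len == 1)
        out[n++] = (char) cp;
      else if (len == 2)
        {
          out[n++] = (char) (0xC0 | (cp >> 6));
          out[n++] = (char) (0x80 | (cp & 0x3F));
        }
      else if (len == 3)
        {
          out[n++] = (char) (0xE0 | (cp >> 12));
          out[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
          out[n++] = (char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[n++] = (char) (0xF0 | (cp >> 18));
          out[n++] = (char) (0x80 | ((cp >> 12) & 0x3F));
          out[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
          out[n++] = (char) (0x80 | (cp & 0x3F));
        }
    }

  out[n] = '\0';
  return n;
}
```

A caller would pass the wide string obtained from the system together
with a char buffer, then hand the result to the existing display code.
For any host charset other than UTF-8 a real conversion routine is
needed instead of this hand-rolled encoder.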



* Re: Thread names and non-ASCII characters
  2019-12-19 17:22 ` Tom Tromey
@ 2019-12-19 17:55   ` Paul Koning
  2019-12-19 18:25     ` Tom Tromey
  2019-12-19 19:02     ` Eli Zaretskii
  2019-12-19 18:59   ` Eli Zaretskii
  1 sibling, 2 replies; 6+ messages in thread
From: Paul Koning @ 2019-12-19 17:55 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, gdb



> On Dec 19, 2019, at 12:22 PM, Tom Tromey <tom@tromey.com> wrote:
> 
>>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:
> 
> Eli> Can someone tell what GDB assumes to be the character encoding used by
> Eli> thread names we get from the system APIs (such as pthread_getname_np)?
> Eli> It sounds like we assume the host character set, since the functions
> Eli> used to display the thread name don't perform any encoding conversion.
> Eli> Is my understanding correct?
> 
> Yes, I believe so.
> 
> Eli> I'm asking because Windows 10 introduces a new API for setting and
> Eli> getting a thread's name, but this API wants a UTF-16 encoded string,
> Eli> so if we want to use it, we need to decide from/to what encoding to
> Eli> convert to/from UTF-16.
> 
> Converting to the host charset is probably the thing to do.

Host charset, or target charset?  I would assume target since we're
talking about threads on the target.

	paul




* Re: Thread names and non-ASCII characters
  2019-12-19 17:55   ` Paul Koning
@ 2019-12-19 18:25     ` Tom Tromey
  2019-12-19 19:02     ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Tom Tromey @ 2019-12-19 18:25 UTC (permalink / raw)
  To: Paul Koning; +Cc: Tom Tromey, Eli Zaretskii, gdb

>>>>> "Paul" == Paul Koning <paulkoning@comcast.net> writes:

>> Converting to the host charset is probably the thing to do.

Paul> Host charset, or target charset?  I would assume target since
Paul> we're talking about threads on the target.

In this case it sounded like the charset on the target is known to be
UTF-16, but to display in gdb it has to be converted from that to the
host charset.

For Linux, we should probably convert from the target charset to the
host charset; though this seems a little odd, in that I think the kernel
enforces a (short) length limit on thread names, and anybody using
non-ASCII risks having the name be cut off in the middle of a UTF-8
sequence.

Tom
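
To make the truncation concern concrete: on Linux a thread name is
limited to 16 bytes including the terminating NUL, so a UTF-8 name can
end with a lead byte whose continuation bytes were cut off.  The sketch
below shows one possible way to drop such a partial trailing sequence
before display.  The function name trim_incomplete_utf8 is hypothetical
and not part of GDB, and the code assumes the name is meant to be UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GDB: if the NUL-terminated string S
   ends in the middle of a UTF-8 multi-byte sequence, cut that partial
   sequence off in place.  Returns the resulting length in bytes.
   Other kinds of invalid UTF-8 are left untouched.  */
static size_t
trim_incomplete_utf8 (char *s)
{
  size_t len, i, need;
  unsigned char lead;

  assert (s != NULL);
  len = strlen (s);

  /* Step back over up to three trailing continuation bytes (10xxxxxx).  */
  i = len;
  while (i > 0 && len - i < 3 && ((unsigned char) s[i - 1] & 0xC0) == 0x80)
    i--;

  if (i == 0)
    return len;  /* No lead byte to inspect; leave the string alone.  */

  /* The byte before the continuation run should start the sequence.  */
  lead = (unsigned char) s[i - 1];
  if (lead < 0x80)
    need = 1;
  else if ((lead & 0xE0) == 0xC0)
    need = 2;
  else if ((lead & 0xF0) == 0xE0)
    need = 3;
  else if ((lead & 0xF8) == 0xF0)
    need = 4;
  else
    return len;  /* Not a lead byte, so this is not a truncation case.  */

  /* Fewer bytes present than the lead byte announces: drop them all.  */
  if (len - (i - 1) < need)
    {
      s[i - 1] = '\0';
      len = i - 1;
    }

  return len;
}
```

Whether GDB should trim such a name, show a replacement character, or
print the raw bytes is a separate design question that the thread does
not settle.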



* Re: Thread names and non-ASCII characters
  2019-12-19 17:22 ` Tom Tromey
  2019-12-19 17:55   ` Paul Koning
@ 2019-12-19 18:59   ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2019-12-19 18:59 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb

> From: Tom Tromey <tom@tromey.com>
> Cc: gdb@sourceware.org
> Date: Thu, 19 Dec 2019 10:22:21 -0700
> 
> Eli> I'm asking because Windows 10 introduces a new API for setting and
> Eli> getting a thread's name, but this API wants a UTF-16 encoded string,
> Eli> so if we want to use it, we need to decide from/to what encoding to
> Eli> convert to/from UTF-16.
> 
> Converting to the host charset is probably the thing to do.
> If the host charset is decided incorrectly, then enhancing charset.c to
> choose a better one would help in other places as well.
> 
> convert_between_encodings can be used to do the translation.

OK, thanks for the pointers.  I hope to find some time to do this
soonish.



* Re: Thread names and non-ASCII characters
  2019-12-19 17:55   ` Paul Koning
  2019-12-19 18:25     ` Tom Tromey
@ 2019-12-19 19:02     ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2019-12-19 19:02 UTC (permalink / raw)
  To: Paul Koning; +Cc: tom, gdb

> From: Paul Koning <paulkoning@comcast.net>
> Date: Thu, 19 Dec 2019 12:54:56 -0500
> Cc: Eli Zaretskii <eliz@gnu.org>, gdb@sourceware.org
> 
> > Converting to the host charset is probably the thing to do.
> 
> Host charset, or target charset?  I would assume target since we're talking about threads on the target.

If we want this to be in target charset, we need to convert also when
we set the thread name, not only when we retrieve it.  We currently do
neither, AFAICT.


