From: Eli Zaretskii <eliz@gnu.org>
To: Vladimir Prus <ghost@cs.msu.su>
Cc: gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 17:18:00 -0000
Message-ID: <uirpc19u8.fsf@gnu.org>
In-Reply-To: <200604141837.26618.ghost@cs.msu.su> (message from Vladimir Prus on Fri, 14 Apr 2006 18:37:25 +0400)
> From: Vladimir Prus <ghost@cs.msu.su>
> Date: Fri, 14 Apr 2006 18:37:25 +0400
> Cc: gdb@sources.redhat.com
>
> > Now, the same letter ``small a'' can be encoded in several other ways:
> > for example, its ISO-2022-7bit encoding is 0x1B 0x24 0x2C 0x31 0x28
> > 0x50, its KOI8-r encoding is 0xC1, its ISO-8859-5 encoding is 0xD0,
> > etc. It should be obvious that, of all the encodings, only the
> > fixed-length ones can be used in a wchar_t array (because wchar_t
> > arrays are stateless,
>
> I don't think this statement is backed up by anything.
>
> > This is why I said that wchar_t is not used for an encoding (such as
> > ISO-8859-5 or UTF-8 or UTF-16), but for characters' codepoints. It is
> > nowadays almost universally accepted that wchar_t is a Unicode
> > codepoint,
>
> Again, can you provide any specific pointers to support that view?

I think Robert and I already explained that in later messages.
Feel free to ask specific questions if something is still unclear.

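C99 gives a concrete handle on the "wchar_t is a code point" question: an implementation may predefine `__STDC_ISO_10646__` to promise that wchar_t values are ISO 10646 (Unicode) code points. The following is a minimal sketch that reports what the current compiler promises. The helper names are hypothetical, and the second check simply reflects that a 16-bit wchar_t cannot hold code points above U+FFFF.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: 1 if the implementation promises that wchar_t
   values are ISO 10646 (Unicode) code points, else 0.  C99 expresses
   this promise by predefining __STDC_ISO_10646__ (a yyyymmL date). */
int wchar_is_code_point(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if a single wchar_t can hold every Unicode
   code point (U+0000 through U+10FFFF), else 0.  A 16-bit wchar_t,
   as on Windows, cannot. */
int wchar_holds_any_code_point(void)
{
    return (uintmax_t)WCHAR_MAX >= 0x10FFFFu;
}
```

On a system with a 32-bit wchar_t (most GNU/Linux systems) the second helper returns 1; with the 16-bit wchar_t of Windows it returns 0.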
> I believe that on Windows:
>
> - wchar_t is 16-bit
> - wchar_t* values are supposed to be in UTF-16 encoding
> (see
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp)
>
> Do you disagree with any of the above statements?

wchar_t is just an integer type. You can stuff _anything_ into an
integer array, but if you put UTF-16 there, each element is no longer
a character; it is one of the one or two 16-bit code units that encode
a character. In other words, it's a variant of multibyte strings,
except that each element is 16 bits wide.
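To make the "code units, not characters" point concrete, here is a minimal sketch of the UTF-16 encoding rule: code points up to U+FFFF take one 16-bit unit, and code points above it take two (a surrogate pair). It uses uint16_t rather than wchar_t so that it behaves the same whether the platform's wchar_t is 16 or 32 bits wide; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to OUT and returns how many were written,
   or 0 if CP is not a valid scalar value (a surrogate or > U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)  /* range reserved for surrogates */
        return 0;
    if (cp <= 0xFFFF) {                /* BMP: one unit, equal to the code point */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {              /* supplementary plane: surrogate pair */
        cp -= 0x10000;                 /* leaves a 20-bit value */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
        return 2;
    }
    return 0;
}
```

For example, U+0430 (the Cyrillic small a discussed earlier in the thread) yields the single unit 0x0430, while U+1D11E (MUSICAL SYMBOL G CLEF) yields the pair 0xD834 0xDD1E, so one character occupies two array elements.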

Now, I know that Windows stores UTF-16 encoded text in wchar_t
arrays, but those are not L"foo" strings of wide characters. In the
L"foo" notation, each of the 3 string characters _always_ occupies
exactly one wchar_t element, and L"foo"[1] is _always_ the second
character of the string. This is not true for UTF-16, as I hope is
clear from this discussion. In UTF-16, array[1] is merely the second
16-bit code unit, and the character it belongs to could need two
16-bit units (a surrogate pair) for its encoding.

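The difference between "element [1]" and "character [1]" can be shown with a short sketch that counts characters in a UTF-16 buffer and finds where the k-th character starts. It treats an unpaired surrogate as a character of its own and uses uint16_t arrays so the result does not depend on the platform's wchar_t width; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if U is the first half (high surrogate) of a pair. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
/* Nonzero if U is the second half (low surrogate) of a pair. */
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: number of characters (code points) in the first
   N code units of S.  A high surrogate followed by a low surrogate
   counts once; every other unit counts on its own. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(s[i]) && i + 1 < n && is_low_surrogate(s[i + 1]))
            i++;                       /* skip the second half of the pair */
        count++;
    }
    return count;
}

/* Hypothetical helper: index of the code unit where character number K
   (0-based) starts, or N if S has fewer than K+1 characters. */
size_t utf16_char_offset(const uint16_t *s, size_t n, size_t k)
{
    size_t i = 0;
    while (i < n && k > 0) {
        if (is_high_surrogate(s[i]) && i + 1 < n && is_low_surrogate(s[i + 1]))
            i++;                       /* a pair advances by two units */
        i++;
        k--;
    }
    return i;
}
```

For the three-character string "a", U+1D11E, "b", stored as { 0x0061, 0xD834, 0xDD1E, 0x0062 }, the array has four elements but utf16_count_chars returns 3; element [1] is 0xD834, only the first half of the second character, and the third character starts at element 3, not 2.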
> If not, then it directly
> follows that a given wchar_t is not a Unicode code point, but a code unit in
> specific representation (UTF-16), and a given code point takes either one or
> two code units, that is either one or two wchar_t. This is contrary to your
> statement that wchar_t is a single code point.

My statement was based on the assumption that you are coding for a
system where wchar_t is used for complete characters, not for UTF-16
strings. Only in that case can you talk about ``wide characters''
and about wchar_t being a character. In UTF-16, an arbitrary element
of the array might not be a complete character.

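Whether an arbitrary element is a complete character can be tested locally, because UTF-16 reserves disjoint ranges for the two halves of a surrogate pair. A minimal sketch, with a hypothetical function name, that decodes the character starting at element I and refuses when I points into the middle of a pair:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the character that starts at S[I].
   On success stores the code point in *CP and returns the number of
   code units it occupies (1 or 2).  Returns 0 if I is out of range,
   if S[I] is the second half of a surrogate pair (so not the start of
   a character), or if a high surrogate lacks its low partner. */
size_t utf16_decode_at(const uint16_t *s, size_t n, size_t i, uint32_t *cp)
{
    if (i >= n)
        return 0;
    uint16_t u = s[i];
    if (u >= 0xDC00 && u <= 0xDFFF)    /* low surrogate: mid-character */
        return 0;
    if (u >= 0xD800 && u <= 0xDBFF) {  /* high surrogate: needs a partner */
        if (i + 1 >= n || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
            return 0;
        *cp = 0x10000 + (((uint32_t)(u - 0xD800) << 10)
                         | (uint32_t)(s[i + 1] - 0xDC00));
        return 2;
    }
    *cp = u;                           /* BMP character: unit == code point */
    return 1;
}
```

With the buffer { 0x0061, 0xD834, 0xDD1E, 0x0062 }, decoding at element 1 yields U+1D11E spanning two units, while decoding at element 2 fails because that element is only the tail of a character.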
> Anyway, this is quickly getting off-topic for gdb list, so maybe we should
> bring this somewhere else.
It _is_ on topic, IMHO, as long as we discuss features to be added to
GDB.
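For the GDB feature under discussion, the practical consequence is that a printer for wchar_t* needs to know whether target elements are code points or UTF-16 code units. The following is a hypothetical sketch, not GDB code: it assumes the caller has already read the target's wchar_t elements and widened them to host uint32_t values, and that the target's wchar_t width (2 or 4 bytes) is known.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GDB code: turn N wchar_t elements read from
   the inferior (already widened to uint32_t) into code points.  WIDTH
   is the target's sizeof(wchar_t): with 4, each element is taken to be
   a code point; with 2, elements are UTF-16 code units and surrogate
   pairs are combined.  OUT must have room for N values; returns the
   number of code points stored. */
size_t target_wchars_to_code_points(const uint32_t *elems, size_t n,
                                    int width, uint32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = elems[i];
        if (width == 2 && u >= 0xD800 && u <= 0xDBFF && i + 1 < n
            && elems[i + 1] >= 0xDC00 && elems[i + 1] <= 0xDFFF) {
            /* UTF-16 surrogate pair: two elements, one character. */
            u = 0x10000 + ((u - 0xD800) << 10) + (elems[i + 1] - 0xDC00);
            i++;
        }
        out[count++] = u;              /* unpaired surrogates pass through */
    }
    return count;
}
```

The same four elements { 0x61, 0xD834, 0xDD1E, 0x62 } give three characters under the 16-bit interpretation and four under the 32-bit one, which is exactly the ambiguity the printer would have to resolve.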