Mirror of the gdb mailing list
From: "Jim Blandy" <jimb@red-bean.com>
To: "Vladimir Prus" <ghost@cs.msu.su>
Cc: gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 08:07:00 -0000
Message-ID: <8f2776cb0604140029r44decd6atfa728aad53cb596d@mail.gmail.com>
In-Reply-To: <e1necb$gen$1@sea.gmane.org>

On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote:
> Jim Blandy wrote:
>
> > On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote:
> >> I have a user-defined command that can produce the output I want, but is
> >> defining a custom command the right approach?
> >
> > Well, you'd like wide strings to be printed properly when they appear
> > in structures, as arguments to functions, and so on, right?  So a
> > user-defined command isn't ideal.
>
> I think I'll still need to do some processing for wchar_t* on the frontend
> side. The problem is that I don't see any way for gdb to print wchar_t in a
> way that does not require post-processing. It can print it as UTF8, but
> then for printing char* gdb should use the local 8-bit encoding, which is
> likely to be *not* UTF8. Gdb can probably use some extra markers for
> values, like:
>
>    "foo"  for string in local 8-bit encoding
>    L"foo" for string in UTF8 encoding.
>
> It's also possible to use "\u" escapes.
>
> But then there's a problem:
>
>    - Do we assume that wchar_t is always UTF-16 or UTF-32?
>    - If not:
>      - how can the user select this?
>      - how will a user-specified encoding be handled?

You can't hard-code assumptions about the character set into GDB.  Nor
can you hard-code the assumption that the host and target character
sets are the same.  GDB needs to do explicit conversions between the
two as needed, and handle mismatches in some reasonable way.

GDB already has the commands 'set host-charset' and 'set
target-charset', so you can assume that you have accurate information
about the character sets at hand.  They fall back to ASCII.

> > The best approach would be to extend charset.[ch] to handle wide
> > character sets as well, and then add code to the language-specific
> > printing routines to use the charset functions.  (This is fortunately
> > much simpler than adding support for multibyte characters.)
>
> So, for each wchar_t element, language-specific code will call
> 'target_wchar_t_to_host', which will output a specific representation of
> that wchar_t. Hmm, the interface there seems to assume there's a 1<->1
> mapping between target and host characters.  This makes the L"UTF8" format
> and the ASCII-string-with-\u-escapes format impossible, it seems.

Not at all.  The current character and string printing code uses those
routines, and it handles unprintable and invalid characters just fine.
See, for example, host_print_char_literally, and
c_target_char_has_backslash_escape.

GDB tries to print characters and strings as they would appear in
source code.  C doesn't assume that the source and execution character
sets are the same; by using numeric escapes, you can write programs
for any execution character set in any source character set.  You just
need enough information to manage the overlap.

As far as 1-to-1 mappings are concerned, the only necessary property
is that host_char_to_target and target_char_to_host be inverses, and
return zero for characters that can't make a round trip.  The existing
string-printing code will automatically use numeric escapes for
characters that target_char_to_host won't translate.

