Re: printing wchar_t* - Vladimir Prus

Mirror of the gdb mailing list

From: Vladimir Prus <ghost@cs.msu.su>
To: Eli Zaretskii <eliz@gnu.org>
Cc: jimb@red-bean.com,  gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Mon, 17 Apr 2006 13:56:00 -0000
Message-ID: <200604171616.28444.ghost@cs.msu.su>
In-Reply-To: <uzmikxxab.fsf@gnu.org>

On Monday 17 April 2006 15:21, Eli Zaretskii wrote:

> > > What I was saying is that indeed this conversion is easy, but it's not
> > > even close to doing what the front end generally would like to do with
> > > the string.  You want to _process_ the string, which means you want to
> > > know its length in characters (not bytes), you want to know what
> > > character set they encode, you want to be able to find the n-th
> > > character in the string, etc.  The encoding suggested by Jim makes
> > > these tasks very hard, much harder than if we send the string as an
> > > array of fixed-length wide characters.
> >
> > That's a *completely* different topic.
>
> Yes, it is.  But we must keep it in mind because the front ends want
> strings to do something with them.

Eli, I think we're running in circles. I'd like to reiterate what I ideally
want from gdb:

  1. For any wchar_t* value, be it the value of a variable, a function
     parameter three levels up the stack, or a member of a structure, I want
     gdb to print that value in a specific format that is easy for the
     frontend to use. A string with escapes is fine.
  2. I want that formatting to take effect both for MI commands and for the
     'print' command, since the user can issue a 'print' command manually.
  3. I don't mind having this behaviour only when --interpreter=mi is
     specified.

I think the two questions we have not agreed on are:

  1. When talking to the FE, should literals be used at all, or should the
     string consist only of \x escapes?
  2. When talking to the user, should we use string literals, or just \x escapes?

I hope you'll agree that using \x escapes when talking to the user is not
acceptable. And since gdb currently assumes the ASCII charset for output, I
don't think there will be any problems if ASCII characters are output as-is,
without escaping.

> > Second, frontend needs to display the data, however it will operate
> > using its own data structures, and it does not matter if \x escapes
> > were used or not. No frontend will ever work on a string containing
> > embedded "\x" escapes.
>
> I was saying that the ASCII encoding suggested by Jim makes it harder
> to convert the text into wide characters, that's all.

I don't see why that is so, but never mind.
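For illustration, here is a minimal Python sketch of the front-end side. It turns a literal of the kind discussed here (host-charset bytes mixed with \x escapes) back into a sequence of code points. Once that is done, the length in characters and the n-th character are trivial to get. The function name parse_wide_literal is hypothetical. The escape set is also an assumption: only \x, \\ and \" are recognized, which is not gdb's actual output format. The sketch further assumes an ASCII-compatible host charset in which byte 0x5C never appears inside a multibyte character. That holds for 8-bit codepages and UTF-8, but not for Shift_JIS.

```python
import string


def parse_wide_literal(data: bytes, host_encoding: str) -> str:
    """Decode a quoted literal made of host-charset bytes and \\x escapes.

    Hypothetical format: only \\xHH..., \\\\ and \\" escapes are recognized.
    """
    if len(data) < 2 or data[:1] != b'"' or data[-1:] != b'"':
        raise ValueError("expected a double-quoted literal")
    body = data[1:-1]
    chars = []              # decoded pieces, in order
    pending = bytearray()   # run of raw host-charset bytes not yet decoded
    i = 0
    while i < len(body):
        byte = body[i]
        if byte != 0x5C:                  # not a backslash: raw host byte
            pending.append(byte)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt in b'"\\':                 # escaped quote or backslash
            pending.append(nxt)
            i += 2
        elif nxt == ord('x'):
            # Like C, a \x escape consumes every following hex digit.
            j = i + 2
            while j < len(body) and chr(body[j]) in string.hexdigits:
                j += 1
            if j == i + 2:
                raise ValueError("\\x with no hex digits")
            # Decode the raw bytes seen so far, then append the code point.
            chars.append(pending.decode(host_encoding))
            pending.clear()
            chars.append(chr(int(body[i + 2:j], 16)))
            i = j
        else:
            raise ValueError(f"unsupported escape \\{chr(nxt)}")
    chars.append(pending.decode(host_encoding))
    return "".join(chars)
```

With a cp1255 host charset, the bytes "\xd3\\x5c4" decode to the two code points U+05C3 U+05C4, so len() reports 2 characters.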

> > > That's extra job for
> > > GDB.  (Again, we were originally talking about wchar_t, not multibyte
> > > strings.)
> >
> > I don't understand what this extra job is. It is as simple as:
> >
> >    for c in wchar_t* literal:
> >        if c is representable in host encoding:
> >             output_literal
> >        else
> >             output_hex_escape
>
> That might sound simple for you, but it isn't, in general.  The
> ``representable in host encoding'' part is very non-trivial; for
> example, how do you tell whether the Unicode codepoints 0x05C3 and
> 0x05C4 can be represented in the Windows codepage 1255 (the former
> can, the latter cannot)?  This is generally impossible without using
> very complicated algorithms and/or large data bases.
>
> The other complex part is ``output_literal'': again, there's no simple
> algorithm to map Unicode's 0x05C3 into cp1255's 0xD3.  You need tables
> again, and you need separate tables for each possible encoding (Hebrew
> has at least 3 widely used ones, Russian has at least 5, etc.).

iconv has those tables. You see problems where there are none.
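A minimal Python sketch of the quoted loop follows. It assumes Python's codec tables stand in for iconv's; gdb itself is written in C and would use iconv or its own tables. The name format_wide_string is hypothetical. The sketch tries to encode each character in the host charset and falls back to a \x escape of the code point when that fails. This also settles the cp1255 example above: U+05C3 encodes to 0xD3, while U+05C4 does not encode at all. C's \x escapes consume every following hex digit, so a literal hex digit right after an escape is escaped as well. Stateful encodings such as ISO-2022-JP are out of scope, because encoding one character at a time would emit shift sequences.

```python
import string


def format_wide_string(text: str, host_encoding: str) -> bytes:
    """Render a wide string as a quoted literal in host_encoding.

    Characters the host charset can represent are emitted as-is; anything
    else becomes a \\x escape of its code point.
    """
    out = bytearray(b'"')
    prev_was_hex = False  # did we just emit a \x escape?
    for ch in text:
        encoded = None
        if ch in '"\\':
            encoded = b'\\' + ch.encode('ascii')
        elif ch.isprintable() and not (prev_was_hex and ch in string.hexdigits):
            try:
                # "Is c representable in the host encoding?" == does it encode?
                encoded = ch.encode(host_encoding)
            except UnicodeEncodeError:
                encoded = None
        if encoded is None:
            out += f'\\x{ord(ch):x}'.encode('ascii')
            prev_was_hex = True
        else:
            out += encoded
            prev_was_hex = False
    out += b'"'
    return bytes(out)
```

The same string comes out differently depending on the host charset. "café" is emitted as-is under latin-1 but as "caf\xe9" under ascii.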

> > > > Really, using strings with \x escapes differs from array
> > > > printing in just one point: some characters are printed not as hex
> > > > values, but as characters in local 8-bit encoding. Why do you think
> > > > this is a problem?
> > >
> > > Because knowing what is the ``local 8-bit encoding'' is in itself a
> > > huge problem.
> >
> > [...]
> > I trust you on that, but nothing prevents the user/frontend from
> > explicitly specifying the encoding.
>
> What makes you think the user and/or front end will know what to
> specify?  Experience shows they generally don't.

First you say it's not possible to detect the encoding from the environment.
Then you say you can't trust the user/frontend. Taken together, that sounds as
if making gdb print char* literals reliably is impossible. Is that what you're
trying to say?
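A minimal Python sketch of the precedence argued for here: an encoding explicitly specified by the user or front end wins, and the environment's locale is only a fallback. The name resolve_host_encoding is hypothetical, and Python's locale and codecs modules stand in for whatever gdb would actually use. The sketch does not address Eli's concern that both sources may be wrong. It only makes the precedence explicit and rejects names that no codec knows.

```python
import codecs
import locale
from typing import Optional


def resolve_host_encoding(explicit: Optional[str] = None) -> str:
    """Pick the charset used to render literals.

    An explicit setting from the user or front end wins; otherwise fall
    back to whatever the environment's locale reports.
    """
    name = explicit if explicit else locale.getpreferredencoding(False)
    # codecs.lookup raises LookupError for unknown names and maps aliases
    # such as "CP1255" or "windows-1255" to one canonical codec name.
    return codecs.lookup(name).name
```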

> > 1. Gdb should be modified to print wchar_t* literals.
>
> ``Print'' is ambiguous in this context.  I believe you mean ``send to
> the front end'', since this was your original problem.  If the front
> end is charged with displaying the wchar_t strings, GDB does not need
> to print anything by itself.  Am I right?
>
> > It should use the same
> > logic as for char* to decide if value is representable in the host
> > charset,
>
> I hope I explained above why this part is highly non-trivial.  

Using existing logic is in fact absolutely trivial -- that logic already 
*exists*, you don't need to do anything. 

> That is 
> why I think GDB should use hex notation for all characters, and leave
> it for the FE to deal with their display.

I disagree, for the simple reason that for char* values the existing logic has
not caused any problems. Also, while I can take a stab at wchar_t* output, I
would not be comfortable with special-casing wchar_t* output to the frontend.

- Volodya


Thread overview: 52+ messages
2006-04-13 17:07 Vladimir Prus
2006-04-13 17:25 ` Eli Zaretskii
2006-04-14  7:29   ` Vladimir Prus
2006-04-14  8:47     ` Eli Zaretskii
2006-04-14 12:47       ` Vladimir Prus
2006-04-14 13:05         ` Eli Zaretskii
2006-04-14 13:06           ` Vladimir Prus
2006-04-14 13:15             ` Robert Dewar
2006-04-14 13:17           ` Daniel Jacobowitz
2006-04-14 13:59             ` Robert Dewar
2006-04-14 14:37             ` Eli Zaretskii
2006-04-14 14:08       ` Paul Koning
2006-04-14 14:47         ` Eli Zaretskii
2006-04-14 15:00           ` Vladimir Prus
2006-04-14 17:53             ` Eli Zaretskii
2006-04-17  7:05               ` Vladimir Prus
2006-04-17  8:35                 ` Eli Zaretskii
2006-04-13 18:06 ` Jim Blandy
2006-04-13 21:18   ` Eli Zaretskii
2006-04-14  6:02     ` Jim Blandy
2006-04-14  8:43       ` Eli Zaretskii
2006-04-14  7:58   ` Vladimir Prus
2006-04-14  8:07     ` Jim Blandy
2006-04-14  8:30       ` Vladimir Prus
2006-04-14  8:57     ` Eli Zaretskii
2006-04-14 12:52       ` Vladimir Prus
2006-04-14 13:07         ` Daniel Jacobowitz
2006-04-14 14:23           ` Eli Zaretskii
2006-04-14 14:29             ` Daniel Jacobowitz
2006-04-14 14:53               ` Eli Zaretskii
2006-04-14 17:10                 ` Daniel Jacobowitz
2006-04-14 17:55               ` Jim Blandy
2006-04-14 18:27                 ` Eli Zaretskii
2006-04-14 18:30                   ` Jim Blandy
2006-04-14 19:19                     ` Eli Zaretskii
2006-04-14 14:16         ` Eli Zaretskii
2006-04-14 14:50           ` Vladimir Prus
2006-04-14 17:18             ` Eli Zaretskii
2006-04-14 18:03               ` Jim Blandy
2006-04-14 19:16                 ` Eli Zaretskii
2006-04-14 19:22                   ` Jim Blandy
2006-04-14 22:18                     ` Daniel Jacobowitz
2006-04-16 11:39                       ` Jim Blandy
2006-04-16 15:07                         ` Eli Zaretskii
2006-04-15  7:14                     ` Eli Zaretskii
2006-04-17  7:16                       ` Vladimir Prus
2006-04-17  8:58                         ` Eli Zaretskii
2006-04-17 10:35                           ` Vladimir Prus
2006-04-17 12:26                             ` Eli Zaretskii
2006-04-17 13:56                               ` Vladimir Prus [this message]
2006-04-18  5:31                                 ` Eli Zaretskii
2006-04-14 19:53                 ` Mark Kettenis
