Mirror of the gdb mailing list
* Sevenbit-strings only partially respected?
@ 2010-10-09 19:26 Vladimir Prus
  2010-10-09 19:48 ` Vladimir Prus
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Prus @ 2010-10-09 19:26 UTC (permalink / raw)
  To: gdb


I've run into a situation where setting 'print sevenbit-strings' to off still
does not prevent GDB from escaping some characters. Specifically,
consider the character 0xD0, and this bit in printchar:

  if (c < 0x20 ||		/* Low control chars */
      (c >= 0x7F && c < 0xA0) ||	/* DEL, High controls */
      (sevenbit_strings && c >= 0x80))
    {				/* high order bit set */

Apparently, the second condition fires and causes 0xD0 to be quoted. Is
this expected behaviour?

- Volodya



* Re: Sevenbit-strings only partially respected?
  2010-10-09 19:26 Sevenbit-strings only partially respected? Vladimir Prus
@ 2010-10-09 19:48 ` Vladimir Prus
  2010-10-09 21:05   ` Eli Zaretskii
  2010-10-12 17:44   ` Tom Tromey
  0 siblings, 2 replies; 5+ messages in thread
From: Vladimir Prus @ 2010-10-09 19:48 UTC (permalink / raw)
  To: gdb

Vladimir Prus wrote:

> 
> I've run into a situation where setting 'print sevenbit-strings' to off still
> does not prevent GDB from escaping some characters. Specifically,
> consider the character 0xD0, and this bit in printchar:
> 
>   if (c < 0x20 ||		/* Low control chars */
>       (c >= 0x7F && c < 0xA0) ||	/* DEL, High controls */
>       (sevenbit_strings && c >= 0x80))
>     {/* high order bit set */
> 
> Apparently, the second condition fires and causes 0xD0 to be quoted. Is
> this expected behaviour?

Doh. Of course 0xD0 is larger than 0xA0. The value that causes the actual
problem is 0x83. Russian letter 'у' is encoded in UTF8 as 0xD1 0x83, and
because of the above code, strings with that letter (and some other letters)
get messed up completely.

Can we please have that

	(c >= 0x7F && c < 0xA0)

clause ripped off?
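
To make the byte arithmetic concrete, here is a minimal sketch of the quoted condition as a standalone predicate. The name needs_escape is hypothetical, and sevenbit_strings is passed in as a parameter here rather than read from GDB's setting; this is an illustration of the condition, not the actual printchar code.

```c
#include <stdbool.h>

/* Hypothetical standalone copy of the condition quoted from printchar.
   Returns true when byte C would take the escaping branch.  */
bool
needs_escape (unsigned int c, bool sevenbit_strings)
{
  return c < 0x20                            /* Low control chars */
         || (c >= 0x7F && c < 0xA0)          /* DEL, High controls */
         || (sevenbit_strings && c >= 0x80); /* high order bit set */
}
```

With sevenbit_strings off, needs_escape (0xD1, false) is false but needs_escape (0x83, false) is true, so only the second byte of the two-byte sequence is escaped and the character is split.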

Thanks,
Volodya




* Re: Sevenbit-strings only partially respected?
  2010-10-09 19:48 ` Vladimir Prus
@ 2010-10-09 21:05   ` Eli Zaretskii
  2010-10-12 17:44   ` Tom Tromey
  1 sibling, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2010-10-09 21:05 UTC (permalink / raw)
  To: Vladimir Prus; +Cc: gdb

> From: Vladimir Prus <vladimir@codesourcery.com>
> Date: Sat, 09 Oct 2010 23:48:12 +0400
> 
> >   if (c < 0x20 ||		/* Low control chars */
> >       (c >= 0x7F && c < 0xA0) ||	/* DEL, High controls */
> >       (sevenbit_strings && c >= 0x80))
> >     {/* high order bit set */
> > 
> > Apparently, the second condition fires and causes 0xD0 to be quoted. Is
> > this expected behaviour?
> 
> Doh. Of course 0xD0 is larger than 0xA0. The value that causes the actual
> problem is 0x83. Russian letter 'у' is encoded in UTF8 as 0xD1 0x83, and
> because of the above code, strings with that letter (and some other letters)
> get messed up completely.

That `(c >= 0x7F && c < 0xA0)' condition assumes ISO-8859-n encodings
(probably was coded for 8859-1), and should not be used with anything
else.
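
A small sketch of why the two encodings collide, under the assumption that the bytes are UTF-8: every value in 0x80..0x9F, which ISO-8859-n reserves for C1 controls, is also a valid UTF-8 continuation byte (bit pattern 10xxxxxx). All function names below are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* True if BYTE has the form 10xxxxxx, i.e. a UTF-8 continuation byte.  */
bool
is_utf8_continuation (unsigned char byte)
{
  return (byte & 0xC0) == 0x80;
}

/* True if BYTE lies in the C1 control range of ISO-8859-n.  */
bool
is_c1_control (unsigned char byte)
{
  return byte >= 0x80 && byte < 0xA0;
}

/* Count the byte values that are both a C1 control under ISO-8859-n
   and a continuation byte under UTF-8.  */
int
count_overlap (void)
{
  int n = 0;
  for (int b = 0; b < 256; b++)
    if (is_c1_control ((unsigned char) b)
        && is_utf8_continuation ((unsigned char) b))
      n++;
  return n;
}
```

count_overlap () covers the whole C1 range, so any multi-byte UTF-8 character whose continuation byte falls in 0x80..0x9F (such as the 0x83 above) trips the check.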



* Re: Sevenbit-strings only partially respected?
  2010-10-09 19:48 ` Vladimir Prus
  2010-10-09 21:05   ` Eli Zaretskii
@ 2010-10-12 17:44   ` Tom Tromey
  2010-10-13  1:50     ` Tom Tromey
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Tromey @ 2010-10-12 17:44 UTC (permalink / raw)
  To: Vladimir Prus; +Cc: gdb

>>>>> "Volodya" == Vladimir Prus <vladimir@codesourcery.com> writes:

Volodya> Russian letter 'у' is encoded in UTF8 as 0xD1 0x83, and because
Volodya> of the above code, strings with that letter (and some other
Volodya> letters) get messed up completely.

Volodya> Can we please have that
Volodya> 	(c >= 0x7F && c < 0xA0)
Volodya> clause ripped off?

It would be fine by me.

FWIW I think sevenbit-strings is broken by design.  I think it would be
better if MI declared the host charset to be UTF-8 and stopped using
that setting.

Tom



* Re: Sevenbit-strings only partially respected?
  2010-10-12 17:44   ` Tom Tromey
@ 2010-10-13  1:50     ` Tom Tromey
  0 siblings, 0 replies; 5+ messages in thread
From: Tom Tromey @ 2010-10-13  1:50 UTC (permalink / raw)
  To: Vladimir Prus; +Cc: gdb

Tom> FWIW I think sevenbit-strings is broken by design.  I think it would be
Tom> better if MI declared the host charset to be UTF-8 and stopped using
Tom> that setting.

BTW, if you change this, look at language.h:PRINT_LITERAL_FORM as well.

Tom


