* printing wchar_t* @ 2006-04-13 17:07 Vladimir Prus 2006-04-13 17:25 ` Eli Zaretskii 2006-04-13 18:06 ` Jim Blandy 0 siblings, 2 replies; 52+ messages in thread From: Vladimir Prus @ 2006-04-13 17:07 UTC (permalink / raw) To: gdb Hi, at the moment, gdb seems to provide no support for printing wchar_t* values. It prints them like this: (gdb) print p15 print p15 $486 = (wchar_t *) 0x80489f8 Is there any "standard" way to make gdb automatically traverse a wchar_t*, printing values and stopping at the '0' value? I don't care much how it's actually printed. For example, printing raw hex values will work: 0x56, 0x1456 or using \u escapes: 'test\u1234' or whatever. I have a user-defined command that can produce the output I want, but is defining a custom command the right approach? - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 17:07 printing wchar_t* Vladimir Prus @ 2006-04-13 17:25 ` Eli Zaretskii 2006-04-14 7:29 ` Vladimir Prus 2006-04-13 18:06 ` Jim Blandy 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-13 17:25 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Thu, 13 Apr 2006 20:04:32 +0400 > > at the moment, gdb seem to provide no support for printing wchar_t* values. > It prints them like this: > > (gdb) print p15 > print p15 > $486 = (wchar_t *) 0x80489f8 > > Is there any "standard" way to make gdb automatically traverse wchar_t*, > printing values, and stopping at '0' value. What character set is used by the wide characters in the wchar_t arrays? GDB has some support for a few single-byte character sets, see the node "Character Sets" in the manual. > I have a user-defined command that can produce the output I want, but is > defining a custom command the right approach? It's one possibility, the other one being to call a function in the debuggee to produce the string. Yet another possibility is to do the conversion in your GUI front end. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 17:25 ` Eli Zaretskii @ 2006-04-14 7:29 ` Vladimir Prus 2006-04-14 8:47 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 7:29 UTC (permalink / raw) To: gdb Eli Zaretskii wrote: >> From: Vladimir Prus <ghost@cs.msu.su> >> Date: Thu, 13 Apr 2006 20:04:32 +0400 >> >> at the moment, gdb seem to provide no support for printing wchar_t* >> values. It prints them like this: >> >> (gdb) print p15 >> print p15 >> $486 = (wchar_t *) 0x80489f8 >> >> Is there any "standard" way to make gdb automatically traverse wchar_t*, >> printing values, and stopping at '0' value. > > What character set is used by the wide characters in the wchar_t > arrays? GDB has some support for a few single-byte character sets, > see the node "Character Sets" in the manual. A relatively safe bet would be to assume it's some zero-terminated character set. I plan to assume it's either UTF-16 or UTF-32 in the GUI (the conversion code is the same for both encodings), but gdb can just print raw values. >> I have a user-defined command that can produce the output I want, but is >> defining a custom command the right approach? > > It's one possibility, the other one being to call a function in the > debuggee to produce the string. And what would such a function return? A char* in the local 8-bit encoding? In that case, not all wchar_t* variables can be printed. > Yet another possibility is to do the > conversion in your GUI front end. That's what I'm going to do, but first I need to get the raw data, preferably without issuing an MI command for every single character. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 7:29 ` Vladimir Prus @ 2006-04-14 8:47 ` Eli Zaretskii 2006-04-14 12:47 ` Vladimir Prus 2006-04-14 14:08 ` Paul Koning 0 siblings, 2 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 8:47 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 10:01:57 +0400 > > > What character set is used by the wide characters in the wchar_t > > arrays? GDB has some support for a few single-byte character sets, > > see the node "Character Sets" in the manual. > > Relatively safe bet would be to assume it's some zero-terminated character > set. I plan to assume it's either UTF-16 or UTF-32 in the GUI (the > conversion code is the same for both encodings), but gdb can just print raw > values. We should get our terminology right: UTF-16 is not a character set, it's an encoding (and a multibyte encoding, btw). As for UTF-32, I don't think such a beast exists at all. I think you meant 16-bit Unicode characters (a.k.a. the BMP) and 32-bit Unicode characters, respectively. > > It's one possibility, the other one being to call a function in the > > debuggee to produce the string. > > And what such a function will return? char* in local 8-bit encoding? In that > case, no all wchar_t* variable can be printed. If you want to display non-ASCII strings, it means you already have some way of displaying such characters. The function I mentioned would not return anything, it would actually _display_ the string. For example, in command-line version of GDB, if the terminal supports UTF-8 encoded characters, that function would output a UTF-8 encoding of the non-ASCII string, and then the terminal will display them with the correct glyphs. > > Yet another possibility is to do the > > conversion in your GUI front end. > > That's what I'm going to do, but first I need to get raw data, preferrably > without issing an MI command for every single character. 
A wchar_t string is just an array, and GDB already has a feature to produce N elements of an array. In CLI, you say "print *array@20" to print the first 20 elements of the named array. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 8:47 ` Eli Zaretskii @ 2006-04-14 12:47 ` Vladimir Prus 2006-04-14 13:05 ` Eli Zaretskii 2006-04-14 14:08 ` Paul Koning 1 sibling, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 12:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb On Friday 14 April 2006 12:30, Eli Zaretskii wrote: > > Relatively safe bet would be to assume it's some zero-terminated > > character set. I plan to assume it's either UTF-16 or UTF-32 in the GUI > > (the conversion code is the same for both encodings), but gdb can just > > print raw values. > > We should get our terminology right: UTF-16 is not a character set, > it's an encoding (and a multibyte encoding, btw). As for UTF-32, I > don't think such a beast exists at all. > > I think you meant 16-bit Unicode characters (a.k.a. the BMP) and > 32-bit Unicode characters, respectively. No, I meant UTF-16 encoding (the one with surrogate pairs), and UTF-32 encoding (which does exists, in the Unicode standard). > > > It's one possibility, the other one being to call a function in the > > > debuggee to produce the string. > > > > And what such a function will return? char* in local 8-bit encoding? In > > that case, no all wchar_t* variable can be printed. > > If you want to display non-ASCII strings, it means you already have > some way of displaying such characters. The function I mentioned > would not return anything, it would actually _display_ the string. > > For example, in command-line version of GDB, if the terminal supports > UTF-8 encoded characters, that function would output a UTF-8 encoding > of the non-ASCII string, and then the terminal will display them with > the correct glyphs. This is non-starter. I can't have debuggee send data to KDevelop widgets. > > > Yet another possibility is to do the > > > conversion in your GUI front end. > > > > That's what I'm going to do, but first I need to get raw data, > > preferrably without issing an MI command for every single character. 
> > A wchar_t string is just an array, and GDB already has a feature to > produce N elements of an array. In CLI, you say "print *array@20" to > print the first 20 elements of the named array. I don't know how many elements there are, as wchar_t* is zero terminated, so I'd like gdb to compute the length automatically. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 12:47 ` Vladimir Prus @ 2006-04-14 13:05 ` Eli Zaretskii 2006-04-14 13:06 ` Vladimir Prus 2006-04-14 13:17 ` Daniel Jacobowitz 0 siblings, 2 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 13:05 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 12:46:57 +0400 > Cc: gdb@sources.redhat.com > > No, I meant UTF-16 encoding (the one with surrogate pairs), and UTF-32 > encoding (which does exists, in the Unicode standard). What software uses that? Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. > > For example, in command-line version of GDB, if the terminal supports > > UTF-8 encoded characters, that function would output a UTF-8 encoding > > of the non-ASCII string, and then the terminal will display them with > > the correct glyphs. > > This is non-starter. I can't have debuggee send data to KDevelop widgets. That was just an example. I know it's irrelevant to your case (and, in fact, to any GUI front-end). > > A wchar_t string is just an array, and GDB already has a feature to > > produce N elements of an array. In CLI, you say "print *array@20" to > > print the first 20 elements of the named array. > > I don't know how many elements there are, as wchar_t* is zero terminated, so > I'd like gdb to compute the length automatically. That's easy. Assuming that is done, is it all you need? ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:05 ` Eli Zaretskii @ 2006-04-14 13:06 ` Vladimir Prus 2006-04-14 13:15 ` Robert Dewar 2006-04-14 13:17 ` Daniel Jacobowitz 1 sibling, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 13:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb On Friday 14 April 2006 16:55, Eli Zaretskii wrote: > > From: Vladimir Prus <ghost@cs.msu.su> > > Date: Fri, 14 Apr 2006 12:46:57 +0400 > > Cc: gdb@sources.redhat.com > > > > No, I meant UTF-16 encoding (the one with surrogate pairs), and UTF-32 > > encoding (which does exists, in the Unicode standard). > > What software uses that? I'd say, any software using std::wstring on Linux. > Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. Since C++ standard says nothing about encoding of wchar_t, specific application can do anything it likes. In particular, I believe that on Windows, wchar_t* is assumed to be in UTF-16 encoding. > > > A wchar_t string is just an array, and GDB already has a feature to > > > produce N elements of an array. In CLI, you say "print *array@20" to > > > print the first 20 elements of the named array. > > > > I don't know how many elements there are, as wchar_t* is zero terminated, > > so I'd like gdb to compute the length automatically. > > That's easy. Assuming that is done, is it all you need? Yes, that would be sufficient for me. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:06 ` Vladimir Prus @ 2006-04-14 13:15 ` Robert Dewar 0 siblings, 0 replies; 52+ messages in thread From: Robert Dewar @ 2006-04-14 13:15 UTC (permalink / raw) To: Vladimir Prus; +Cc: Eli Zaretskii, gdb Vladimir Prus wrote: > On Friday 14 April 2006 16:55, Eli Zaretskii wrote: >>> From: Vladimir Prus <ghost@cs.msu.su> >>> Date: Fri, 14 Apr 2006 12:46:57 +0400 >>> Cc: gdb@sources.redhat.com >>> >>> No, I meant UTF-16 encoding (the one with surrogate pairs), and UTF-32 >>> encoding (which does exists, in the Unicode standard). >> What software uses that? > > I'd say, any software using std::wstring on Linux. > >> Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. > > Since C++ standard says nothing about encoding of wchar_t, specific > application can do anything it likes. In particular, I believe that on > Windows, wchar_t* is assumed to be in UTF-16 encoding. It only makes sense to talk about UTF-16 encoding in the context of wchar_t if wchar_t is 16 bits; otherwise, as noted above, UTF-16 is a variable length encoding, not suitable for wchar_t. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:05 ` Eli Zaretskii 2006-04-14 13:06 ` Vladimir Prus @ 2006-04-14 13:17 ` Daniel Jacobowitz 2006-04-14 13:59 ` Robert Dewar 2006-04-14 14:37 ` Eli Zaretskii 1 sibling, 2 replies; 52+ messages in thread From: Daniel Jacobowitz @ 2006-04-14 13:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Vladimir Prus, gdb On Fri, Apr 14, 2006 at 03:55:49PM +0300, Eli Zaretskii wrote: > Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. There's a rant about this in the glibc manual I was just reading... In fact, on many platforms, wchar_t is only 16-bit. How exactly you handle UTF-8 or UCS-4 input in this case, I don't really understand. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:17 ` Daniel Jacobowitz @ 2006-04-14 13:59 ` Robert Dewar 2006-04-14 14:37 ` Eli Zaretskii 1 sibling, 0 replies; 52+ messages in thread From: Robert Dewar @ 2006-04-14 13:59 UTC (permalink / raw) To: Eli Zaretskii, Vladimir Prus, gdb Daniel Jacobowitz wrote: > On Fri, Apr 14, 2006 at 03:55:49PM +0300, Eli Zaretskii wrote: >> Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. > > There's a rant about this in the glibc manual I was just reading... > > In fact, on many platforms, wchar_t is only 16-bit. How exactly you > handle UTF-8 or UCS-4 input in this case, I don't really understand. Seems clear, you can only represent a limited range of codes if you only have 16 bits! UTF-8 is a variable length encoding that can represent any character in the 32-bit range. Obviously if you have to construct wchar_t values from UTF-8 input, then you will not be able to represent characters whose codes exceed 65535. Same with UCS-4. > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:17 ` Daniel Jacobowitz 2006-04-14 13:59 ` Robert Dewar @ 2006-04-14 14:37 ` Eli Zaretskii 1 sibling, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 14:37 UTC (permalink / raw) To: Vladimir Prus, gdb > Date: Fri, 14 Apr 2006 09:07:29 -0400 > From: Daniel Jacobowitz <drow@false.org> > Cc: Vladimir Prus <ghost@cs.msu.su>, gdb@sources.redhat.com > > On Fri, Apr 14, 2006 at 03:55:49PM +0300, Eli Zaretskii wrote: > > Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it. > > There's a rant about this in the glibc manual I was just reading... > > In fact, on many platforms, wchar_t is only 16-bit. How exactly you > handle UTF-8 or UCS-4 input in this case, I don't really understand. Robert answered to that, and I agree with his response. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 8:47 ` Eli Zaretskii 2006-04-14 12:47 ` Vladimir Prus @ 2006-04-14 14:08 ` Paul Koning 2006-04-14 14:47 ` Eli Zaretskii 1 sibling, 1 reply; 52+ messages in thread From: Paul Koning @ 2006-04-14 14:08 UTC (permalink / raw) To: eliz; +Cc: ghost, gdb >>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes: >> From: Vladimir Prus <ghost@cs.msu.su> Date: Fri, 14 Apr 2006 >> 10:01:57 +0400 >> >> > What character set is used by the wide characters in the wchar_t >> > arrays? GDB has some support for a few single-byte character >> sets, > see the node "Character Sets" in the manual. >> >> Relatively safe bet would be to assume it's some zero-terminated >> character set. I plan to assume it's either UTF-16 or UTF-32 in >> the GUI (the conversion code is the same for both encodings), but >> gdb can just print raw values. Eli> We should get our terminology right: UTF-16 is not a character Eli> set, it's an encoding (and a multibyte encoding, btw). As for Eli> UTF-32, I don't think such a beast exists at all. I seem to remember seeing it mentioned. It certainly makes sense. Eli> I think you meant 16-bit Unicode characters (a.k.a. the BMP) and Eli> 32-bit Unicode characters, respectively. If you have 16 bit wide chars, it seems possible that those might contain UTF-16 encoding of full (beyond BMP) Unicode characters. paul ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 14:08 ` Paul Koning @ 2006-04-14 14:47 ` Eli Zaretskii 2006-04-14 15:00 ` Vladimir Prus 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 14:47 UTC (permalink / raw) To: Paul Koning; +Cc: ghost, gdb > Date: Fri, 14 Apr 2006 09:43:01 -0400 > From: Paul Koning <pkoning@equallogic.com> > Cc: ghost@cs.msu.su, gdb@sources.redhat.com > > If you have 16 bit wide chars, it seems possible that those might > contain UTF-16 encoding of full (beyond BMP) Unicode characters. You could use wchar_t arrays for that, but then not every array element will be a full character, and you will not be able to access individual characters by their positional index. In other words, in this case each element of the wchar_t array is no longer a ``wide character'', but one of the few shorts that encode a character. If we want to support wchar_t arrays that store UTF-16, we will need to add a feature to GDB to convert UTF-16 to the full UCS-4 codepoints, and output those. Alternatively, the FE will have to support display of UTF-16 encoded characters. ^ permalink raw reply [flat|nested] 52+ messages in thread
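[Editorial sketch] The conversion mentioned above, from UTF-16 code units to full UCS-4 code points, can be illustrated as follows. This is a standalone sketch and not a proposed GDB patch; the function name is hypothetical, and passing unpaired surrogates through unchanged is a choice made here so that a debugger-style tool can still show malformed data.

```python
def utf16_units_to_code_points(units):
    """Combine surrogate pairs in a list of 16-bit units into code points.

    Unpaired surrogates are passed through unchanged (an assumption of
    this sketch) so that malformed input stays visible.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            code_points.append(0x10000 + (high << 10) + low)
            i += 2  # two array elements encode one character
        else:
            code_points.append(unit)
            i += 1
    return code_points


# 't' followed by a character outside the BMP stored as a surrogate pair.
print([hex(cp) for cp in utf16_units_to_code_points([0x74, 0xD834, 0xDD1E])])
```

Note that the output list can be shorter than the input list, which is the positional-index concern raised in the message above.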
* Re: printing wchar_t* 2006-04-14 14:47 ` Eli Zaretskii @ 2006-04-14 15:00 ` Vladimir Prus 2006-04-14 17:53 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 15:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Paul Koning, gdb On Friday 14 April 2006 18:29, Eli Zaretskii wrote: > > Date: Fri, 14 Apr 2006 09:43:01 -0400 > > From: Paul Koning <pkoning@equallogic.com> > > Cc: ghost@cs.msu.su, gdb@sources.redhat.com > > > > If you have 16 bit wide chars, it seems possible that those might > > contain UTF-16 encoding of full (beyond BMP) Unicode characters. > > You could use wchar_t arrays for that, but then not every array > element will be a full character, and you will not be able to access > individual characters by their positional index. So what? Even if wchar_t is 32 bits, the element at position 'i' can be a combining character modifying another character, and be of little use by itself. > In other words, in this case each element of the wchar_t array is no > longer a ``wide character'', but one of the few shorts that encode a > character. > > If we want to support wchar_t arrays that store UTF-16, we will need > to add a feature to GDB to convert UTF-16 to the full UCS-4 > codepoints, and output those. That's what I mentioned in a reply to Jim -- since the current string printing code operates "one wchar_t at a time", it's not suitable for outputting UTF-16 encoded wchar_t values to the user. > Alternatively, the FE will have to > support display of UTF-16 encoded characters. As for the FE, handling UTF-16 is trivial, so printing just the wchar_t values will be sufficient. Some work may be necessary only if we want to properly show UTF-16 strings to a user of console gdb. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 15:00 ` Vladimir Prus @ 2006-04-14 17:53 ` Eli Zaretskii 2006-04-17 7:05 ` Vladimir Prus 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 17:53 UTC (permalink / raw) To: Vladimir Prus; +Cc: pkoning, gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 18:50:07 +0400 > Cc: Paul Koning <pkoning@equallogic.com>, gdb@sources.redhat.com > > > You could use wchar_t arrays for that, but then not every array > > element will be a full character, and you will not be able to access > > individual characters by their positional index. > > And what? Even if wchar_t is 32 bit then element at position 'i' can be > combining character modifying another character, and be of little use itself. You are introducing into the argument yet another face of a character: how it is displayed. It's true that some characters, when they are adjacent to each other, are displayed in some special way (the ff ligature is one simple example of that), but that is something for the rendering engine to take care of, it has nothing to do with the string's content. As far as any software, except the rendering engine, is concerned, the combining character is, in fact, part of the string. For example, if the user wants to search for such a character, the program must find it. So, for the purposes of processing the wchar_t strings, it is very important to know whether they are fixed-size wide characters or variable-size encoding. If you just copy the string verbatim to and fro, then it doesn't matter, but for anything more complex the difference is very large. > > If we want to support wchar_t arrays that store UTF-16, we will need > > to add a feature to GDB to convert UTF-16 to the full UCS-4 > > codepoints, and output those. > > That's what I mentioned in a reply to Jim -- since the current string printing > code operated "one wchar_t at a time", it's not suitable for outputing UTF-16 > encoded wchar_t values to the user. 
I don't understand: if the wchar_t array holds a UTF-16 encoding, then when you receive the entire string, you have a UTF-16 encoding of what you want to display, and you yourself said that displaying a UTF-16 encoded string is easy for you. So where is the problem? is that only that you cannot know the length of the UTF-16 encoded string? or is there something else missing? > > Alternatively, the FE will have to > > support display of UTF-16 encoded characters. > > Speaking about FE, handling UTF-16 is trivial Maybe in your environment and windowing system, but not in all cases, AFAIK. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 17:53 ` Eli Zaretskii @ 2006-04-17 7:05 ` Vladimir Prus 2006-04-17 8:35 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-17 7:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pkoning, gdb On Friday 14 April 2006 21:10, Eli Zaretskii wrote: > > > If we want to support wchar_t arrays that store UTF-16, we will need > > > to add a feature to GDB to convert UTF-16 to the full UCS-4 > > > codepoints, and output those. > > > > That's what I mentioned in a reply to Jim -- since the current string > > printing code operated "one wchar_t at a time", it's not suitable for > > outputing UTF-16 encoded wchar_t values to the user. > > I don't understand: if the wchar_t array holds a UTF-16 encoding, then > when you receive the entire string, you have a UTF-16 encoding of what > you want to display, and you yourself said that displaying a UTF-16 > encoded string is easy for you. So where is the problem? is that only > that you cannot know the length of the UTF-16 encoded string? or is > there something else missing? For my frontend -- there's no problem, I can handle UTF-16 myself. However, if gdb is to ever produce output in UTF-8, that should be readable by the console, then it should handle surrogate pairs itself. Taking first and second element of surrogate pair and converting both to UTF-8, individually, won't work, for obvious reasons. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
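[Editorial sketch] The claim above, that converting the two halves of a surrogate pair to UTF-8 individually does not work, can be checked with Python's built-in codecs. This is only a demonstration of the encoding rule; the sample pair is an arbitrary choice, and nothing here reflects how GDB itself produces output.

```python
high, low = 0xD834, 0xDD1E  # one surrogate pair (sample value, assumption)

# A strict UTF-8 encoder refuses a lone surrogate outright.
try:
    chr(high).encode("utf-8")
    lone_surrogate_accepted = True
except UnicodeEncodeError:
    lone_surrogate_accepted = False

# Forcing each half through separately yields bytes that a strict
# UTF-8 decoder rejects.
naive = b"".join(chr(u).encode("utf-8", "surrogatepass") for u in (high, low))
try:
    naive.decode("utf-8")
    naive_is_valid_utf8 = True
except UnicodeDecodeError:
    naive_is_valid_utf8 = False

# Combining the pair first gives a single code point with a valid encoding.
combined = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
proper = chr(combined).encode("utf-8")

print(lone_surrogate_accepted, naive_is_valid_utf8, len(naive), len(proper))
```

Whether a given terminal tolerates the six-byte naive form is a separate question, as the next reply points out; the sketch only shows that it is not well-formed UTF-8.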
* Re: printing wchar_t* 2006-04-17 7:05 ` Vladimir Prus @ 2006-04-17 8:35 ` Eli Zaretskii 0 siblings, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-17 8:35 UTC (permalink / raw) To: Vladimir Prus; +Cc: pkoning, gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Mon, 17 Apr 2006 10:17:40 +0400 > Cc: pkoning@equallogic.com, > gdb@sources.redhat.com > > On Friday 14 April 2006 21:10, Eli Zaretskii wrote: > > > > > If we want to support wchar_t arrays that store UTF-16, we will need > > > > to add a feature to GDB to convert UTF-16 to the full UCS-4 > > > > codepoints, and output those. > > > > > > That's what I mentioned in a reply to Jim -- since the current string > > > printing code operated "one wchar_t at a time", it's not suitable for > > > outputing UTF-16 encoded wchar_t values to the user. > > > > I don't understand: if the wchar_t array holds a UTF-16 encoding, then > > when you receive the entire string, you have a UTF-16 encoding of what > > you want to display, and you yourself said that displaying a UTF-16 > > encoded string is easy for you. So where is the problem? is that only > > that you cannot know the length of the UTF-16 encoded string? or is > > there something else missing? > > For my frontend -- there's no problem, I can handle UTF-16 myself. However, if > gdb is to ever produce output in UTF-8 We were talking about wchar_t and wide character strings, which UTF-8 isn't. Let's not confuse ourselves more than we already did. Adding to GDB support for converting arbitrary encoded text into UTF-8 would be a giant job. > then it should handle surrogate pairs itself. Taking first and > second element of surrogate pair and converting both to UTF-8, individually, > won't work, for obvious reasons. I don't think it's quite as ``obvious'' as you imply. Handling surrogates is generally a job for a display engine, so a UTF-8 enabled terminal could very well do it itself. I don't know if they actually do that, though. 
But anyway, this is a different issue. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 17:07 printing wchar_t* Vladimir Prus 2006-04-13 17:25 ` Eli Zaretskii @ 2006-04-13 18:06 ` Jim Blandy 2006-04-13 21:18 ` Eli Zaretskii 2006-04-14 7:58 ` Vladimir Prus 1 sibling, 2 replies; 52+ messages in thread From: Jim Blandy @ 2006-04-13 18:06 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: > I have a user-defined command that can produce the output I want, but is > defining a custom command the right approach? Well, you'd like wide strings to be printed properly when they appear in structures, as arguments to functions, and so on, right? So a user-defined command isn't ideal. The best approach would be to extend charset.[ch] to handle wide character sets as well, and then add code to the language-specific printing routines to use the charset functions. (This is fortunately much simpler than adding support for multibyte characters.) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 18:06 ` Jim Blandy @ 2006-04-13 21:18 ` Eli Zaretskii 2006-04-14 6:02 ` Jim Blandy 2006-04-14 7:58 ` Vladimir Prus 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-13 21:18 UTC (permalink / raw) To: Jim Blandy; +Cc: ghost, gdb > Date: Thu, 13 Apr 2006 10:31:18 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > Cc: gdb@sources.redhat.com > > The best approach would be to extend charset.[ch] to handle wide > character sets as well, and then add code to the language-specific > printing routines to use the charset functions. (This is fortunately > much simpler than adding support for multibyte characters.) Can you tell why you think it's much simpler? ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 21:18 ` Eli Zaretskii @ 2006-04-14 6:02 ` Jim Blandy 2006-04-14 8:43 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Jim Blandy @ 2006-04-14 6:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: ghost, gdb On 4/13/06, Eli Zaretskii <eliz@gnu.org> wrote: > > Date: Thu, 13 Apr 2006 10:31:18 -0700 > > From: "Jim Blandy" <jimb@red-bean.com> > > Cc: gdb@sources.redhat.com > > > > The best approach would be to extend charset.[ch] to handle wide > > character sets as well, and then add code to the language-specific > > printing routines to use the charset functions. (This is fortunately > > much simpler than adding support for multibyte characters.) > > Can you tell why you think it's much simpler? Okay --- just to be clear, this is about multi-byte characters, not wide characters, which is what Volodya was asking about. - The code for limiting how much of a string GDB will print, and for detecting repetitions, seemed like it would be hard to adapt to multibyte encodings. Remember that you've got to be completely agnostic about the encoding; there are stateful encodings out there in widespread use, etc. - I don't think GDB should use off-the-shelf conversion stuff like iconv. For example, if you're looking at ISO-2022 text with the character set switching escape codes in there, I'd argue it'd be wrong for GDB to display those strings without showing the escape codes. It's a debugger, so people are looking at strings and corresponding indexes into those strings, and they need to be able to see exactly what's in there. iconv handles the escape codes silently. - Most programs can just print an error message and die if they see ill-formed multi-byte sequences: you gave them junk; fix it. GDB needs to do something more useful; its job is to be helpful exactly when your program is misbehaving and you don't know why. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 6:02 ` Jim Blandy @ 2006-04-14 8:43 ` Eli Zaretskii 0 siblings, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 8:43 UTC (permalink / raw) To: Jim Blandy; +Cc: ghost, gdb > Date: Thu, 13 Apr 2006 11:06:12 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > Cc: ghost@cs.msu.su, gdb@sources.redhat.com > > On 4/13/06, Eli Zaretskii <eliz@gnu.org> wrote: > > > Date: Thu, 13 Apr 2006 10:31:18 -0700 > > > From: "Jim Blandy" <jimb@red-bean.com> > > > Cc: gdb@sources.redhat.com > > > > > > The best approach would be to extend charset.[ch] to handle wide > > > character sets as well, and then add code to the language-specific > > > printing routines to use the charset functions. (This is fortunately > > > much simpler than adding support for multibyte characters.) > > > > Can you tell why you think it's much simpler? > > Okay --- just to be clear, this is about multi-byte characters, not > wide characters, which is what Volodya was asking about. It's both, as far as I'm concerned: I was asking to explain why you think supporting wide characters is much easier than supporting multi-byte characters. > - I don't think GDB should use off-the-shelf conversion stuff like > iconv. For example, if you're looking at ISO-2022 text with the > character set switching escape codes in there, I'd argue it'd be wrong > for GDB to display those strings without showing the escape codes. > It's a debugger, so people are looking at strings and corresponding > indexes into those strings, and they need to be able to see exactly > what's in there. iconv handles the escape codes silently. If we add such a support, we should probably have GDB print both the raw and printable representation of the non-ASCII strings. We already do something similar with char data type. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-13 18:06 ` Jim Blandy 2006-04-13 21:18 ` Eli Zaretskii @ 2006-04-14 7:58 ` Vladimir Prus 2006-04-14 8:07 ` Jim Blandy 2006-04-14 8:57 ` Eli Zaretskii 1 sibling, 2 replies; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 7:58 UTC (permalink / raw) To: gdb Jim Blandy wrote: > On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: >> I have a user-defined command that can produce the output I want, but is >> defining a custom command the right approach? > > Well, you'd like wide strings to be printed properly when they appear > in structures, as arguments to functions, and so on, right? So a > user-defined command isn't ideal. I think I'll still need to do some processing for wchar_t* on the frontend side. The problem is that I don't see any way gdb can print wchar_t in a way that does not require post-processing. It can print it as UTF8, but then for printing char* gdb should use the local 8 bit encoding, which is likely to be *not* UTF8. Gdb can probably use some extra markers for values, like: "foo" for a string in the local 8-bit encoding, L"foo" for a string in UTF8 encoding. It's also possible to use "\u" escapes. But then there's a problem: - Do we assume that wchar_t is always UTF-16 or UTF-32? - If not: - how can the user select this? - how will the user-specified encoding be handled? > The best approach would be to extend charset.[ch] to handle wide > character sets as well, and then add code to the language-specific > printing routines to use the charset functions. (This is fortunately > much simpler than adding support for multibyte characters.) So, for each wchar_t element, the language-specific code will call 'target_wchar_t_to_host', which will output a specific representation of that wchar_t. Hmm, the interface there seems to assume there's a 1<->1 mapping between target and host characters. This makes the L"UTF8" format and the ASCII string with \u escapes format impossible, it seems. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
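[Editorial sketch] The "ASCII string with \u escapes" output format floated above could look roughly like this. The function name, the L"..." wrapper and the choice of \U with eight hex digits for code points above 0xFFFF are assumptions of this example, not an existing GDB output format.

```python
def format_wide_string(code_points):
    """Render code points as an ASCII-only L"..." literal with escapes."""
    pieces = []
    for cp in code_points:
        if cp == 0x5C:
            pieces.append("\\\\")      # backslash must itself be escaped
        elif cp == 0x22:
            pieces.append('\\"')       # so must the closing quote character
        elif 0x20 <= cp < 0x7F:
            pieces.append(chr(cp))     # printable ASCII goes out as-is
        elif cp <= 0xFFFF:
            pieces.append("\\u%04x" % cp)
        else:
            pieces.append("\\U%08x" % cp)
    return 'L"' + "".join(pieces) + '"'


print(format_wide_string([0x74, 0x65, 0x73, 0x74, 0x1234]))
```

Because the result is pure ASCII, it can travel through a console or an MI stream regardless of the host's 8-bit encoding, and a frontend can turn the escapes back into characters.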
* Re: printing wchar_t* 2006-04-14 7:58 ` Vladimir Prus @ 2006-04-14 8:07 ` Jim Blandy 2006-04-14 8:30 ` Vladimir Prus 2006-04-14 8:57 ` Eli Zaretskii 1 sibling, 1 reply; 52+ messages in thread From: Jim Blandy @ 2006-04-14 8:07 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: > Jim Blandy wrote: > > > On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: > >> I have a user-defined command that can produce the output I want, but is > >> defining a custom command the right approach? > > > > Well, you'd like wide strings to be printed properly when they appear > > in structures, as arguments to functions, and so on, right? So a > > user-defined command isn't ideal. > > I think I'll still need to do some processing for wchar_t* on frontend side. > The problem is that I don't see any way how gdb can print wchar_t in a way > that does not require post-processing. It can print it as UTF8, but then > for printing char* gdb should use local 8 bit encoding, which is likely to > be *not* UTF8. Gdb can probably use some extra markers for values: like: > > "foo" for string in local 8-bit encoding > L"foo" for string in UTF8 encoding. > > It's also possible to use "\u" escapes. > > But then there's a problem: > > - Do we assume that wchar_t is always UTF-16 or UTF-32? > - If not: > - how user can select this? > - how user-specified encoding will be handled You can't hard-code assumptions about the character set into GDB. Nor can you hard-code the assumption that the host and target character sets are the same. GDB needs to do explicit conversions between the two as needed, and handle mismatches in some reasonable way. GDB already has the commands 'set host-charset' and 'set target-charset', so you can assume that you have accurate information about the character sets at hand. They fall back to ASCII. 
> > The best approach would be to extend charset.[ch] to handle wide
> > character sets as well, and then add code to the language-specific
> > printing routines to use the charset functions.  (This is fortunately
> > much simpler than adding support for multibyte characters.)
>
> For, for each wchar_t element language-specific code will call
> 'target_wchar_t_to_host', that will output specific representation of that
> wchar_t. Hmm, the interface there seem to assume theres 1<->1 mapping
> between target and host characters. This makes L"UTF8" format and ascii
> string with \u escapes format impossible, It seems.

Not at all. The current character and string printing code uses those routines, and it handles unprintable and invalid characters just fine. See, for example, host_char_print_literally and c_target_char_has_backslash_escape.

GDB tries to print characters and strings as they would appear in source code. C doesn't assume that the source and execution character sets are the same; by using numeric escapes, you can write programs for any execution character set in any source character set. You just need enough information to manage the overlap.

As far as 1-to-1 mappings are concerned, the only necessary property is that host_char_to_target and target_char_to_host be inverses, and return zero for characters that can't make a round trip. The existing string-printing code will automatically use numeric escapes for characters that target_char_to_host won't translate.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 8:07 ` Jim Blandy @ 2006-04-14 8:30 ` Vladimir Prus 0 siblings, 0 replies; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 8:30 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb On Friday 14 April 2006 11:29, Jim Blandy wrote: > On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: > > Jim Blandy wrote: > > > On 4/13/06, Vladimir Prus <ghost@cs.msu.su> wrote: > > >> I have a user-defined command that can produce the output I want, but > > >> is defining a custom command the right approach? > > > > > > Well, you'd like wide strings to be printed properly when they appear > > > in structures, as arguments to functions, and so on, right? So a > > > user-defined command isn't ideal. > > > > I think I'll still need to do some processing for wchar_t* on frontend > > side. The problem is that I don't see any way how gdb can print wchar_t > > in a way that does not require post-processing. It can print it as UTF8, > > but then for printing char* gdb should use local 8 bit encoding, which is > > likely to be *not* UTF8. Gdb can probably use some extra markers for > > values: like: > > > > "foo" for string in local 8-bit encoding > > L"foo" for string in UTF8 encoding. > > > > It's also possible to use "\u" escapes. > > > > But then there's a problem: > > > > - Do we assume that wchar_t is always UTF-16 or UTF-32? > > - If not: > > - how user can select this? > > - how user-specified encoding will be handled > > You can't hard-code assumptions about the character set into GDB. Nor > can you hard-code the assumption that the host and target character > sets are the same. GDB needs to do explicit conversions between the > two as needed, and handle mismatches in some reasonable way. > > GDB already has the commands 'set host-charset' and 'set > target-charset', so you can assume that you have accurate information > about the character sets at hand. They fall back to ASCII. 
Good, but you need to separately set host-charset for char* and for wchar_t*. The first can be KOI8-R and the second can be UTF-32 in the same program at the same time.

> > > The best approach would be to extend charset.[ch] to handle wide
> > > character sets as well, and then add code to the language-specific
> > > printing routines to use the charset functions.  (This is fortunately
> > > much simpler than adding support for multibyte characters.)
> >
> > For, for each wchar_t element language-specific code will call
> > 'target_wchar_t_to_host', that will output specific representation of
> > that wchar_t. Hmm, the interface there seem to assume theres 1<->1
> > mapping between target and host characters. This makes L"UTF8" format
> > and ascii string with \u escapes format impossible, It seems.
>
> Not at all.  The current character and string printing code uses those
> routines, and it handles unprintable and invalid characters just fine.
> See, for example, host_print_char_literally, and
> c_target_char_has_backslash_escape.

Can this code output using UTF8-encoding? Consider this code from c-lang.c:

    static void
    c_emit_char (int c, struct ui_file *stream, int quoter)
    {
      const char *escape;
      int host_char;

      c &= 0xFF;                    /* Avoid sign bit follies */

      escape = c_target_char_has_backslash_escape (c);
      if (escape)
        {
          if (quoter == '"' && strcmp (escape, "0") == 0)
            /* Print nulls embedded in double quoted strings as \000 to
               prevent ambiguity.  */
            fprintf_filtered (stream, "\\000");
          else
            fprintf_filtered (stream, "\\%s", escape);
        }
      else if (target_char_to_host (c, &host_char)
               && host_char_print_literally (host_char))
        {
          if (host_char == '\\' || host_char == quoter)
            fputs_filtered ("\\", stream);
          fprintf_filtered (stream, "%c", host_char);
        }
      else
        fprintf_filtered (stream, "\\%.3o", (unsigned int) c);
    }

With UTF8 host encoding, we'd want up to 6 host bytes to be output for a single wchar_t, without escaping.
The size of 'host_char' here is 4 bytes, so there's no way for 'target_char_to_host' to produce 6 characters.

> As far as 1-to-1 mappings are concerned, the only necessary property
> is that host_char_to_target and target_char_to_host be inverses, and
> return zero for characters that can't make a round trip.  The existing
> string-printing code will automatically use numeric escapes for
> characters that target_char_to_host won't translate.

So, assuming numeric escapes are fine with me, I'd need to:

1. Add a way to specify encoding of wchar_t* values.
2. Write a version of c_printstr that will handle wchar_t*. The current
   version just accesses i-th element of the string, so won't work with
   UTF-16.
3. Make new c_printstr call LA_PRINT_CHAR just like c_printstr, and that
   will handle escapes automatically.
4. Make sure new version of c_printstr is invoked for wchar_t* values.

Is that about right?

- Volodya

^ permalink raw reply [flat|nested] 52+ messages in thread
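The numeric-escape output discussed above (an ASCII-only string with \u escapes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the wide string is taken to be a zero-terminated array of 32-bit Unicode code points already copied into local memory, and format_wide_string is a hypothetical helper, not a function from GDB's charset.[ch] or c-lang.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not a GDB function.  Formats a zero-terminated
   array of 32-bit code points as an ASCII-only literal such as
   L"test\u1234".  Reads at most MAX_ELEMS elements.  Returns the number
   of characters written (excluding the terminating NUL), or -1 if OUT
   is too small.  */
static int
format_wide_string (const uint32_t *s, size_t max_elems,
                    char *out, size_t out_size)
{
  size_t used;
  size_t i;
  int n;

  assert (s != NULL && out != NULL);

  n = snprintf (out, out_size, "L\"");
  if (n < 0 || (size_t) n >= out_size)
    return -1;
  used = (size_t) n;

  for (i = 0; i < max_elems && s[i] != 0; i++)
    {
      uint32_t c = s[i];

      if (c == '"' || c == '\\')
        /* Quote and backslash need a backslash in front.  */
        n = snprintf (out + used, out_size - used, "\\%c", (int) c);
      else if (c >= 0x20 && c < 0x7F)
        /* Printable ASCII is emitted literally.  */
        n = snprintf (out + used, out_size - used, "%c", (int) c);
      else if (c <= 0xFFFF)
        n = snprintf (out + used, out_size - used, "\\u%04lX",
                      (unsigned long) c);
      else
        n = snprintf (out + used, out_size - used, "\\U%08lX",
                      (unsigned long) c);

      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
    }

  n = snprintf (out + used, out_size - used, "\"");
  if (n < 0 || (size_t) n >= out_size - used)
    return -1;
  return (int) (used + (size_t) n);
}
```

Because every element becomes either one ASCII character or one escape, the output never depends on the host character set, which is the property that makes this form easy for a frontend to post-process.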
* Re: printing wchar_t* 2006-04-14 7:58 ` Vladimir Prus 2006-04-14 8:07 ` Jim Blandy @ 2006-04-14 8:57 ` Eli Zaretskii 2006-04-14 12:52 ` Vladimir Prus 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 8:57 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 10:10:19 +0400 > > The problem is that I don't see any way how gdb can print wchar_t in a way > that does not require post-processing. It can print it as UTF8, but then > for printing char* gdb should use local 8 bit encoding, which is likely to > be *not* UTF8. You are talking about a GUI front-end, aren't you? In that case, you will need to code a routine that accepts a wchar_t string, and then _displays_ it using the appropriate font. It is wrong to talk about ``printing'' it and about ``local 8-bit encoding'', because you don't want to encode it, you want to display it using the appropriate font. In particular, if the original wchar_t uses Unicode codepoints, then presumably there should be some GUI API call, specific to your windowing system, that would accept such a wchar_t string and display it using a Unicode font. So if you are going to do this in the front-end, I think all you need is ask GDB to supply the wchar_t string using the array notation; the rest will have to be done inside the front-end. Am I missing something? > Gdb can probably use some extra markers for values: like: > > "foo" for string in local 8-bit encoding > L"foo" for string in UTF8 encoding. > > It's also possible to use "\u" escapes. Why do you need any of these? 16-bit Unicode characters are just integers, so ask GDB to send them as integers. That should be all you need, since displaying them is something your FE will need to do itself, no? > But then there's a problem: > > - Do we assume that wchar_t is always UTF-16 or UTF-32? You don't need to assume, you can ask the application. Wouldn't "sizeof(wchar_t)" do the trick? 
> - how user-specified encoding will be handled

wchar_t is not an encoding, it's the characters' codes themselves. Encoded characters are (in general) multibyte character strings, not wchar_t. See, for example, the description of library functions mbsinit, mbrlen, mbrtowc, etc., for more about this distinction.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 8:57 ` Eli Zaretskii @ 2006-04-14 12:52 ` Vladimir Prus 2006-04-14 13:07 ` Daniel Jacobowitz 2006-04-14 14:16 ` Eli Zaretskii 0 siblings, 2 replies; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 12:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb On Friday 14 April 2006 12:43, Eli Zaretskii wrote: > > From: Vladimir Prus <ghost@cs.msu.su> > > Date: Fri, 14 Apr 2006 10:10:19 +0400 > > > > The problem is that I don't see any way how gdb can print wchar_t in a > > way that does not require post-processing. It can print it as UTF8, but > > then for printing char* gdb should use local 8 bit encoding, which is > > likely to be *not* UTF8. > > You are talking about a GUI front-end, aren't you? In that case, you > will need to code a routine that accepts a wchar_t string, and then > _displays_ it using the appropriate font. It is wrong to talk about > ``printing'' it and about ``local 8-bit encoding'', because you don't > want to encode it, you want to display it using the appropriate font. > > In particular, if the original wchar_t uses Unicode codepoints, then > presumably there should be some GUI API call, specific to your > windowing system, that would accept such a wchar_t string and display > it using a Unicode font. Sure, I know how to display Unicode string. The question is how to get at pass raw Unicode data from gdb to frontend in the form suitable for me and most reasonable to other users of gdb. As I said, I already have a user-defined command to do this, but it won't benefit other users of gdb. > So if you are going to do this in the front-end, I think all you need > is ask GDB to supply the wchar_t string using the array notation; the > rest will have to be done inside the front-end. Am I missing > something? Yes, I'll need to know the length of the string. 
I can do this either using a user-defined gdb command (which again will solve *my* problem, but be a local solution), or by looking at each character until I see zero, in which case I'd need to issue a command for each character.

>
> > Gdb can probably use some extra markers for values: like:
> >
> >    "foo" for string in local 8-bit encoding
> >    L"foo" for string in UTF8 encoding.
> >
> > It's also possible to use "\u" escapes.
>
> Why do you need any of these?  16-bit Unicode characters are just
> integers, so ask GDB to send them as integers.  That should be all you
> need, since displaying them is something your FE will need to do
> itself, no?

In my original post, I asked if gdb can print wchar_t just as a raw sequence of values, like this:

    0x56, 0x1456

"foo" and L"foo" are other alternatives which might be more handy for general users of gdb.

> > But then there's a problem:
> >
> > - Do we assume that wchar_t is always UTF-16 or UTF-32?
>
> You don't need to assume, you can ask the application.  Wouldn't
> "sizeof(wchar_t)" do the trick?

Deciding if it's UTF-16 or UTF-32 is not the problem. In fact, exactly the same code will handle both encodings just fine. The question is whether we allow encodings which are not UTF-16 or UTF-32. I don't know about any such encodings, but I'm not an i18n expert.

> > - how user-specified encoding will be handled
>
> wchar_t is not an encoding, it's the characters' codes themselves.

I don't understand what you say here, sorry. Do you mean that each wchar_t is in general a code point, not a complete abstract character? Yes, true, and what? If wchar_t* literals can use an encoding other than UTF-16 and UTF-32, you need the code to handle that encoding, and the question arises where you'll get that code, will it be iconv or something else.

> Encoded characters are (in general) multibyte character strings, not
> wchar_t.  See, for example, the description of library functions
> mbsinit, mbrlen, mbrtowc, etc., for more about this distinction.

I know about this distinction. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 12:52 ` Vladimir Prus @ 2006-04-14 13:07 ` Daniel Jacobowitz 2006-04-14 14:23 ` Eli Zaretskii 2006-04-14 14:16 ` Eli Zaretskii 1 sibling, 1 reply; 52+ messages in thread From: Daniel Jacobowitz @ 2006-04-14 13:07 UTC (permalink / raw) To: Vladimir Prus; +Cc: Eli Zaretskii, gdb On Fri, Apr 14, 2006 at 12:57:41PM +0400, Vladimir Prus wrote: > > So if you are going to do this in the front-end, I think all you need > > is ask GDB to supply the wchar_t string using the array notation; the > > rest will have to be done inside the front-end. Am I missing > > something? > > Yes, I'll need to know the length of the string. I can do this either using > user-defined gdb command (which again will solve *my* problem, but be a local > solution), or by looking at each character until I see zero, in which case > I'd need to command for each characters. Going away from GDB support for wide characters for a moment, and back to this; we have a "print N elements" notation; should we extend it to a "print all non-zero elements" notation? Alternatively, we could do it specially by recognizing wchar_t, but I think the general solution might be more useful. A user defined command for this isn't all that bad, though. You can hopefully define the user command from your frontend. I haven't tested this much, but I don't see a reason why it shouldn't work. If you use define through -interpreter-exec you get CLI prompts back; ugh, that's nasty. If you try this: -interpreter-exec console "define foo\nend" It gets treated as junk. Should we make multi-line strings work in -interpreter-exec? > Deciding if it's UTF-16 or UTF-32 is not the problem. In fact, exactly the > same code will handle both encodings just fine. The question if we allow > encodings which are not UTF-16 or UTF-32. I don't know about any such > encodings, but I'm not an i18n expert. Eli'd know better than me, but I think that expecting wchar_t to be Unicode is not reliable. 
The glibc manual suggests that it's valid to use other encodings for wchar_t, although ISO 10646 is typical. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 13:07 ` Daniel Jacobowitz @ 2006-04-14 14:23 ` Eli Zaretskii 2006-04-14 14:29 ` Daniel Jacobowitz 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 14:23 UTC (permalink / raw) To: Vladimir Prus, gdb > Date: Fri, 14 Apr 2006 09:05:27 -0400 > From: Daniel Jacobowitz <drow@false.org> > Cc: Eli Zaretskii <eliz@gnu.org>, gdb@sources.redhat.com > > Going away from GDB support for wide characters for a moment, and back to > this; we have a "print N elements" notation; should we extend it to a > "print all non-zero elements" notation? How about "print elements until you find X", where X is any 8-bit code, including zero? That would useful in situations, I think. We will probably need some user-settable limit for the max number of elements, to avoid running amok in case there's no X. > Alternatively, we could do it specially by recognizing wchar_t, but > I think the general solution might be more useful. I agree. > Eli'd know better than me, but I think that expecting wchar_t to be > Unicode is not reliable. I think we cannot assume Unicode is the only character set, but we can make Unicode the default and let the user say otherwise if not. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 14:23 ` Eli Zaretskii @ 2006-04-14 14:29 ` Daniel Jacobowitz 2006-04-14 14:53 ` Eli Zaretskii 2006-04-14 17:55 ` Jim Blandy 0 siblings, 2 replies; 52+ messages in thread From: Daniel Jacobowitz @ 2006-04-14 14:29 UTC (permalink / raw) To: gdb On Fri, Apr 14, 2006 at 05:08:17PM +0300, Eli Zaretskii wrote: > > Date: Fri, 14 Apr 2006 09:05:27 -0400 > > From: Daniel Jacobowitz <drow@false.org> > > Cc: Eli Zaretskii <eliz@gnu.org>, gdb@sources.redhat.com > > > > Going away from GDB support for wide characters for a moment, and back to > > this; we have a "print N elements" notation; should we extend it to a > > "print all non-zero elements" notation? > > How about "print elements until you find X", where X is any 8-bit > code, including zero? That would useful in situations, I think. Well, I suppose. But in the general case, there's always user-defined functions, and hopefully better scripting languages in the future; is this something that will be frequently useful direct from the command line? It'll involve another extension to the language expression parsers, you see. We ought to minimize such extensions; e.g. the set of operators available is fairly limited. I was thinking "print *ptr@@", by analogy to "print *ptr@5". Or we could use the existing @ N syntax. Right now we issue errors for anything less than one; so how about "print *ptr@0" for "print *ptr until you encounter a zero"? > We will probably need some user-settable limit for the max number of > elements, to avoid running amok in case there's no X. We can just use the "set print elements" limit for that. Although, it's always bugged me that we use the same setting for "number of members of an array" and "number of characters in a string"; I usually want only a few elements of an array, but much more of a string. Maybe someday we should separate them. 
> I think we cannot assume Unicode is the only character set, but we can > make Unicode the default and let the user say otherwise if not. Seems reasonable to me. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 52+ messages in thread
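The "print *ptr@@" idea discussed above amounts to scanning for a zero element, with an upper bound so the scan cannot run away when no terminator exists. A rough sketch, assuming the data already sits in local memory (a debugger would read it from the target instead) and that "zero" means an element whose bytes are all zero; count_until_zero and its parameters are hypothetical names, not GDB functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GDB.  Counts the elements of size
   ELEM_SIZE starting at BASE that precede the first all-zero element,
   examining at most LIMIT elements.  Sets *TERMINATED to 1 if a zero
   element was found within the limit, and to 0 otherwise.  */
static size_t
count_until_zero (const void *base, size_t elem_size, size_t limit,
                  int *terminated)
{
  const unsigned char *p = base;
  size_t i, j;

  assert (base != NULL && elem_size > 0 && terminated != NULL);

  for (i = 0; i < limit; i++)
    {
      int all_zero = 1;

      for (j = 0; j < elem_size; j++)
        if (p[i * elem_size + j] != 0)
          {
            all_zero = 0;
            break;
          }

      if (all_zero)
        {
          *terminated = 1;
          return i;
        }
    }

  /* Hit the cap without seeing a terminator.  */
  *terminated = 0;
  return limit;
}
```

The returned count is what would be fed to the existing "*ptr@N" form; the TERMINATED flag lets the caller show that the output was cut off by the limit rather than by a real terminator.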
* Re: printing wchar_t* 2006-04-14 14:29 ` Daniel Jacobowitz @ 2006-04-14 14:53 ` Eli Zaretskii 2006-04-14 17:10 ` Daniel Jacobowitz 2006-04-14 17:55 ` Jim Blandy 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 14:53 UTC (permalink / raw) To: gdb > Date: Fri, 14 Apr 2006 10:16:40 -0400 > From: Daniel Jacobowitz <drow@false.org> > > > How about "print elements until you find X", where X is any 8-bit > > code, including zero? That would useful in situations, I think. > > Well, I suppose. But in the general case, there's always user-defined > functions, and hopefully better scripting languages in the future; > is this something that will be frequently useful direct from the > command line? > > It'll involve another extension to the language expression parsers, you > see. We ought to minimize such extensions; e.g. the set of operators > available is fairly limited. No, that's not what I had in mind. I thought about a command which will set the value of the delimiter, with zero being the default. Then just use the same syntax as what you had in mind for zero-delimited arrays. Does this make sense? > I was thinking "print *ptr@@", by analogy to "print *ptr@5". Looks good to me. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 14:53 ` Eli Zaretskii @ 2006-04-14 17:10 ` Daniel Jacobowitz 0 siblings, 0 replies; 52+ messages in thread From: Daniel Jacobowitz @ 2006-04-14 17:10 UTC (permalink / raw) To: gdb On Fri, Apr 14, 2006 at 05:47:06PM +0300, Eli Zaretskii wrote: > No, that's not what I had in mind. I thought about a command which > will set the value of the delimiter, with zero being the default. > Then just use the same syntax as what you had in mind for > zero-delimited arrays. > > Does this make sense? It seems like something which would be more useful in the expression than as a global state, but on the other hand, I already made the point that this wouldn't be frequently used. I wouldn't object to such a variable (although I probably wouldn't implement it, either). > > I was thinking "print *ptr@@", by analogy to "print *ptr@5". > > Looks good to me. I was going to suggest *ptr@0 again, but I've remembered that these actually take expressions, not just integers. So @@ sounds good to me, unless anyone knows a language where we can get away with using @ for artificial arrays, but can't steal @@ also. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 14:29 ` Daniel Jacobowitz 2006-04-14 14:53 ` Eli Zaretskii @ 2006-04-14 17:55 ` Jim Blandy 2006-04-14 18:27 ` Eli Zaretskii 1 sibling, 1 reply; 52+ messages in thread From: Jim Blandy @ 2006-04-14 17:55 UTC (permalink / raw) To: gdb On 4/14/06, Daniel Jacobowitz <drow@false.org> wrote: > I was thinking "print *ptr@@", by analogy to "print *ptr@5". Or we > could use the existing @ N syntax. Right now we issue errors for > anything less than one; so how about "print *ptr@0" for "print *ptr > until you encounter a zero"? I much prefer LVAL@@ to LVAL@0. I don't think it's worth complicating the syntax for searching for a zero terminator in order to allow one to search for an arbitrary terminator. I think that will require more typing in the much more common case, and there are other ways to serve the need to search for arbitrary terminators. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 17:55 ` Jim Blandy @ 2006-04-14 18:27 ` Eli Zaretskii 2006-04-14 18:30 ` Jim Blandy 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 18:27 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb > Date: Fri, 14 Apr 2006 10:18:10 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > > I much prefer LVAL@@ to LVAL@0. Agreed. > I don't think it's worth complicating the syntax for searching for a > zero terminator in order to allow one to search for an arbitrary > terminator. Then how will you find the zero terminator? With wcslen? That is only good for wchar_t strings, not for arbitrary integer arrays. And I thought Daniel was suggesting something more general than just wchar_t arrays. > I think that will require more typing in the much more common case ??? What typing? I suggested an additional command that will set the terminator; after that, it's the same typing as with zero. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 18:27 ` Eli Zaretskii @ 2006-04-14 18:30 ` Jim Blandy 2006-04-14 19:19 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Jim Blandy @ 2006-04-14 18:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb On 4/14/06, Eli Zaretskii <eliz@gnu.org> wrote: > > Date: Fri, 14 Apr 2006 10:18:10 -0700 > > From: "Jim Blandy" <jimb@red-bean.com> > > > > I much prefer LVAL@@ to LVAL@0. > > Agreed. > > > I don't think it's worth complicating the syntax for searching for a > > zero terminator in order to allow one to search for an arbitrary > > terminator. > > Then how will you find the zero terminator? With wcslen? That is > only good for wchar_t strings, not for arbitrary integer arrays. And > I thought Daniel was suggesting something more general than just > wchar_t arrays. He is. I am, too. Just search for elements equal to zero. If LVAL's type can't be compared with zero, then you can't use @@ on it. > > I think that will require more typing in the much more common case > > ??? What typing? I suggested an additional command that will set the > terminator; after that, it's the same typing as with zero. Yes. I said, "I don't think it's worth complicating the syntax for searching for a zero terminator...". Providing an additional command to set the terminator doesn't complicate the syntax. You're assuming I was speaking directly to your suggestion, when I was instead simply stating the requirements I think we should meet. That said, I don't even think we should have a separate command for setting the terminating value for @@. I think we should wait until someone has a need for it arising out of a real-life use case, not a design conversation. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 18:30 ` Jim Blandy @ 2006-04-14 19:19 ` Eli Zaretskii 0 siblings, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 19:19 UTC (permalink / raw) To: Jim Blandy; +Cc: gdb > Date: Fri, 14 Apr 2006 11:03:38 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > Cc: gdb@sourceware.org > > > > I don't think it's worth complicating the syntax for searching for a > > > zero terminator in order to allow one to search for an arbitrary > > > terminator. > > > > Then how will you find the zero terminator? With wcslen? That is > > only good for wchar_t strings, not for arbitrary integer arrays. And > > I thought Daniel was suggesting something more general than just > > wchar_t arrays. > > He is. I am, too. Just search for elements equal to zero. How is this different or more complex than searching for elements that are equal to some other constant value? > That said, I don't even think we should have a separate command for > setting the terminating value for @@. I think we should wait until > someone has a need for it arising out of a real-life use case, not a > design conversation. What Daniel suggested didn't come from a clear-cut real-life use-case, either. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 12:52 ` Vladimir Prus 2006-04-14 13:07 ` Daniel Jacobowitz @ 2006-04-14 14:16 ` Eli Zaretskii 2006-04-14 14:50 ` Vladimir Prus 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 14:16 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 12:57:41 +0400 > Cc: gdb@sources.redhat.com > > > In particular, if the original wchar_t uses Unicode codepoints, then > > presumably there should be some GUI API call, specific to your > > windowing system, that would accept such a wchar_t string and display > > it using a Unicode font. > > Sure, I know how to display Unicode string. The question is how to get at pass > raw Unicode data from gdb to frontend in the form suitable for me and most > reasonable to other users of gdb. I suggested to use array features for that. > In an original post, I've asked if gdb can print wchar_t just as a raw > sequence of values, like this: > > 0x56, 0x1456 The answer is YES. Use array notation, and add a feature to report the length of a wchar_t array. > "foo" and L"foo" are other alternatives which might be more handy for general > users of gdb. L"foo" will not help you here, because the characters in question are not printable. If GDB outputs L"foo" where every character is not printable, you will have the same problem as you have now. > > > But then there's a problem: > > > > > > - Do we assume that wchar_t is always UTF-16 or UTF-32? > > > > You don't need to assume, you can ask the application. Wouldn't > > "sizeof(wchar_t)" do the trick? > > Deciding if it's UTF-16 or UTF-32 is not the problem. Well, you did ask about the distinction. > In fact, exactly the same code will handle both encodings just fine. Again, please don't use encoding when you mean character's codepoint. It's confusing, and runs a risk to obfuscate the problem. See below. > The question if we allow encodings which are not UTF-16 or UTF-32. 
I > don't know about any such encodings, but I'm not an i18n expert. There are a myriad of encodings, but the only ones that could ever qualify as wchar_t are single-byte (8-bit) encodings that are generally used for Latin languages (and for several others, like Cyrillic and Hebrew). What you need is a way to tell GDB how are the strings represented in the debuggee's wchar_t, and then GDB should convert that representation into something your FE can display. Assuming your FE will be able to display Unicode characters, GDB should convert to Unicode, if the debugge's wchar_t is not Unicode already. There's no universal way for GDB to know what is held in wchar_t by the debuggee, so I think the only reasonable way is for the user to tell that. A reasonable default would be 16-bit Unicode codepoints from the BMP, or 32-bit Unicode codepoints from the entire range of Unicode characters. (I think glibc uses the latter.) > > > - how user-specified encoding will be handled > > > > wchar_t is not an encoding, it's the characters' codes themselves. > > I don't understand what you say here, sorry. Do you mean that each wchar_t is > in general code point, not a complete abstract character. Yes, true, and > what? If wchar_t* literals can use encoding other then UTF-16 and UTF-32, you > need the code to handle that encoding, and the question arises where you'll > get that code, will it be iconv or something else. > > > Encoded characters are (in general) multibyte character strings, not > > wchar_t. See, for example, the description of library functions > > mbsinit, mbrlen, mbrtowc, etc., for more about this distinction. > > I know about this distinction. If you know about this distinction, then you should have no trouble understanding what I said about wchar_t NOT being an encoding. UTF-8 and UTF-16 are multibyte variable-length _encodings_ of Unicode character's _codepoints_. 
For example, the Cyrillic letter ``small a'' has Unicode codepoint 0x0430, but its UTF-8 encoding is a two-byte sequence 0xD0 0xB0. The codepoint is something you will find in a wchar_t array, while the UTF-8 encoding is something you will find in a multibyte string.

Now, the same letter ``small a'' can be encoded in several other ways: for example, its ISO-2022-7bit encoding is 0x1B 0x24 0x2C 0x31 0x28 0x50, its KOI8-r encoding is 0xC1, its ISO-8859-5 encoding is 0xD0, etc. It should be obvious that, of all the encodings, only the fixed-length ones can be used in a wchar_t array (because wchar_t arrays are stateless, while multibyte encodings produce stateful strings, where the beginning of each encoded character cannot be decided without processing all the characters before it). It should also be obvious that using wchar_t for single-byte encodings is not useful (you waste storage). Thus, the only practical use of wchar_t is for character sets that do not fit into a single byte, and for those, all the encodings I know of are variable-length multibyte encodings, which are not suitable for wchar_t, as mentioned above.

This is why I said that wchar_t is not used for an encoding (such as ISO-8859-5 or UTF-8 or UTF-16), but for characters' codepoints. It is nowadays almost universally accepted that wchar_t is a Unicode codepoint, the only difference between applications being whether only the first 64K characters (the so-called BMP) are supported by 16-bit wchar_t, or the entire 21-bit range is supported by a 32-bit wchar_t.

^ permalink raw reply [flat|nested] 52+ messages in thread
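The codepoint-versus-encoding distinction in the message above can be made concrete with a small encoder that turns a codepoint (what a 32-bit wchar_t would hold) into its UTF-8 byte sequence (what a multibyte string would hold). This is a generic sketch of the standard UTF-8 bit layout; utf8_encode is an illustrative name and is not taken from GDB or from any library mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes Unicode code point CP as UTF-8 into OUT, which must have room
   for at least 4 bytes.  Returns the number of bytes written, or 0 if
   CP is a surrogate or lies beyond U+10FFFF.  */
static size_t
utf8_encode (uint32_t cp, unsigned char *out)
{
  assert (out != NULL);

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not characters */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Calling utf8_encode (0x0430, buf) stores the two bytes 0xD0 0xB0 and returns 2, matching the Cyrillic ``small a'' example in the message.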
* Re: printing wchar_t* 2006-04-14 14:16 ` Eli Zaretskii @ 2006-04-14 14:50 ` Vladimir Prus 2006-04-14 17:18 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-14 14:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb On Friday 14 April 2006 17:59, Eli Zaretskii wrote: > > In an original post, I've asked if gdb can print wchar_t just as a raw > > sequence of values, like this: > > > > 0x56, 0x1456 > > The answer is YES. Use array notation, and add a feature to report > the length of a wchar_t array. Ok. > Now, the same letter ``small a'' can be encoded in several other ways: > for example, its ISO-2022-7bit encoding is 0x1B 0x24 0x2C 0x31 0x28 > 0x50, its KOI8-r encoding is 0xC1, its ISO-8859-5 encoding is 0xD0, > etc. It should be obvious that, of all the encodings, only the > fixed-length ones can be used in a wchar_t array (because wchar_t > arrays are stateless, I don't think this statement is backed up by anything. > This is why I said that wchar_t is not used for an encoding (such as > ISO-8859-5 or UTF-8 or UTF-16), but for characters' codepoints. It is > nowadays almost universally accepted that wchar_t is a Unicode > codepoint, Again, can you provide any specific pointers to support that view? > the only difference between applications being whether only > the first 64K characters (the so-called BMP) are supported by 16-bit > wchar_t, or the entire 23-bit range is supported by a 32-bit wchar_t. I believe that on Windows: - wchar_t is 16-bit - wchar_t* values are supposed to be in UTF-16 encoding (see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp Do you disagree with any of the above statements? If not, then it directly follows that a given wchar_t is not a Unicode code point, but a code unit in specific representation (UTF-16), and a given code points takes either one or two code units, that is either one or two wchar_t. 
This is contrary to your statement that wchar_t is a single code point. Anyway, this is quickly getting off-topic for gdb list, so maybe we should bring this somewhere else. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
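The claim that a code point takes either one or two UTF-16 code units can be checked with a short Python sketch. The helper name `utf16_code_units` is made up for this illustration; it assumes nothing about how any particular platform fills its wchar_t arrays.

```python
def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units that encode one character."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# A BMP character needs one unit, equal to its code point.
bmp = utf16_code_units("\u0430")

# A character outside the BMP (U+1D11E) needs a surrogate pair.
astral = utf16_code_units("\U0001D11E")

print([hex(u) for u in bmp])
print([hex(u) for u in astral])
```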
* Re: printing wchar_t* 2006-04-14 14:50 ` Vladimir Prus @ 2006-04-14 17:18 ` Eli Zaretskii 2006-04-14 18:03 ` Jim Blandy 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 17:18 UTC (permalink / raw) To: Vladimir Prus; +Cc: gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Fri, 14 Apr 2006 18:37:25 +0400 > Cc: gdb@sources.redhat.com > > > Now, the same letter ``small a'' can be encoded in several other ways: > > for example, its ISO-2022-7bit encoding is 0x1B 0x24 0x2C 0x31 0x28 > > 0x50, its KOI8-r encoding is 0xC1, its ISO-8859-5 encoding is 0xD0, > > etc. It should be obvious that, of all the encodings, only the > > fixed-length ones can be used in a wchar_t array (because wchar_t > > arrays are stateless, > > I don't think this statement is backed up by anything. > > > This is why I said that wchar_t is not used for an encoding (such as > > ISO-8859-5 or UTF-8 or UTF-16), but for characters' codepoints. It is > > nowadays almost universally accepted that wchar_t is a Unicode > > codepoint, > > Again, can you provide any specific pointers to support that view? I think Robert and myself already explained that in later messages. Feel free to ask specific questions if something is still unclear. > I believe that on Windows: > > - wchar_t is 16-bit > - wchar_t* values are supposed to be in UTF-16 encoding > (see > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp > > Do you disagree with any of the above statements? wchar_t is just an integer type. You can stuff _anything_ into an integer array, but if you put UTF-16 there, each element is no longer a character, it is one of a few 16-bit integers that encode a character. In other words, it's a variant of multibyte strings, except that each element is 16-bit wide. Now, I know that Windows holds 16-bit UTF-16 encodings in wchar_t arrays, but that is not the L"foo" strings of wide characters. 
In the L"foo" notation, each of the 3 string characters _always_ occupies exactly one wchar_t element, and L"foo"[1] is _always_ the second character of the string. This is not true for UTF-16, as I hope is clear from this discussion. In UTF-16, array[1] is the second 16-bit value that encodes a character, and that character's encoding could need more than 1 16-bit value. > If not, then it directly > follows that a given wchar_t is not a Unicode code point, but a code unit in > specific representation (UTF-16), and a given code points takes either one or > two code units, that is either one or two wchar_t. This is contrary to your > statement that wchar_t is a single code point. My statement was based on the assumption that you are coding for a system where wchar_t is used for complete characters, not for UTF-16 strings. Only in that case, you can talk about ``wide characters'' and about wchar_t being a character. In UTF-16, an arbitrary element of the array might not be a complete character. > Anyway, this is quickly getting off-topic for gdb list, so maybe we should > bring this somewhere else. It _is_ on topic, IMHO, as long as we discuss features to be added to GDB. ^ permalink raw reply [flat|nested] 52+ messages in thread
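The indexing point made above, that array[1] in a UTF-16 array is not necessarily the second character, can be sketched as follows. This is a hypothetical helper written for this thread, assuming the array holds well-formed UTF-16; unpaired surrogates are simply passed through as-is.

```python
def nth_code_point(units, n):
    """Return the n-th (0-based) code point in a list of UTF-16 code units.

    Unlike indexing a fixed-width array, this has to scan from the start,
    because any element may be half of a surrogate pair.
    """
    i = 0
    count = 0
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError(n)

# 'f', U+1D11E (as a surrogate pair), 'o'
units = [0x0066, 0xD834, 0xDD1E, 0x006F]
print(hex(units[2]))                   # a low surrogate, not a character
print(hex(nth_code_point(units, 2)))   # the third character, 'o'
```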
* Re: printing wchar_t* 2006-04-14 17:18 ` Eli Zaretskii @ 2006-04-14 18:03 ` Jim Blandy 2006-04-14 19:16 ` Eli Zaretskii 2006-04-14 19:53 ` Mark Kettenis 0 siblings, 2 replies; 52+ messages in thread From: Jim Blandy @ 2006-04-14 18:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Vladimir Prus, gdb I think folks are seeing difficult problems where there aren't any. Even if the host character set (that is, the character set GDB is using to communicate with its user, or in its MI communications) is plain, old ASCII, GDB can, without any loss of information, convey the contents of a wide string using an arbitrary target character set via MI to a GUI, using code the GUI must already have. Suppose we have a wide string where wchar_t values are Unicode code points. Suppose our host character set is plain ASCII. Suppose the user's program has a string containing the digits '123', followed by some funky Tibetan characters U+0F04 U+0FCC, followed by the letters 'xyz'. When asked to print that string, GDB should print the following twenty-one ASCII characters: L"123\x0f04\x0fccxyz" Since this is a valid way to write that string in a source program, a user at the GDB command line should understand it. Since consumers of MI information must contain parsers for C values already, they can reliably find the contents of the string. Note that this gets a GUI the contents of the string in the *target* character set. The GUI itself should be responsible for converting target characters to whatever character set it wants to use to present data to its user. Here, GDB's 'host' character set is just the character set used to carry information from GDB to the GUI; it should probably be set to ASCII, just to avoid needless variation. But either way, it's just acting as a medium for values in C source code syntax, and has no bearing on either the character set the target program is using, or the character set the GUI will use to present data to its user. 
Unicode technical report #17 lays out the terminology the Unicode folks use for all this stuff, with good explanations: http://www.unicode.org/reports/tr17/ According to the ISO C standard, the coding character set used by wchar_t must be a superset of that used by char for members of the basic character set. See ISO/IEC 9899:1999 (E) section 7.17, paragraph 2. So I think it's sufficient for the user to specify the coding character set used by wide characters; that fixes the ccs used for char values. ^ permalink raw reply [flat|nested] 52+ messages in thread
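The output format proposed in the message above can be sketched in Python. This is an assumption-laden illustration, not GDB code: `wide_literal` is a made-up name, the host character set is taken to be plain ASCII, and each element is taken to be one target wchar_t value. One detail the sketch adds on its own is that a printable hex digit directly after a \x escape is also escaped, since C hex escapes consume every hex digit that follows.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def wide_literal(values):
    """Format wchar_t values as a C wide string literal using only ASCII."""
    parts = []
    prev_was_escape = False
    for v in values:
        is_ascii_printable = 0x20 <= v < 0x7F
        ch = chr(v) if is_ascii_printable else None
        needs_escape = (
            not is_ascii_printable
            or ch in ('"', "\\")
            or (prev_was_escape and ch in HEX_DIGITS)
        )
        if needs_escape:
            parts.append("\\x%04x" % v)
        else:
            parts.append(ch)
        prev_was_escape = needs_escape
    return 'L"' + "".join(parts) + '"'

text = wide_literal([0x31, 0x32, 0x33, 0x0F04, 0x0FCC, 0x78, 0x79, 0x7A])
print(text, len(text))
```

For the example string this yields the twenty-one ASCII characters quoted above.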
* Re: printing wchar_t* 2006-04-14 18:03 ` Jim Blandy @ 2006-04-14 19:16 ` Eli Zaretskii 2006-04-14 19:22 ` Jim Blandy 2006-04-14 19:53 ` Mark Kettenis 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-14 19:16 UTC (permalink / raw) To: Jim Blandy; +Cc: ghost, gdb > Date: Fri, 14 Apr 2006 10:53:44 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > Cc: "Vladimir Prus" <ghost@cs.msu.su>, gdb@sources.redhat.com > > I think folks are seeing difficult problems where there aren't any. What difficulties? there _are_ no difficulties ;-) > Suppose we have a wide string where wchar_t values are Unicode code > points. Suppose our host character set is plain ASCII. Suppose the > user's program has a string containing the digits '123', followed by > some funky Tibetan characters U+0F04 U+0FCC, followed by the letters > 'xyz'. When asked to print that string, GDB should print the > following twenty-one ASCII characters: > > L"123\x0f04\x0fccxyz" This will work, if we accept your assumptions (which are by no means universally correct, e.g. parts of our discussion were around whether the string contains U+XXXX Unicode codepoints or their UTF-16 encodings). But all you did is invent an encoding (and a variable-size encoding at that). Something in the GUI FE still has to interpret that encoding, i.e. convert it back to binary representation of the characters, because your encoding cannot be displayed by any known GUI API. Compare this with the facility that we already have today: (gdb) print *warray@8 {0x0031, 0x0032, 0x0033, 0x0F04, 0x0FCC, 0x0078, 0x0079, 0x007A} Except for using up 60-odd characters where you used 21, this is IMHO better, since it doesn't require any code on the FE side: just convert the strings to integers, and you've got Unicode, ready to be used for whatever purposes. > Since this is a valid way to write that string in a source program, a > user at the GDB command line should understand it. 
Since consumers of > MI information must contain parsers for C values already, they can > reliably find the contents of the string. I only partly agree with the first sentence, and not at all with the second. For the interactive user, understanding non-ASCII strings in the suggested ASCII encoding might not be easy at all. For example, for all my knowledge of Hebrew, if someone shows me \x05D2, I will have hard time recognizing the letter Gimel. As for the second sentence, ``reliably find the contents of the string'' there obviously doesn't consider the complexities of handling wide characters. In my experience, for any non-trivial string processing, working with variable-size encoding is much harder than with fixed-size wchar_t arrays, because you need to interpret the bytes as you go, even if all you need is to find the n-th character. Even the simple task of computing the number of characters in the string becomes complicated. > Note that this gets a GUI the contents of the string in the *target* > character set. The GUI itself should be responsible for converting > target characters to whatever character set it wants to use to present > data to its user. Here, GDB's 'host' character set is just the > character set used to carry information from GDB to the GUI; it should > probably be set to ASCII, just to avoid needless variation. But > either way, it's just acting as a medium for values in C source code > syntax, and has no bearing on either the character set the target > program is using, or the character set the GUI will use to present > data to its user. What you are suggesting is simple for GDB, but IMHo leaves too much complexity to the FE. I think GDB could do better. In particular, if I'm sitting at a UTF-8 enabled xterm, I'd be grateful if GDB would show me Unicode characters in their normal glyphs, which would require GDB to output the characters in their UTF-8 encoding (which the terminal will then display in human-readable form). 
Your suggestion doesn't allow such a feature, AFAICS, at least not for CLI users. That said, if someone volunteers to do the job of adding your suggestions to GDB, I won't object to accepting the patches, because whoever does the job gets to choose the tools. > Unicode technical report #17 lays out the terminology the Unicode > folks use for all this stuff, with good explanations: > http://www.unicode.org/reports/tr17/ Yes, that's a good background reading for related stuff. > According to the ISO C standard, the coding character set used by > wchar_t must be a superset of that used by char for members of the > basic character set. See ISO/IEC 9899:1999 (E) section 7.17, > paragraph 2. So I think it's sufficient for the user to specify the > coding character set used by wide characters; that fixes the ccs used > for char values. If wchar_t uses fixed-size characters, not their variable-size encodings, then specifying the CCS will do. Encodings are another matter; as I wrote earlier, there could be many different encodings of the same CCS, and I suppose some weirdo software somewhere could stuff such encoding into a wchar_t. ^ permalink raw reply [flat|nested] 52+ messages in thread
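As a rough sketch of the front-end side of the array approach described above, the following Python turns the printed array into text and then into UTF-8 for a terminal. It assumes, as the message does, that the values are Unicode codepoints; `parse_array` is a name invented for this example.

```python
import re

def parse_array(text):
    """Extract the hex integers from GDB-style array output."""
    return [int(tok, 16) for tok in re.findall(r"0x[0-9A-Fa-f]+", text)]

line = "{0x0031, 0x0032, 0x0033, 0x0F04, 0x0FCC, 0x0078, 0x0079, 0x007A}"

values = parse_array(line)
as_text = "".join(chr(v) for v in values)

# What a UTF-8 enabled terminal would need to receive.
utf8 = as_text.encode("utf-8")
print(values)
print(utf8)
```

If the array held UTF-16 code units instead, the `chr` step would produce unpaired surrogates for characters outside the BMP, which is the case debated earlier in the thread.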
* Re: printing wchar_t* 2006-04-14 19:16 ` Eli Zaretskii @ 2006-04-14 19:22 ` Jim Blandy 2006-04-14 22:18 ` Daniel Jacobowitz 2006-04-15 7:14 ` Eli Zaretskii 0 siblings, 2 replies; 52+ messages in thread From: Jim Blandy @ 2006-04-14 19:22 UTC (permalink / raw) To: Eli Zaretskii; +Cc: ghost, gdb On 4/14/06, Eli Zaretskii <eliz@gnu.org> wrote: > > Suppose we have a wide string where wchar_t values are Unicode code > > points. Suppose our host character set is plain ASCII. Suppose the > > user's program has a string containing the digits '123', followed by > > some funky Tibetan characters U+0F04 U+0FCC, followed by the letters > > 'xyz'. When asked to print that string, GDB should print the > > following twenty-one ASCII characters: > > > > L"123\x0f04\x0fccxyz" > > This will work, if we accept your assumptions (which are by no means > universally correct, e.g. parts of our discussion were around whether > the string contains U+XXXX Unicode codepoints or their UTF-16 > encodings). But all you did is invent an encoding (and a > variable-size encoding at that). Something in the GUI FE still has to > interpret that encoding, i.e. convert it back to binary representation > of the characters, because your encoding cannot be displayed by any > known GUI API. The command line and MI already use the ISO C syntax for conveying values to the user/consumer. I'm just saying we should expand our use of the syntax we already use. I posited that the target character set was Unicode, but the same mechanism will work no matter what character set and encoding the target uses. No matter what string appears on the target, there is always a source-language representation for that target. According to ISO C, the \x escapes specify char or wchar_t values in the target character set. So you can always write whatever you've got. 
> Compare this with the facility that we already have today: > > (gdb) print *warray@8 > {0x0031, 0x0032, 0x0033, 0x0F04, 0x0FCC, 0x0078, 0x0079, 0x007A} > > Except for using up 60-odd characters where you used 21, this is IMHO > better, since it doesn't require any code on the FE side: just convert > the strings to integers, and you've got Unicode, ready to be used for > whatever purposes. If you're printing an expression that evaluates to a string, sure. But what if you're printing a value of type struct { wchar *key; wchar_t *value }? What if you're using -stack-list-arguments to show values in a stack frame? My point is, MI consumers are already parsing ISO C strings. They just need to parse more of them. > > Since this is a valid way to write that string in a source program, a > > user at the GDB command line should understand it. Since consumers of > > MI information must contain parsers for C values already, they can > > reliably find the contents of the string. > > I only partly agree with the first sentence, and not at all with the > second. > > For the interactive user, understanding non-ASCII strings in the > suggested ASCII encoding might not be easy at all. For example, for > all my knowledge of Hebrew, if someone shows me \x05D2, I will have > hard time recognizing the letter Gimel. If the host character set includes Gimel, then GDB won't print it with a hex escape. > As for the second sentence, ``reliably find the contents of the > string'' there obviously doesn't consider the complexities of handling > wide characters. In my experience, for any non-trivial string > processing, working with variable-size encoding is much harder than > with fixed-size wchar_t arrays, because you need to interpret the > bytes as you go, even if all you need is to find the n-th character. > Even the simple task of computing the number of characters in the > string becomes complicated. I don't understand what you mean. 
The rules for parsing ISO C string literals into arrays of chars and wide string literals into arrays of wide characters are straightforward. > What you are suggesting is simple for GDB, but IMHo leaves too much > complexity to the FE. I think GDB could do better. In particular, if > I'm sitting at a UTF-8 enabled xterm, I'd be grateful if GDB would > show me Unicode characters in their normal glyphs, which would require > GDB to output the characters in their UTF-8 encoding (which the > terminal will then display in human-readable form). Your suggestion > doesn't allow such a feature, AFAICS, at least not for CLI users. When the host character set contains a character, there's no need for GDB to use an escape to show it. > If wchar_t uses fixed-size characters, not their variable-size > encodings, then specifying the CCS will do. There is no provision in ISO C for variable-size wchar_t encodings. The portion of the standard I referred to says that wchar_t "...is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales". ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 19:22 ` Jim Blandy @ 2006-04-14 22:18 ` Daniel Jacobowitz 2006-04-16 11:39 ` Jim Blandy 2006-04-15 7:14 ` Eli Zaretskii 1 sibling, 1 reply; 52+ messages in thread From: Daniel Jacobowitz @ 2006-04-14 22:18 UTC (permalink / raw) To: Jim Blandy; +Cc: Eli Zaretskii, ghost, gdb On Fri, Apr 14, 2006 at 12:16:36PM -0700, Jim Blandy wrote: > The command line and MI already use the ISO C syntax for conveying > values to the user/consumer. I'm just saying we should expand our use > of the syntax we already use. I don't agree. Saying "we use ISO C syntax for conveying data" is fairly inaccurate. We are inconsistent. Some things are escaped in a C-like fashion. Other things are escaped in other fashions, with their own quoting rules. This is true in both directions, for user input and for output. Let's consider strings in particular. Strings are printed using LA_PRINT_STRING. As the name implies, the quoting done is adjusted to match the source language convention. Asking an FE to grok that is just impractical. In data intended for CLI users, we can prettyprint things any way we want; in data intended for anything more machinelike, I recommend we define a syntax and stick with it. Personally, I'd just use UTF-8. If you want GDB's output, expect it to be UTF-8. The MI layer is a "transport", and can add its own necessary escaping (of quote marks, mostly). Alternatively, make GDB output in the current locale's character set. So, if we print a wchar_t string as a string, and the user has conveyed to us that their wchar_t strings are Unicode code points, then we can convert that to the appropriate multibyte string on output using the host character set. Picked a host character set that can't represent some target characters? The CLI should fall back to pretty escape sequences, I don't know what the MI should do, but probably the answer is unimportant. > My point is, MI consumers are already parsing ISO C strings. 
They > just need to parse more of them. IMO, we need to make them parse less of them. Everywhere the MI consumer needs to parse something which originated as GDB CLI output, things go bad. For instance, MI consumers may get confused by the automatic limits on "set print elements", which truncates strings. After "set print elements 2": (gdb) interpreter-exec mi "-var-create - * \"(char *)&__libc_version\"" ^done,name="var1",numchild="1",type="char *" (gdb) (gdb) interpreter-exec mi "-var-evaluate-expression var1" ^done,value="0x102a80 \"2.\"..." (gdb) Not very nice of us, was that? > There is no provision in ISO C for variable-size wchar_t encodings. > The portion of the standard I referred to says that wchar_t "...is an > integer type whose range of values can represent distinct codes for > all members of the largest extended character set specified among the > supported locales". (A) GDB supports languages other than C. (B) While I am inclined to agree with you about the language of ISO C, we don't get to ignore the reality of platforms with a 16-bit wchar_t which store UTF-16 in it. -- Daniel Jacobowitz CodeSourcery ^ permalink raw reply [flat|nested] 52+ messages in thread
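The fallback described above (emit characters in the host character set where possible, escape the rest) might look like this in Python. It is only a sketch under the assumption that the wide string holds Unicode code points; `render` and its arguments are hypothetical, and control characters and backslashes are not treated specially here.

```python
def render(code_points, host_charset):
    """Show each character directly if the host charset can represent it,
    otherwise fall back to a \\x escape."""
    out = []
    for cp in code_points:
        ch = chr(cp)
        try:
            ch.encode(host_charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("\\x%04x" % cp)
    return "".join(out)

sample = [0x31, 0x05D2, 0x0F04]  # '1', Hebrew Gimel, a Tibetan mark

print(render(sample, "ascii"))
print(render(sample, "iso8859_8"))
print(render(sample, "utf-8"))
```

With a Hebrew single-byte host charset only the Tibetan character is escaped; with UTF-8 nothing is.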
* Re: printing wchar_t* 2006-04-14 22:18 ` Daniel Jacobowitz @ 2006-04-16 11:39 ` Jim Blandy 2006-04-16 15:07 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Jim Blandy @ 2006-04-16 11:39 UTC (permalink / raw) To: Jim Blandy, Eli Zaretskii, ghost, gdb As far as conveying strings accurately to GUI's via MI is concerned: It's fine to improve the way MI conveys data to the front end. It seems to me we still need to do things like repetition elimination and length limiting, but that syntax should certainly be designed to make the front ends' life easier. I'm not so sure about GDB doing character set conversion. I think I'd rather see GDB concentrate on accurately and safely conveying target code points to the front end, and make the front end responsible for displaying it. If the front end hasn't asked GDB to "print" the value in GDB's own way, then the front end has accepted responsibility for presentation, it seems to me. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-16 11:39 ` Jim Blandy @ 2006-04-16 15:07 ` Eli Zaretskii 0 siblings, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-16 15:07 UTC (permalink / raw) To: Jim Blandy; +Cc: ghost, gdb > Date: Fri, 14 Apr 2006 15:18:50 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > > As far as conveying strings accurately to GUI's via MI is concerned: > > It's fine to improve the way MI conveys data to the front end. It > seems to me we still need to do things like repetition elimination and > length limiting, but that syntax should certainly be designed to make > the front ends' life easier. Do you agree that the array@@ feature suggested by Daniel is a step in the right direction? > I'm not so sure about GDB doing character set conversion. I think I'd > rather see GDB concentrate on accurately and safely conveying target > code points to the front end, and make the front end responsible for > displaying it. I'd rather see GDB offering something in this area as well, but until we have a volunteer for this job, this disagreement is academic. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 19:22 ` Jim Blandy 2006-04-14 22:18 ` Daniel Jacobowitz @ 2006-04-15 7:14 ` Eli Zaretskii 2006-04-17 7:16 ` Vladimir Prus 1 sibling, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-15 7:14 UTC (permalink / raw) To: Jim Blandy; +Cc: ghost, gdb > Date: Fri, 14 Apr 2006 12:16:36 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > Cc: ghost@cs.msu.su, gdb@sources.redhat.com > > > (gdb) print *warray@8 > > {0x0031, 0x0032, 0x0033, 0x0F04, 0x0FCC, 0x0078, 0x0079, 0x007A} > > > > Except for using up 60-odd characters where you used 21, this is IMHO > > better, since it doesn't require any code on the FE side: just convert > > the strings to integers, and you've got Unicode, ready to be used for > > whatever purposes. > > If you're printing an expression that evaluates to a string, sure. > But what if you're printing a value of type struct { wchar *key; > wchar_t *value }? What if you're using -stack-list-arguments to show > values in a stack frame? Sorry, I don't see the difference. Perhaps I'm too dense. Are you talking about the amount of ASCII characters, or something else? > My point is, MI consumers are already parsing ISO C strings. They > just need to parse more of them. This ``more parsing'' is not magic. It's a lot of work, in general. > > For the interactive user, understanding non-ASCII strings in the > > suggested ASCII encoding might not be easy at all. For example, for > > all my knowledge of Hebrew, if someone shows me \x05D2, I will have > > hard time recognizing the letter Gimel. > > If the host character set includes Gimel, then GDB won't print it with > a hex escape. The host character set has nothing to do, in general, with what characters can be displayed. The same host character set can be displayed on an appropriately localized xterm, but not on a bare-bones character terminal. Not every system that runs in the Hebrew locale has Hebrew-enabled xterm. 
Some characters may be missing from a particular font, especially a Unicode-based font (because there so many Unicode characters). Etc., etc. Even if I do have a Hebrew enabled xterm, chances are that it cannot display characters sent in 16-bit Unicode codepoints, it will want some single-byte encoding, like UTF-8 or maybe ISO 8859-8. GDB will generally know nothing about these complications, unless we teach it. For example, to display Hebrew letters on a UTF-8 enabled xterm, we (i.e. the user, through appropriate GDB commands) will have to tell GDB that wchar_t strings should be encoded in UTF-8 by the CLI output routines. Sometimes these settings can be gleaned from the environment variables, but Emacs's experience shows how very unreliable and error-prone this is. > > As for the second sentence, ``reliably find the contents of the > > string'' there obviously doesn't consider the complexities of handling > > wide characters. In my experience, for any non-trivial string > > processing, working with variable-size encoding is much harder than > > with fixed-size wchar_t arrays, because you need to interpret the > > bytes as you go, even if all you need is to find the n-th character. > > Even the simple task of computing the number of characters in the > > string becomes complicated. > > I don't understand what you mean. The rules for parsing ISO C string > literals into arrays of chars and wide string literals into arrays of > wide characters are straightforward. You seem to assume here that the target and the front-end's character sets and their notion of wchar_t are identical. Otherwise, what was a valid array of wide characters on the target side will be gibberish on the host side, and will certainly not display as anything legible. 
Unlike GDB core, which just wants to pass the bytes from here to there, the UI needs to be able to display the string, and for that it needs to understand how it is encoded, how many glyphs will it produce on the screen, where it can be broken into several lines if it is too long, etc. This is all trivial with 7-bit ASCII (every byte produces a single glyph, except a few non-printables, whitespace characters signal possible locations to break the line, etc.), but can get very complex with other character sets. GDB cannot be asked to know about all of those complications, but I think it should at least provide a few simple translation services so that a front end will not have to work too hard to handle and display strings as mostly readable text. Passing the characters as fixed-size codepoints expressed as ASCII hex strings leaves the front-end with only very simple job. What's more, it uses an existing feature: array printing. > > What you are suggesting is simple for GDB, but IMHo leaves too much > > complexity to the FE. I think GDB could do better. In particular, if > > I'm sitting at a UTF-8 enabled xterm, I'd be grateful if GDB would > > show me Unicode characters in their normal glyphs, which would require > > GDB to output the characters in their UTF-8 encoding (which the > > terminal will then display in human-readable form). Your suggestion > > doesn't allow such a feature, AFAICS, at least not for CLI users. > > When the host character set contains a character, there's no need for > GDB to use an escape to show it. Whose host character set? GDB's? But GDB is not displaying the strings, the front end is. And as I wrote above, there's no guarantees that the host character set can be transparently displayed on the screen. This only works for ASCII and some simple single-byte encodings, mostly Latin ones. But it doesn't work in general. And why are you talking about host character set? 
The L"123\x0f04\x0fccxyz" string came from the target, GDB simply converted it to 7-bit ASCII. These are characters from the target character set. And the target doesn't necessarily talk in the host locale's character set and language, you could be debugging a program which talks Farsi with GDB that runs in a German locale. > > If wchar_t uses fixed-size characters, not their variable-size > > encodings, then specifying the CCS will do. > > There is no provision in ISO C for variable-size wchar_t encodings. > The portion of the standard I referred to says that wchar_t "...is an > integer type whose range of values can represent distinct codes for > all members of the largest extended character set specified among the > supported locales". I agree, but Windows and who knows what else violates that. Of course, for the BMP, UTF-16 is indistinguishable from Unicode codepoints, so in practice this might not matter too much. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-15 7:14 ` Eli Zaretskii @ 2006-04-17 7:16 ` Vladimir Prus 2006-04-17 8:58 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-17 7:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Blandy, gdb On Saturday 15 April 2006 01:37, Eli Zaretskii wrote: > > My point is, MI consumers are already parsing ISO C strings. They > > just need to parse more of them. > > This ``more parsing'' is not magic. It's a lot of work, in general. I don't quite get it. Say that frontend and gdb somehow agree on the 8-bit encoding using by gdb to print the strings. Then frontend can look at the string and: - If it sees \x, look at the following hex digits and convert it to either code point or code unit - If it sees anything else, convert it from local 8-bit to Unicode The only question here is whether \x encodes a code unit or code point. If it encodes a code unit, frontend needs extra processing (for me, that's easy). If it encodes code point, then further changes in gdb are needed. Note that due to charset function interface using 'int', you can't use UTF-8 for encoding passed to frontend, but using ASCII + \x is still feasible. There's one nice thing about this approach. If there's new 'print array until XX" syntax, I indeed need to special-case processing of values in several contexts -- most notably arguments in stack trace. With "\x" escapes I'd need to write a code to handle them once. In fact, I can add this code right to MI parser (which operates using Unicode-enabled QString class already). That will be more convenient than invoking 'print array' for any wchar_t* I ever see. > > > For the interactive user, understanding non-ASCII strings in the > > > suggested ASCII encoding might not be easy at all. For example, for > > > all my knowledge of Hebrew, if someone shows me \x05D2, I will have > > > hard time recognizing the letter Gimel. 
> > > > If the host character set includes Gimel, then GDB won't print it with > > a hex escape. > > The host character set has nothing to do, in general, with what > characters can be displayed. The same host character set can be > displayed on an appropriately localized xterm, but not on a bare-bones > character terminal. Not every system that runs in the Hebrew locale > has Hebrew-enabled xterm. Some characters may be missing from a > particular font, especially a Unicode-based font (because there so > many Unicode characters). Etc., etc. > > Even if I do have a Hebrew enabled xterm, chances are that it cannot > display characters sent in 16-bit Unicode codepoints, it will want > some single-byte encoding, like UTF-8 or maybe ISO 8859-8. > > GDB will generally know nothing about these complications, unless we > teach it. For example, to display Hebrew letters on a UTF-8 enabled > xterm, we (i.e. the user, through appropriate GDB commands) will have > to tell GDB that wchar_t strings should be encoded in UTF-8 by the CLI > output routines. Sometimes these settings can be gleaned from the > environment variables, but Emacs's experience shows how very > unreliable and error-prone this is. I don't quite get. First you say you want \x05D2 to display using Unicode font on console, now you say it's very hard. Now, if you want Unicode display for \x05D2, there should be some method to tell gdb that your console can display Unicode, and if user told that Unicode is supported, what are the problems? > how many glyphs will it produce > on the screen, where it can be broken into several lines if it is too > long, etc. This is all trivial with 7-bit ASCII (every byte produces > a single glyph, except a few non-printables, whitespace characters > signal possible locations to break the line, etc.), but can get very > complex with other character sets. Isn't this completely outside of GDB? 
In fact, this is also outside of frontend -- GUI toolkit will handle this transparently (and if it won't, it's broken). > GDB cannot be asked to know about all of those complications, but I > think it should at least provide a few simple translation services so > that a front end will not have to work too hard to handle and display > strings as mostly readable text. Passing the characters as fixed-size > codepoints expressed as ASCII hex strings leaves the front-end with > only very simple job. What's more, it uses an existing feature: array > printing. Using \x escapes, provided they encode *code units*, leaves frontend with the same simple job. Really, using strings with \x escapes differs from array printing in just one point: some characters are printed not as hex values, but as characters in local 8-bit encoding. Why do you think this is a problem? I can't see what's wrong with that. > > > What you are suggesting is simple for GDB, but IMHo leaves too much > > > complexity to the FE. I think GDB could do better. In particular, if > > > I'm sitting at a UTF-8 enabled xterm, I'd be grateful if GDB would > > > show me Unicode characters in their normal glyphs, which would require > > > GDB to output the characters in their UTF-8 encoding (which the > > > terminal will then display in human-readable form). Your suggestion > > > doesn't allow such a feature, AFAICS, at least not for CLI users. > > > > When the host character set contains a character, there's no need for > > GDB to use an escape to show it. > > Whose host character set? GDB's? But GDB is not displaying the > strings, the front end is. And as I wrote above, there's no > guarantees that the host character set can be transparently displayed > on the screen. This only works for ASCII and some simple single-byte > encodings, mostly Latin ones. But it doesn't work in general. > > And why are you talking about host character set? 
The > L"123\x0f04\x0fccxyz" string came from the target, GDB simply > converted it to 7-bit ASCII. These are characters from the target > character set. And the target doesn't necessarily talk in the host > locale's character set and language, you could be debugging a program > which talks Farsi with GDB that runs in a German locale. So, characters that happen to exist in German locale are printed as literal chars. Other characters are printed using \x. FE reads the string, and when it sees literal char, it converts it from German locale to Unicode used internally. Where's the problem? - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
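A minimal sketch of the front-end side of the scheme described in the message above, not code from GDB or any existing front end. It assumes GDB emits literal characters in an agreed host 8-bit encoding plus `\x` escapes, and that each escape carries a whole code point rather than a code unit (the thread leaves that question open). The name `decode_escaped` and its parameters are hypothetical. Other C escapes, such as `\\` and `\"`, are deliberately not handled.

```python
import string

# Byte values of 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits.encode("ascii"))


def decode_escaped(raw: bytes, host_encoding: str = "ascii") -> str:
    """Turn host-encoded bytes with \\xNNNN escapes into a Unicode string."""
    out = []
    literal = bytearray()  # run of bytes still in the host 8-bit encoding
    i = 0
    while i < len(raw):
        if raw[i:i + 2] == b"\\x":
            # Flush the pending literal run through the host encoding.
            out.append(literal.decode(host_encoding))
            literal.clear()
            j = i + 2
            while j < len(raw) and raw[j] in HEX_DIGITS:
                j += 1  # C-style \x is greedy: consume every hex digit
            if j == i + 2:
                raise ValueError("\\x with no hex digits at offset %d" % i)
            # Assumption: each escape is a whole code point, not a code unit.
            out.append(chr(int(raw[i + 2:j].decode("ascii"), 16)))
            i = j
        else:
            literal.append(raw[i])
            i += 1
    out.append(literal.decode(host_encoding))
    return "".join(out)


print(decode_escaped(b"123\\x0f04\\x0fccxyz"))
```

Because the conversion lives in one function, a front end could call it once in its MI parser instead of in every widget that shows a value. If the escapes turned out to be UTF-16 code units, a surrogate-pairing step would be needed after the loop.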
* Re: printing wchar_t* 2006-04-17 7:16 ` Vladimir Prus @ 2006-04-17 8:58 ` Eli Zaretskii 2006-04-17 10:35 ` Vladimir Prus 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-17 8:58 UTC (permalink / raw) To: Vladimir Prus; +Cc: jimb, gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Mon, 17 Apr 2006 10:36:47 +0400 > Cc: "Jim Blandy" <jimb@red-bean.com>, > gdb@sources.redhat.com > > On Saturday 15 April 2006 01:37, Eli Zaretskii wrote: > > > > My point is, MI consumers are already parsing ISO C strings. They > > > just need to parse more of them. > > > > This ``more parsing'' is not magic. It's a lot of work, in general. > > I don't quite get it. Say that frontend and gdb somehow agree on the 8-bit > encoding using by gdb to print the strings. Then frontend can look at the > string and: > > - If it sees \x, look at the following hex digits and convert it to either > code point or code unit > - If it sees anything else, convert it from local 8-bit to Unicode That's what Jim was saying. He thought (or so it seemed to me) that, once the ASCII-encoded string was read by the front end and converted back to the integer values, the job is done. That is, in Jim's example with L"123\x0f04\x0fccxyz", the character `1' is converted to its code 49 decimal, \x0f04 is converted to the 16-bit code 3844 decimal, `x' is converted to 120 decimal, etc. What I was saying that indeed this conversion is easy, but it's not even close to doing what the front end generally would like to do with the string. You want to _process_ the string, which means you want to know its length in characters (not bytes), you want to know what character set they encode, you want to be able to find the n-th character in the string, etc. The encoding suggested by Jim makes these tasks very hard, much harder than if we send the string as an array of fixed-length wide characters. 
> Note that due to charset function interface using 'int', you can't use UTF-8 > for encoding passed to frontend, but using ASCII + \x is still feasible. I don't understand why UTF-8 cannot be used (an int can hold an 8-bit byte just fine), nor can I see why this is an issue. We are not discussing addition of UTF-8 encoding to GDB, we are discussing how to pass to a front end wide-character strings held within the debuggee. Or at least that's what I thought you were trying to solve. > There's one nice thing about this approach. If there's new 'print array until > XX" syntax, I indeed need to special-case processing of values in several > contexts -- most notably arguments in stack trace. With "\x" escapes I'd need > to write a code to handle them once. In fact, I can add this code right to MI > parser (which operates using Unicode-enabled QString class already). That > will be more convenient than invoking 'print array' for any wchar_t* I ever > see. I don't think we should optimize GDB for one specific toolkit, even if that toolkit is Qt. > I don't quite get. First you say you want \x05D2 to display using Unicode font > on console, now you say it's very hard. No, I said that a GUI front end will be able to display the _binary_ _code_ 0x05D2 with a suitable Unicode font. Jim suggested that seeing the _string_ "\x05D2" in GDB's output will allow me to read the text, to which I replied that it will not be easy at all, since humans generally don't remember Unicode codepoints by heart, even for their native languages. > Now, if you want Unicode display for > \x05D2, there should be some method to tell gdb that your console can display > Unicode, and if user told that Unicode is supported, what are the problems? Please read my other messages: the program being debugged might talk Hebrew in Unicode codepoints, but the locale where we are running GDB might not support Hebrew on the console. 
So, as long as we are talking about console output (which is different from a GUI front end), just sending Unicode to the display is not enough. I suggest not to mix issues relevant for GUI front ends and text-mode front ends, including the CLI ``front end'' built into GDB itself. These are different issues, each one with its own set of complexities. Jim's L"123\x0f04\x0fccxyz" proposal was (I think) more oriented to text terminals and the CLI, so the discussion wandered off in that direction. I don't think your original problem is related to that. > > how many glyphs will it produce > > on the screen, where it can be broken into several lines if it is too > > long, etc. This is all trivial with 7-bit ASCII (every byte produces > > a single glyph, except a few non-printables, whitespace characters > > signal possible locations to break the line, etc.), but can get very > > complex with other character sets. > > Isn't this completely outside of GDB? No, not completely: the ui_output routines do this for the console output. Again, this part was about text-mode output, and the CLI in particular. > > GDB cannot be asked to know about all of those complications, but I > > think it should at least provide a few simple translation services so > > that a front end will not have to work too hard to handle and display > > strings as mostly readable text. Passing the characters as fixed-size > > codepoints expressed as ASCII hex strings leaves the front-end with > > only very simple job. What's more, it uses an existing feature: array > > printing. > > Using \x escapes, provided they encode *code units*, leaves frontend with the > same simple job. Yes, but GDB will need to generate the code units first, e.g. convert fixed-size Unicode wide characters into UTF-8. That's extra job for GDB. (Again, we were originally talking about wchar_t, not multibyte strings.) 
> Really, using strings with \x escapes differs from array > printing in just one point: some characters are printed not as hex values, > but as characters in local 8-bit encoding. Why do you think this is a > problem? Because knowing what is the ``local 8-bit encoding'' is in itself a huge problem. Emacs is trying to solve it since 1996, and it still haven't got all the details right in some marginal cases, although we have people on the Emacs development team who understand more about i18n than I ever will. In short, there's no reliable method of finding out what is the correct 8-bit encoding in which to talk to any given text-mode display. And you certainly do NOT want any local 8-bit encodings when you are going to display the string on a GUI, because that would require that the front end does some extra job of converting the encoded text back to what it needs to communicate with the text widgets. > > And why are you talking about host character set? The > > L"123\x0f04\x0fccxyz" string came from the target, GDB simply > > converted it to 7-bit ASCII. These are characters from the target > > character set. And the target doesn't necessarily talk in the host > > locale's character set and language, you could be debugging a program > > which talks Farsi with GDB that runs in a German locale. > > So, characters that happen to exist in German locale are printed as literal > chars. Other characters are printed using \x. FE reads the string, and when > it sees literal char, it converts it from German locale to Unicode used > internally. Where's the problem? If this conversion is lossless, it's redundant. It is easier to just send everything as hex escapes, since no human will see them, only the FE. This saves the needless conversion (and potential problems with incorrect notion of the current locale and encoding). But some conversions to ``literal characters'' (i.e. 
to 8-bit binary codes) are lossy, because the underlying converter needs state information to correctly interpret the byte stream. This state information is thrown away once the conversion is done, and so the opposite conversion fails to reconstruct the original codepoints. This is usually the case with ISO-2022 encodings. So I think on balance it's better to send the original wide characters as hex, the only downside being that it uses more bytes per character. (Again, this is about GUI front ends, not about GDB's own CLI output routines.) ^ permalink raw reply [flat|nested] 52+ messages in thread
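For comparison, a minimal sketch of the alternative argued for in the message above: every wide character arrives as a fixed-size hex value, so the front end never needs to know a host encoding. The input format (a brace-enclosed, comma-separated list such as array printing in hex might produce) and the name `decode_hex_array` are assumptions made for illustration, and the values are assumed to be Unicode code points.

```python
import re


def decode_hex_array(text: str) -> str:
    """Build a string from hex values, stopping at the first zero."""
    chars = []
    for token in re.findall(r"0x[0-9a-fA-F]+", text):
        value = int(token, 16)
        if value == 0:
            break  # terminating null of the wchar_t string
        chars.append(chr(value))
    return "".join(chars)


s = decode_hex_array("{0x31, 0x32, 0x33, 0xf04, 0xfcc, 0x78, 0x79, 0x7a, 0x0}")
print(len(s), s[3] == "\u0f04")
```

With one value per character, the length in characters and the n-th character fall out directly, which is the property the message emphasizes. The cost is the one already named there: more bytes per character on the wire.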
* Re: printing wchar_t* 2006-04-17 8:58 ` Eli Zaretskii @ 2006-04-17 10:35 ` Vladimir Prus 2006-04-17 12:26 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-17 10:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: jimb, gdb On Monday 17 April 2006 12:35, Eli Zaretskii wrote: > > - If it sees \x, look at the following hex digits and convert it to > > either code point or code unit > > - If it sees anything else, convert it from local 8-bit to Unicode > > That's what Jim was saying. He thought (or so it seemed to me) that, > once the ASCII-encoded string was read by the front end and converted > back to the integer values, the job is done. That is, in Jim's > example with L"123\x0f04\x0fccxyz", the character `1' is converted to > its code 49 decimal, \x0f04 is converted to the 16-bit code 3844 > decimal, `x' is converted to 120 decimal, etc. > > What I was saying that indeed this conversion is easy, but it's not > even close to doing what the front end generally would like to do with > the string. You want to _process_ the string, which means you want to > know its length in characters (not bytes), you want to know what > character set they encode, you want to be able to find the n-th > character in the string, etc. The encoding suggested by Jim makes > these tasks very hard, much harder than if we send the string as an > array of fixed-length wide characters. That's a *completely* different topic. First, frontend needs to get the data, in whatever form. Using \x escapes is just as suitable as using list of hex values -- those approaches are just isomorphic. Second, frontend needs to display the data, however it will operate using its own data structures, and it does not matter if \x escapes were used or not. No frontend will ever work on a string containing embedded "\x" escapes. 
> > Note that due to charset function interface using 'int', you can't use > > UTF-8 for encoding passed to frontend, but using ASCII + \x is still > > feasible. > > I don't understand why UTF-8 cannot be used (an int can hold an 8-bit > byte just fine), Int can't hold 6 bytes, at least on common machines. And the interface in charset.h requires that the result of converting one host character to one target character fit into an int. Anyway, I don't think charset.h was designed with Unicode in mind, so we probably should stop discussing it. > > There's one nice thing about this approach. If there's new 'print array > > until XX" syntax, I indeed need to special-case processing of values in > > several contexts -- most notably arguments in stack trace. With "\x" > > escapes I'd need to write a code to handle them once. In fact, I can add > > this code right to MI parser (which operates using Unicode-enabled > > QString class already). That will be more convenient than invoking 'print > > array' for any wchar_t* I ever see. > > I don't think we should optimize GDB for one specific toolkit, even if > that toolkit is Qt. Replace QString with Glib::ustring and the same argument holds. Whatever string type is used inside the frontend to represent a Unicode string, you can perform the conversion from \x escapes to that string class in one place, rather than separately inside the variable display widget, inside the stack display widget and so on. > > I don't quite get. First you say you want \x05D2 to display using Unicode > > font on console, now you say it's very hard. > > No, I said that a GUI front end will be able to display the _binary_ > _code_ 0x05D2 with a suitable Unicode font. Jim suggested that seeing > the _string_ "\x05D2" in GDB's output will allow me to read the text, > to which I replied that it will not be easy at all, since humans > generally don't remember Unicode codepoints by heart, even for their > native languages. 
Ok, seeing the string "\x05D2" will be sufficient for frontend. > > > GDB cannot be asked to know about all of those complications, but I > > > think it should at least provide a few simple translation services so > > > that a front end will not have to work too hard to handle and display > > > strings as mostly readable text. Passing the characters as fixed-size > > > codepoints expressed as ASCII hex strings leaves the front-end with > > > only very simple job. What's more, it uses an existing feature: array > > > printing. > > > > Using \x escapes, provided they encode *code units*, leaves frontend with > > the same simple job. > > Yes, but GDB will need to generate the code units first, e.g. convert > fixed-size Unicode wide characters into UTF-8. Sorry, where does that UTF-8 comes from? If you generate ASCII + \x escapes, you don't need UTF-8. > That's extra job for > GDB. (Again, we were originally talking about wchar_t, not multibyte > strings.) I don't understand what's this extra job. This is as simple as: for c in wchar_t* literal: if c is representable in host encoding: output_literal else output_hex_escape > > Really, using strings with \x escapes differs from array > > printing in just one point: some characters are printed not as hex > > values, but as characters in local 8-bit encoding. Why do you think this > > is a problem? > > Because knowing what is the ``local 8-bit encoding'' is in itself a > huge problem. Emacs is trying to solve it since 1996, and it still > haven't got all the details right in some marginal cases, although we > have people on the Emacs development team who understand more about > i18n than I ever will. In short, there's no reliable method of > finding out what is the correct 8-bit encoding in which to talk to any > given text-mode display. I trust you on that, but nothing prevents user/frontend to explicitly specify the encoding. 
> And you certainly do NOT want any local 8-bit encodings when you are > going to display the string on a GUI, because that would require that > the front end does some extra job of converting the encoded text back > to what it needs to communicate with the text widgets. I would expect that any GUI toolkit that pretend to support Unicode *has* to support conversion from local 8 bit encodings. Otherwise, such toolkit is of no use in real world. By the way, unless your target encoding is ASCII, frontend has to be aware of local 8 bit encoding anyway. If I wrote program using KOI8-R and frontend shows the char* (not wchar_t*) strings as ASCII, the frontend is broken already. > > > And why are you talking about host character set? The > > > L"123\x0f04\x0fccxyz" string came from the target, GDB simply > > > converted it to 7-bit ASCII. These are characters from the target > > > character set. And the target doesn't necessarily talk in the host > > > locale's character set and language, you could be debugging a program > > > which talks Farsi with GDB that runs in a German locale. > > > > So, characters that happen to exist in German locale are printed as > > literal chars. Other characters are printed using \x. FE reads the > > string, and when it sees literal char, it converts it from German locale > > to Unicode used internally. Where's the problem? > > If this conversion is lossless, it's redundant. It is easier to just > send everything as hex escapes, since no human will see them, only the > FE. This saves the needless conversion (and potential problems with > incorrect notion of the current locale and encoding). Well, using string with just hex escapes is fine for frontend. It might not be as fine to the user. > But some conversions to ``literal characters'' (i.e. to 8-bit binary > codes) are lossy, because the underlying converter needs state > information to correctly interpret the byte stream. 
This state > information is thrown away once the conversion is done, and so the > opposite conversion fails to reconstruct the original codepoints. > This is usually the case with ISO-2022 encodings. > > So I think on balance it's better to send the original wide characters > as hex, the only downside being that it uses more bytes per character. > (Again, this is about GUI front ends, not about GDB's own CLI output > routines.) Well, I'd prefer to address one problem at a time: 1. Gbd should be modified to print wchar_t* literals. It should use the same logic as for char* to decide if value is representable in the host charset, and use \x escapes otherwise. 2. If you believe that using literals is not suitable for MI, that can be a separate change. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
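The pseudocode in the message above ("if c is representable in host encoding: output_literal, else output_hex_escape") can be sketched as follows. This is an illustration only, not GDB's actual char* printing logic. It assumes the wchar_t values are Unicode code points and borrows Python's codec tables for the representability test. The name `format_wide_string` is hypothetical.

```python
def format_wide_string(code_points, host_encoding="ascii"):
    """Format code points as a C wide-string literal with \\x escapes."""
    parts = []
    for cp in code_points:
        ch = chr(cp)
        try:
            ch.encode(host_encoding)  # raises if the host charset lacks it
            # Keep the quoting simple: escape backslash, quote, non-printables.
            literal_ok = ch.isprintable() and ch not in '\\"'
        except UnicodeEncodeError:
            literal_ok = False
        parts.append(ch if literal_ok else "\\x%04x" % cp)
    return 'L"' + "".join(parts) + '"'


tibetan = [0x31, 0x32, 0x33, 0x0F04, 0x0FCC, 0x78, 0x79, 0x7A]
print(format_wide_string(tibetan))
print(format_wide_string([0x442, 0x5D2], "koi8-r"))
```

One limitation worth naming: C's `\x` escape consumes every following hex digit, so an escaped character directly followed by a literal `a` to `f` or digit would be read back as one longer escape. A real implementation would have to escape that following character too or split the literal. The sketch leaves that case unhandled.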
* Re: printing wchar_t* 2006-04-17 10:35 ` Vladimir Prus @ 2006-04-17 12:26 ` Eli Zaretskii 2006-04-17 13:56 ` Vladimir Prus 0 siblings, 1 reply; 52+ messages in thread From: Eli Zaretskii @ 2006-04-17 12:26 UTC (permalink / raw) To: Vladimir Prus; +Cc: jimb, gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Mon, 17 Apr 2006 13:01:58 +0400 > Cc: jimb@red-bean.com, > gdb@sources.redhat.com > > > What I was saying that indeed this conversion is easy, but it's not > > even close to doing what the front end generally would like to do with > > the string. You want to _process_ the string, which means you want to > > know its length in characters (not bytes), you want to know what > > character set they encode, you want to be able to find the n-th > > character in the string, etc. The encoding suggested by Jim makes > > these tasks very hard, much harder than if we send the string as an > > array of fixed-length wide characters. > > That's a *completely* different topic. Yes, it is. But we must keep it in mind because the front ends want strings to do something with them. > Second, frontend needs to display the data, however it will operate > using its own data structures, and it does not matter if \x escapes > were used or not. No frontend will ever work on a string containing > embedded "\x" escapes. I was saying that the ASCII encoding suggested by Jim makes it harder to convert the text into wide characters, that's all. > > > Using \x escapes, provided they encode *code units*, leaves frontend with > > > the same simple job. > > > > Yes, but GDB will need to generate the code units first, e.g. convert > > fixed-size Unicode wide characters into UTF-8. > > Sorry, where does that UTF-8 comes from? UTF-8 was an example, the general point being that code units are present only in encodings, not in fixed-length wide characters. > > That's extra job for > > GDB. (Again, we were originally talking about wchar_t, not multibyte > > strings.) 
> > I don't understand what's this extra job. This is as simple as: > > for c in wchar_t* literal: > if c is representable in host encoding: > output_literal > else > output_hex_escape That might sound simple for you, but it isn't, in general. The ``representable in host encoding'' part is very non-trivial; for example, how do you tell whether the Unicode codepoints 0x05C3 and 0x05C4 can be represented in the Windows codepage 1255 (the former can, the latter cannot)? This is generally impossible without using very complicated algorithms and/or large data bases. The other complex part is ``output_literal'': again, there's no simple algorithm to map Unicode's 0x05C3 into cp1255's 0xD3. You need tables again, and you need separate tables for each possible encoding (Hebrew has at least 3 widely used ones, Russian has at least 5, etc.). > > > Really, using strings with \x escapes differs from array > > > printing in just one point: some characters are printed not as hex > > > values, but as characters in local 8-bit encoding. Why do you think this > > > is a problem? > > > > Because knowing what is the ``local 8-bit encoding'' is in itself a > > huge problem. > [...] > I trust you on that, but nothing prevents user/frontend to explicitly specify > the encoding. What makes you think the user and/or front end will know what to specify? Experience shows they generally don't. > > And you certainly do NOT want any local 8-bit encodings when you are > > going to display the string on a GUI, because that would require that > > the front end does some extra job of converting the encoded text back > > to what it needs to communicate with the text widgets. > > I would expect that any GUI toolkit that pretend to support Unicode *has* to > support conversion from local 8 bit encodings. Otherwise, such toolkit is of > no use in real world. Then most of them are ``of no use''. 
You can rely on most of the modern GUI toolkits to support conversion from UTF-8 to Unicode, but that's about it. For anything more complex, your best bet is to link against libiconv or similar. > By the way, unless your target encoding is ASCII, frontend has to be aware of > local 8 bit encoding anyway. If I wrote program using KOI8-R and frontend > shows the char* (not wchar_t*) strings as ASCII, the frontend is broken > already. This only works as long as you use the encoding that matches your default fonts. Once it doesn't match, or the encoded characters come from a program written for different locale conventions, you are out of luck. It is important to realize that programs don't know anything about characters, all they see is integer code values. To display those codes in human-readable form, a program needs to know what display API to call and which font to request. This kind of information is absent from simple text files that hold encoded non-ASCII text, so programs generally need additional info to DTRT. The same holds for arbitrary strings GDB spills on you from some address in the debuggee. > 1. Gbd should be modified to print wchar_t* literals. ``Print'' is ambiguous in this context. I believe you mean ``send to the front end'', since this was your original problem. If the front end is charged with displaying the wchar_t strings, GDB does not need to print anything by itself. Am I right? > It should use the same > logic as for char* to decide if value is representable in the host charset, I hope I explained above why this part is highly non-trivial. That is why I think GDB should use hex notation for all characters, and leave it for the FE to deal with their display. ^ permalink raw reply [flat|nested] 52+ messages in thread
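The two table-related points in the message above can be checked with any library that ships conversion tables. The sketch below uses Python's built-in codecs as a stand-in, which is an assumption made for illustration (the thread itself mentions libiconv). The helper name `lookup` is hypothetical.

```python
def lookup(code_point: int, encoding: str):
    """Return the encoded bytes for code_point, or None if the table lacks it."""
    try:
        return chr(code_point).encode(encoding)
    except UnicodeEncodeError:
        return None


# The cp1255 example from the message: U+05C3 has an entry, U+05C4 does not.
print(lookup(0x05C3, "cp1255"))
print(lookup(0x05C4, "cp1255"))

# The same bytes read through the wrong table yield different text.
raw = "тест".encode("koi8-r")
print(raw.decode("koi8-r"), raw.decode("cp1251"))
```

The lookup itself is mechanical once the tables exist. What the tables cannot tell a program is which encoding name to pass in, and the last line shows what happens when that guess is wrong.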
* Re: printing wchar_t* 2006-04-17 12:26 ` Eli Zaretskii @ 2006-04-17 13:56 ` Vladimir Prus 2006-04-18 5:31 ` Eli Zaretskii 0 siblings, 1 reply; 52+ messages in thread From: Vladimir Prus @ 2006-04-17 13:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: jimb, gdb On Monday 17 April 2006 15:21, Eli Zaretskii wrote: > > > What I was saying that indeed this conversion is easy, but it's not > > > even close to doing what the front end generally would like to do with > > > the string. You want to _process_ the string, which means you want to > > > know its length in characters (not bytes), you want to know what > > > character set they encode, you want to be able to find the n-th > > > character in the string, etc. The encoding suggested by Jim makes > > > these tasks very hard, much harder than if we send the string as an > > > array of fixed-length wide characters. > > > > That's a *completely* different topic. > > Yes, it is. But we must keep it in mind because the front ends want > strings to do something with them. Eli, I think we're running in circles. I'd like to reiterate what I ideally want from gdb: 1. For any wchar_t* value, be it value of a variable, or function parameter three levels up the stack, or member of structure, I want gdb to print that value in specific format that's easy for frontend to use. String with escapes is fine. 2. I want that formatting to take effect both for MI commands and for 'print' command, since the user can issue 'print' command manually. 3. I don't mind having this behaviour only when --interpreter=mi is specified. I think the two questions we did not agree on are: 1. When talking to FE, should literals be used at all, or should the string consist of just \x escapes. 2. When talking to user, should we use string literals, or just \x escapes. I hope you'll agree that using \x escapes when talking to user is not acceptable. 
And since gdb right now assumes ASCII charset for output, I don't think there will be any problems if ASCII characters are output as-is, without escaping. > > Second, frontend needs to display the data, however it will operate > > using its own data structures, and it does not matter if \x escapes > > were used or not. No frontend will ever work on a string containing > > embedded "\x" escapes. > > I was saying that the ASCII encoding suggested by Jim makes it harder > to convert the text into wide characters, that's all. I don't see why it's so, but nevermind. > > > That's extra job for > > > GDB. (Again, we were originally talking about wchar_t, not multibyte > > > strings.) > > > > I don't understand what's this extra job. This is as simple as: > > > > for c in wchar_t* literal: > > if c is representable in host encoding: > > output_literal > > else > > output_hex_escape > > That might sound simple for you, but it isn't, in general. The > ``representable in host encoding'' part is very non-trivial; for > example, how do you tell whether the Unicode codepoints 0x05C3 and > 0x05C4 can be represented in the Windows codepage 1255 (the former > can, the latter cannot)? This is generally impossible without using > very complicated algorithms and/or large data bases. > > The other complex part is ``output_literal'': again, there's no simple > algorithm to map Unicode's 0x05C3 into cp1255's 0xD3. You need tables > again, and you need separate tables for each possible encoding (Hebrew > has at least 3 widely used ones, Russian has at least 5, etc.). iconv has those tables. You see problems where there are none. > > > > Really, using strings with \x escapes differs from array > > > > printing in just one point: some characters are printed not as hex > > > > values, but as characters in local 8-bit encoding. Why do you think > > > > this is a problem? > > > > > > Because knowing what is the ``local 8-bit encoding'' is in itself a > > > huge problem. > > > > [...] 
> > I trust you on that, but nothing prevents user/frontend to explicitly > > specify the encoding. > > What makes you think the user and/or front end will know what to > specify? Experience shows they generally don't. First you say it's not possible to detect encoding from environment. Then you say you can't trust user/frontend. Together, that sounds like the problem of making gdb print char* literals reliably is impossible. Is that what you're trying to say? > > 1. Gbd should be modified to print wchar_t* literals. > > ``Print'' is ambiguous in this context. I believe you mean ``send to > the front end'', since this was your original problem. If the front > end is charged with displaying the wchar_t strings, GDB does not need > to print anything by itself. Am I right? > > > It should use the same > > logic as for char* to decide if value is representable in the host > > charset, > > I hope I explained above why this part is highly non-trivial. Using existing logic is in fact absolutely trivial -- that logic already *exists*, you don't need to do anything. > That is > why I think GDB should use hex notation for all characters, and leave > it for the FE to deal with their display. I disagree, for the simple reason that for char* values, existing logic did not cause any problems. Also, while I can take a stab at wchar_t* output, I would not be comfortable with special casing wchar_t* output to frontend. - Volodya ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-17 13:56 ` Vladimir Prus @ 2006-04-18 5:31 ` Eli Zaretskii 0 siblings, 0 replies; 52+ messages in thread From: Eli Zaretskii @ 2006-04-18 5:31 UTC (permalink / raw) To: Vladimir Prus; +Cc: jimb, gdb > From: Vladimir Prus <ghost@cs.msu.su> > Date: Mon, 17 Apr 2006 16:16:26 +0400 > Cc: jimb@red-bean.com, > gdb@sources.redhat.com > > Eli, I think we're running in circles. Fine, then I'll just stop responding. This is my last (and hopefully short) contribution to this thread. > 1. For any wchar_t* value, be it value of a variable, or function > parameter three levels up the stack, or member of structure, I want > gdb to print that value in specific format that's easy for frontend > to use. String with escapes is fine. A noble goal. If you (or someone else) submits patches, I'll be happy to review them. > 2. I want that formatting to take effect both for MI commands and for > 'print' command, since the user can issue 'print' command manually. I think CLI and MI are two different cases, and thus simple solutions that are appropriate for MI (because it doesn't display) will not be good enough for CLI. > 3. I don't mind having this behaviour only when --interpreter=mi is > specified. I don't think `print' should behave differently depending on the interpreter, but whatever. > First you say it's not possible to detect encoding from environment. Then you > say you can't trust user/frontend. Together, that sounds like the problem of > making gdb print char* literals reliably is impossible. Is that what you're > trying to say? I'm trying to say that it would be absurd to add all that complexity to GDB. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: printing wchar_t* 2006-04-14 18:03 ` Jim Blandy 2006-04-14 19:16 ` Eli Zaretskii @ 2006-04-14 19:53 ` Mark Kettenis 1 sibling, 0 replies; 52+ messages in thread From: Mark Kettenis @ 2006-04-14 19:53 UTC (permalink / raw) To: jimb; +Cc: eliz, ghost, gdb > Date: Fri, 14 Apr 2006 10:53:44 -0700 > From: "Jim Blandy" <jimb@red-bean.com> > > I think folks are seeing difficult problems where there aren't any. > Even if the host character set (that is, the character set GDB is > using to communicate with its user, or in its MI communications) is > plain, old ASCII, GDB can, without any loss of information, convey the > contents of a wide string using an arbitrary target character set via > MI to a GUI, using code the GUI must already have. > > Suppose we have a wide string where wchar_t values are Unicode code > points. Suppose our host character set is plain ASCII. Suppose the > user's program has a string containing the digits '123', followed by > some funky Tibetan characters U+0F04 U+0FCC, followed by the letters > 'xyz'. When asked to print that string, GDB should print the > following twenty-one ASCII characters: > > L"123\x0f04\x0fccxyz" > > Since this is a valid way to write that string in a source program, a > user at the GDB command line should understand it. Since consumers of > MI information must contain parsers for C values already, they can > reliably find the contents of the string. I think this makes an awful lot of sense. Mark ^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2006-04-17 13:56 UTC | newest] Thread overview: 52+ messages 2006-04-13 17:07 printing wchar_t* Vladimir Prus 2006-04-13 17:25 ` Eli Zaretskii 2006-04-14 7:29 ` Vladimir Prus 2006-04-14 8:47 ` Eli Zaretskii 2006-04-14 12:47 ` Vladimir Prus 2006-04-14 13:05 ` Eli Zaretskii 2006-04-14 13:06 ` Vladimir Prus 2006-04-14 13:15 ` Robert Dewar 2006-04-14 13:17 ` Daniel Jacobowitz 2006-04-14 13:59 ` Robert Dewar 2006-04-14 14:37 ` Eli Zaretskii 2006-04-14 14:08 ` Paul Koning 2006-04-14 14:47 ` Eli Zaretskii 2006-04-14 15:00 ` Vladimir Prus 2006-04-14 17:53 ` Eli Zaretskii 2006-04-17 7:05 ` Vladimir Prus 2006-04-17 8:35 ` Eli Zaretskii 2006-04-13 18:06 ` Jim Blandy 2006-04-13 21:18 ` Eli Zaretskii 2006-04-14 6:02 ` Jim Blandy 2006-04-14 8:43 ` Eli Zaretskii 2006-04-14 7:58 ` Vladimir Prus 2006-04-14 8:07 ` Jim Blandy 2006-04-14 8:30 ` Vladimir Prus 2006-04-14 8:57 ` Eli Zaretskii 2006-04-14 12:52 ` Vladimir Prus 2006-04-14 13:07 ` Daniel Jacobowitz 2006-04-14 14:23 ` Eli Zaretskii 2006-04-14 14:29 ` Daniel Jacobowitz 2006-04-14 14:53 ` Eli Zaretskii 2006-04-14 17:10 ` Daniel Jacobowitz 2006-04-14 17:55 ` Jim Blandy 2006-04-14 18:27 ` Eli Zaretskii 2006-04-14 18:30 ` Jim Blandy 2006-04-14 19:19 ` Eli Zaretskii 2006-04-14 14:16 ` Eli Zaretskii 2006-04-14 14:50 ` Vladimir Prus 2006-04-14 17:18 ` Eli Zaretskii 2006-04-14 18:03 ` Jim Blandy 2006-04-14 19:16 ` Eli Zaretskii 2006-04-14 19:22 ` Jim Blandy 2006-04-14 22:18 ` Daniel Jacobowitz 2006-04-16 11:39 ` Jim Blandy 2006-04-16 15:07 ` Eli Zaretskii 2006-04-15 7:14 ` Eli Zaretskii 2006-04-17 7:16 ` Vladimir Prus 2006-04-17 8:58 ` Eli Zaretskii 2006-04-17 10:35 ` Vladimir Prus 2006-04-17 12:26 ` Eli Zaretskii 2006-04-17 13:56 ` Vladimir Prus 2006-04-18 5:31 ` Eli Zaretskii 2006-04-14 19:53 ` Mark Kettenis