Hi,

This patch contains (at least the start of) support for printing wchar_t
strings from a debugged program within GDB. This is the subject of GDB
bugs 9103 (and its duplicates 9369, 9268) and maybe 7821.

Notes on the implementation:

1. I've added a new configuration variable, similar to "host-charset"
   and "target-charset". The latter can't be used for printing wide
   characters, because regular C strings and wide strings aren't
   necessarily (or in fact ever) encoded using the same encoding. The
   new variable is set like:

     (gdb) set target-wide-charset UTF-32

   I considered adding "set target-wide-charset auto" to auto-detect
   the charset used for wchar_t strings (i.e. probably 4 bytes ->
   UCS-4, 2 bytes -> UTF-16), but that's not done presently.

2. The host terminal may be able to print Unicode characters if it is
   fed UTF-8 encoded characters. There are some limitations: I don't
   think Unix terminals support combining character sequences -- I've
   ignored that for now.

   GDB currently defaults "host-charset" to ISO-8859-1, although a
   given terminal may not print top-bit-set characters correctly. I've
   added a new way of setting the host character set from the host
   terminal (using nl_langinfo (CODESET)), like so:

     (gdb) set host-charset auto

   If the terminal supports UTF-8 (e.g. LC_ALL is set to en_US.UTF-8),
   we will then see:

     (gdb) show host-charset
     The host character set is "UTF-8" (auto).

   If the terminal only supports ASCII (e.g. LC_ALL is set to C), we
   will instead see:

     (gdb) show host-charset
     The host character set is "ANSI_X3.4-1968" (auto).

3. Types which are literally called "wchar_t" are assumed to be wide
   characters.
   So we can do:

     wchar_t *msg = L"Hello world";

   and then:

     (gdb) p msg
     $1 = (wchar_t *) 0x85c4 "Hello world"

   If the message contains non-ASCII characters, and the user has typed
   "set host-charset auto" on a UTF-8 capable terminal, they will be
   printed nicely:

     (gdb) p msg
     $2 = (wchar_t *) 0x85c4 "Schöne Grüße"

   The caveat is that there's no way for GDB to know whether you have a
   font with the right glyphs in it. If not, you can fall back to ASCII:

     (gdb) set host-charset ASCII
     (gdb) p msg
     $3 = (wchar_t *) 0x85c4 "Sch\x00f6ne Gr\x00fc\x00dfe"

4. If you want to print an integer array type which isn't literally
   called "wchar_t" but nevertheless contains a wchar_t string, you can
   override using "/s", just like with regular strings, e.g.:

     (gdb) p/s intmsg
     $2 = (int *) 0x85c4 "Schöne Grüße"

5. The existing string-printing code is careful about not printing out
   lots of repeating characters. For wchar_t strings (taking into
   account the differences between what they represent on various
   platforms mentioned above), there is generally an X-to-Y
   correspondence between the number of input bytes and the number of
   output bytes for each character. To detect repeats, we convert an
   arbitrary number of X's to UCS-4, detect repeated UCS-4 values, then
   translate each to Y output characters.

Current shortcomings:

1. There's no support for non-C-like languages.

2. I've probably broken building with iconv disabled (actually I
   couldn't figure out how to build without iconv() support -- even
   for e.g. a mingw32 host which shouldn't support it).

3. Currently, wrong-endian wide characters from the target will confuse
   things (but you can explicitly set target-wide-charset to UCS-4LE or
   UCS-4BE, for example).

4. I've not written documentation or altered test cases yet
   (charset.exp shows some regressions).

Tom Tromey is working on a patch related to this. Some of his comments
are incorporated in this patch relative to an earlier version sent to
him privately (thanks!).
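
To make note 5 above a bit more concrete, here is a rough, standalone
sketch of the "convert to UCS-4, then detect repeats" idea. It is not
the code in the patch: the names (ucs4_to_utf8, append,
format_ucs4_string, REPEAT_THRESHOLD) are made up for illustration, the
input is assumed to have been converted to UCS-4 already (the patch uses
iconv for that step), and the host charset is assumed to be UTF-8. The
output format only loosely imitates what GDB prints for repeated
elements in narrow strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical threshold: runs longer than this are collapsed.  */
#define REPEAT_THRESHOLD 3

/* Encode code point C as UTF-8 into BUF (room for 4 bytes plus NUL).
   Returns the byte count.  No validation of surrogates or range.  */
static size_t
ucs4_to_utf8 (uint32_t c, char *buf)
{
  unsigned char *p = (unsigned char *) buf;
  size_t n;

  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      n = 1;
    }
  else if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      n = 2;
    }
  else if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      n = 3;
    }
  else
    {
      p[0] = (unsigned char) (0xF0 | (c >> 18));
      p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[3] = (unsigned char) (0x80 | (c & 0x3F));
      n = 4;
    }
  p[n] = '\0';
  return n;
}

/* Append S to OUT (capacity SIZE) if it fits; otherwise drop it.  */
static void
append (char *out, size_t size, const char *s)
{
  size_t used = strlen (out);
  size_t n = strlen (s);

  if (used + n < size)
    memcpy (out + used, s, n + 1);
}

/* Format LEN UCS-4 values from CHARS into OUT, collapsing any run of
   identical values longer than REPEAT_THRESHOLD.  */
static void
format_ucs4_string (const uint32_t *chars, size_t len, char *out, size_t size)
{
  size_t i = 0;
  int in_quote = 0;

  assert (out != NULL && size > 0);
  out[0] = '\0';

  while (i < len)
    {
      char utf8[5], tmp[48];
      size_t run = 1, k;

      /* Repeats are detected on UCS-4 values, not on raw bytes.  */
      while (i + run < len && chars[i + run] == chars[i])
        run++;
      ucs4_to_utf8 (chars[i], utf8);

      if (run > REPEAT_THRESHOLD)
        {
          if (in_quote)
            append (out, size, "\"");
          in_quote = 0;
          if (i > 0)
            append (out, size, ", ");
          snprintf (tmp, sizeof tmp, "'%s' <repeats %zu times>", utf8, run);
          append (out, size, tmp);
        }
      else
        {
          if (!in_quote)
            append (out, size, i > 0 ? ", \"" : "\"");
          in_quote = 1;
          for (k = 0; k < run; k++)
            append (out, size, utf8);
        }
      i += run;
    }
  if (in_quote)
    append (out, size, "\"");
}
```

Comparing UCS-4 values rather than target bytes means a run is
recognised the same way whether the target's wchar_t is 2 or 4 bytes
wide; only the final emit step depends on the host charset.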

Regression tested on x86-64 Linux, and spot-checked with an ARM Linux
cross debugger (from x86 build/host). As mentioned above, there are some
regressions so far.

OK to apply, or any comments?

Cheers,

Julian

ChangeLog

    gdb/
    * c-valprint.c (textual_element_type): Alter TYPE to be the type of
    the element before looking through typedefs, and update comment.
    Add wide-character support.
    (c_val_print): Pass type before typedef resolution to
    textual_element_type calls.
    * charset.c (langinfo.h): Include, if HAVE_LANGINFO_CODESET.
    (GDB_DEFAULT_TARGET_WIDE_CHARSET, GDB_INTERNAL_CODESET): New macros.
    (host_charset_auto): New.
    (show_host_charset_name): Indicate automatically-selected charset.
    (target_wide_charset_name, show_target_wide_charset_name): New.
    (host_charset_enum): Add "auto".
    (target_wide_charset_enum): New. Support a limited number of
    wchar_t character sets.
    (iconv_char_print_literally): New.
    (iconv_to_control): New.
    (lookup_and_register_iconv_charset): New.
    (default_c_internal_char_has_backslash_escape): New.
    (current_target_wide_charset, internal_charset): New.
    (set_host_charset): Add support for "auto" host charset.
    (show_charset): Show target wide charset.
    (set_target_wide_charset, set_target_wide_charset_sfunc)
    (target_wide_charset, cached_iconv_target_to_internal)
    (cached_iconv_internal_to_host, target_to_internal_iconv_t)
    (internal_to_host_iconv_t, reset_host_char_state)
    (target_char_to_internal, internal_char_host_emit): New.
    (_initialize_charset): Add wide-character support.
    * charset.h (target_wide_charset, reset_host_char_state)
    (target_char_to_internal)
    (internal_char_host_emit): Add prototypes.
    * c-lang.c (c_internal_char_host_emit, c_printwidestr): New.
    (c_printstr): Call c_printwidestr when appropriate.
    * printcmd.c (print_formatted): Add wide-character support.
    * configure.ac (AM_LANGINFO_CODESET): Add.
    * acinclude.m4 (../config/codeset.m4): Include.
    * config.in: Regenerate.
    * configure: Regenerate.