From: 張俊芝 <zjz@zjz.name>
To: gdb-patches@sourceware.org
Subject: support C/C++ identifiers named with non-ASCII characters
Date: Mon, 21 May 2018 09:54:00 -0000
Message-ID: <9418d4f0-f22a-c587-cc34-2fa67afbd028@zjz.name>
[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]
Hello, team.
This patch fixes the bug at
https://sourceware.org/bugzilla/show_bug.cgi?id=22973 .
Here is how to test the patch:
Step 1. If you are using Clang or another C compiler that supports
Unicode identifiers, create a C file with the following content:
int main(int 參量, char* 參[])
{
  struct 集
  {
    int 數[3];
  } 集 = {100, 200, 300};
  int 序 = 2;
  return 0;
}
Or, if you are using GCC, create a C file with the following content as a
workaround (as of 2018, GCC still doesn't actually support Unicode
identifiers written directly in the source, which is a pity):
int main(int \u53C3\u91CF, char* \u53C3[])
{
  struct \u96C6
  {
    int \u6578[3];
  } \u96C6 = {100, 200, 300};
  int \u5E8F = 2;
  return 0;
}
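For reference, the two programs above declare the same names: \u53C3\u91CF
is 參量, \u96C6 is 集, \u6578 is 數 and \u5E8F is 序. What gets typed at the
GDB prompt is the UTF-8 form of those code points, assuming a UTF-8
terminal and a compiler that writes the names as UTF-8 into the debug
info. The sketch below is not part of the patch; the function name
utf8_encode_bmp_sketch is hypothetical. It shows how a \uXXXX code point
maps to the bytes the lexer will see:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: write the UTF-8 encoding
   of code point CP to OUT and return the number of bytes written.
   Only Basic Multilingual Plane scalar values are handled (at most
   0xFFFF, surrogates excluded), which covers every identifier used in
   the test program.  OUT must have room for 3 bytes.  */
static size_t
utf8_encode_bmp_sketch (unsigned int cp, unsigned char *out)
{
  assert (out != NULL);
  assert (cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

  if (cp < 0x80)
    {
      /* ASCII: one byte, unchanged.  */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

For example, 0x53C3 encodes to the bytes E5 8F 83. Every byte of a
multi-byte sequence has its high bit set, so none of them can collide
with an ASCII operator or punctuation character.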
Step 2. Compile the C file with debug information (for example, with -g).
Step 3. Run GDB on the compiled executable and set a breakpoint at "return 0".
Step 4. Run until the breakpoint.
Step 5. Test the following commands to see if they work:
p 參量
p 參
p 集
p 集.數
p 集.數[序]
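To illustrate why these expressions parse with the patch applied, here is
a minimal standalone sketch of the name scan. It is not the GDB code:
is_separator_sketch and name_length_sketch are hypothetical names, the
separator set is copied from the patch's is_identifier_separator, and the
real lex_one_token additionally handles numbers, operators, quoting and
template '<' before and during this step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the patch's is_identifier_separator: true
   for NUL and for the ASCII characters the patch lists as ending a
   name.  '<' is deliberately absent, as in the patch.  Bytes of a
   UTF-8 multi-byte sequence never match, so they stay in the name.  */
static bool
is_separator_sketch (char c)
{
  return c == '\0'
	 || strchr (" \t\n'\"\\(),.+-*/|&^~!@[]>?:={};", c) != NULL;
}

/* Return the length in bytes of the name that starts at EXPR, i.e.
   the number of bytes before the first separator.  */
static size_t
name_length_sketch (const char *expr)
{
  assert (expr != NULL);

  size_t len = 0;
  while (!is_separator_sketch (expr[len]))
    len++;
  return len;
}
```

Given the UTF-8 bytes of "集.數[序]", name_length_sketch stops at the '.'
after the three bytes of 集; called again on the text after the '.', it
stops at '[' after the three bytes of 數. The old check accepted only
ASCII letters, digits, '_' and '$', so it rejected the first byte of 集.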
Thanks for your review.
[-- Attachment #2: ChangeLog --]
[-- Type: text/plain, Size: 230 bytes --]
2018-05-20 張俊芝 <zjz@zjz.name>
* gdb/c-exp.y (is_identifier_separator): New function.
(lex_one_token): Recognize C and C++ Unicode identifiers by using
is_identifier_separator to determine the boundary of a token.
[-- Attachment #3: diff --]
[-- Type: text/plain, Size: 2948 bytes --]
diff --git a/gdb/c-exp.y b/gdb/c-exp.y
index 5e10d2a3b4..b0dd6c7caf 100644
--- a/gdb/c-exp.y
+++ b/gdb/c-exp.y
@@ -73,6 +73,8 @@ void yyerror (const char *);
static int type_aggregate_p (struct type *);
+static bool is_identifier_separator (char);
+
%}
/* Although the yacc "value" of an expression is not used,
@@ -1718,6 +1720,53 @@ type_aggregate_p (struct type *type)
&& TYPE_DECLARED_CLASS (type)));
}
+/* While iterating all the characters in an identifier, an identifier separator
+ is a boundary where we know the iteration is done. */
+
+static bool
+is_identifier_separator (char c)
+{
+ switch (c)
+ {
+ case ' ':
+ case '\t':
+ case '\n':
+ case '\0':
+ case '\'':
+ case '"':
+ case '\\':
+ case '(':
+ case ')':
+ case ',':
+ case '.':
+ case '+':
+ case '-':
+ case '*':
+ case '/':
+ case '|':
+ case '&':
+ case '^':
+ case '~':
+ case '!':
+ case '@':
+ case '[':
+ case ']':
+ /* '<' should not be a token separator, because it can be an open angle
+ bracket followed by a nested template identifier in C++. */
+ case '>':
+ case '?':
+ case ':':
+ case '=':
+ case '{':
+ case '}':
+ case ';':
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Validate a parameter typelist. */
static void
@@ -1920,7 +1969,7 @@ parse_number (struct parser_state *par_state,
FIXME: This check is wrong; for example it doesn't find overflow
on 0x123456789 when LONGEST is 32 bits. */
if (c != 'l' && c != 'u' && n != 0)
- {
+ {
if ((unsigned_p && (ULONGEST) prevn >= (ULONGEST) n))
error (_("Numeric constant too large."));
}
@@ -2741,16 +2790,13 @@ lex_one_token (struct parser_state *par_state, bool *is_quoted_name)
}
}
- if (!(c == '_' || c == '$'
- || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
+ if (is_identifier_separator(c))
/* We must have come across a bad character (e.g. ';'). */
error (_("Invalid character '%c' in expression."), c);
/* It's a name. See how long it is. */
namelen = 0;
- for (c = tokstart[namelen];
- (c == '_' || c == '$' || (c >= '0' && c <= '9')
- || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '<');)
+ for (c = tokstart[namelen]; !is_identifier_separator(c);)
{
/* Template parameter lists are part of the name.
FIXME: This mishandles `print $a<4&&$a>3'. */
@@ -2932,7 +2978,7 @@ classify_name (struct parser_state *par_state, const struct block *block,
filename. However, if the name was quoted, then it is better
to check for a filename or a block, since this is the only
way the user has of requiring the extension to be used. */
- if ((is_a_field_of_this.type == NULL && !is_after_structop)
+ if ((is_a_field_of_this.type == NULL && !is_after_structop)
|| is_quoted_name)
{
/* See if it's a file name. */