😃 (and other Unicode characters) in identifiers not allowed by g++

Tags:

c++

gcc

c++11

unicode

g++

I am 😞 to find that I cannot use 😃 as a valid identifier with g++ 4.7, even with the -fextended-identifiers option enabled:

int main(int argc, const char* argv[]) {
  const char* 😃 = "I'm very happy";
  return 0;
}

main.cpp:3:3: error: stray ‘\360’ in program
main.cpp:3:3: error: stray ‘\237’ in program
main.cpp:3:3: error: stray ‘\230’ in program
main.cpp:3:3: error: stray ‘\203’ in program
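The four "stray" bytes in that error are the UTF-8 encoding of 😃 (U+1F603), printed in octal: g++ 4.7 saw the raw bytes but did not accept them as part of an identifier. The C++ sketch below encodes a code point by hand and prints the bytes in the same octal style. The names `Utf8Bytes`, `encode_utf8` and `print_stray_bytes` are made up for illustration, and the encoder does no validation of surrogates or out-of-range values.

```cpp
#include <cstdio>

// Result of encoding one code point: up to four UTF-8 bytes.
struct Utf8Bytes {
    unsigned char bytes[4];
    int size;
};

// Plain UTF-8 encoding of a single code point (illustrative only:
// surrogates and values above U+10FFFF are not rejected).
constexpr Utf8Bytes encode_utf8(char32_t cp) {
    Utf8Bytes out{};
    if (cp < 0x80) {
        out.bytes[0] = static_cast<unsigned char>(cp);
        out.size = 1;
    } else if (cp < 0x800) {
        out.bytes[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 2;
    } else if (cp < 0x10000) {
        out.bytes[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 3;
    } else {
        out.bytes[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 4;
    }
    return out;
}

// Print each encoded byte in octal, mimicking GCC's "stray '\NNN'" messages.
void print_stray_bytes(char32_t cp) {
    const Utf8Bytes enc = encode_utf8(cp);
    for (int i = 0; i < enc.size; ++i) {
        std::printf("stray '\\%o'\n", static_cast<unsigned>(enc.bytes[i]));
    }
}
```

Calling `print_stray_bytes(0x1F603)` prints `\360`, `\237`, `\230` and `\203`, the same four bytes reported by g++.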

After some googling, I discovered that UTF-8 characters are not yet supported in identifiers, but a universal-character-name should work. So I converted my source to:

int main(int argc, const char* argv[]) {
  const char* \U0001F603 = "I'm very happy";
  return 0;
}

main.cpp:3:15: error: universal character \U0001F603 is not valid in an identifier

So apparently 😃 isn't a valid identifier character. However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and doesn't disallow them as initial characters in E.2.
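To make the Annex E argument concrete, here is a hedged C++ sketch of the check being described. It is deliberately partial: it encodes only the E.1 range quoted above (10000-1FFFD), whereas the real E.1 table lists many more ranges, plus the four E.2 ranges that may not begin an identifier. The function names are hypothetical.

```cpp
// Only the single Annex E.1 range quoted in the question; the full E.1
// table has many more ranges, which this sketch deliberately omits.
constexpr bool in_quoted_e1_range(char32_t cp) {
    return cp >= 0x10000 && cp <= 0x1FFFD;
}

// Annex E.2: characters allowed in an identifier, but not as its first character.
constexpr bool in_annex_e2(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F) ||
           (cp >= 0x1DC0 && cp <= 0x1DFF) ||
           (cp >= 0x20D0 && cp <= 0x20FF) ||
           (cp >= 0xFE20 && cp <= 0xFE2F);
}

// The question's reasoning: permitted by E.1 and not excluded as an initial by E.2.
constexpr bool may_start_identifier_per_question(char32_t cp) {
    return in_quoted_e1_range(cp) && !in_annex_e2(cp);
}
```

Under this reading both 😃 (U+1F603) and 💩 (U+1F4A9) may start an identifier, while a combining mark such as U+0301 may not, so the g++ 4.7 rejection looks like a compiler limitation rather than a language rule.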

My next step was to check whether any other allowed Unicode characters worked, but none that I tried did. Not even the ever important PILE OF POO (💩) character.

So, for the sake of meaningful and descriptive variable names, what gives? Does -fextended-identifiers do as it advertises or not? Is it only supported in the very latest build? And what kind of support do other compilers have?


asked Oct 02 '12 14:10

Joseph Mansfield

2 Answers

As of 4.8, gcc does not support characters outside the BMP (Basic Multilingual Plane) in identifiers, which seems to be an unnecessary restriction. Also, gcc only supports a very restricted set of characters, listed in ucnid.tab and based on C99 and C++98 (it does not seem to have been updated for C11 and C++11 yet).

As described in the manual, -fextended-identifiers is experimental, so it is more likely not to work as expected.
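The BMP is the range U+0000 to U+FFFF, and both characters named in the question lie above it (😃 is U+1F603, 💩 is U+1F4A9). That is also why the question needs the eight-digit `\U` form of a universal-character-name rather than the four-digit `\u` form. A small sketch with hypothetical helper names:

```cpp
#include <cstdio>
#include <string>

// The Basic Multilingual Plane covers U+0000 through U+FFFF.
constexpr bool outside_bmp(char32_t cp) { return cp > 0xFFFF; }

// Shortest UCN form: \uNNNN (4 hex digits) is enough inside the BMP,
// anything above U+FFFF needs \UNNNNNNNN (8 hex digits).
constexpr int ucn_hex_digits(char32_t cp) { return outside_bmp(cp) ? 8 : 4; }

// Build the shortest UCN spelling of a code point, e.g. "\U0001F603".
std::string ucn_spelling(char32_t cp) {
    char buf[11];  // backslash + 'U' + 8 hex digits + terminating NUL
    if (ucn_hex_digits(cp) == 8) {
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
    } else {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
    }
    return std::string(buf);
}
```

With these helpers, `ucn_spelling(0x1F603)` returns `\U0001F603`, the spelling used in the question, while `ucn_spelling(0x03C0)` returns `\u03C0`.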

Edit:

GCC supports the C11 character set starting with 4.9.0 (svn r204886, to be precise), so the OP's second piece of code using \U0001F603 does work. I still can't get the original code using 😃 to work, even with -finput-charset=UTF-8, with GCC 8.2 on https://gcc.godbolt.org though (you may want to follow this bug report, provided by @DanielWolf).

Meanwhile both pieces of code work on clang 3.3 without any options other than -std=c++11.
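A minimal sketch of an identifier spelled with a universal-character-name, assuming a compiler that accepts UCNs in identifiers (GCC 4.9 or later, or clang 3.3 or later, as described above). It uses π (`\u03C0`, a Greek letter) instead of `\U0001F603` so that it does not depend on whether a given compiler accepts characters outside the BMP; the names are illustrative only.

```cpp
// Hypothetical example: the variable name is written as a UCN.
// \u03C0 is GREEK SMALL LETTER PI.
constexpr int \u03C0 = 3;

// Refer to the same variable through the same UCN spelling.
constexpr int get_pi() { return \u03C0; }
```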


answered Nov 12 '22 02:11

kennytm

This was a known bug in GCC 9 and earlier, and it has been fixed in GCC 10.

The official changelog for GCC 10 contains this section:

Extended characters in identifiers may now be specified directly in the input encoding (UTF-8, by default), in addition to the UCN syntax (\uNNNN or \UNNNNNNNN) that is already supported:

static const int π = 3;
int get_naïve_pi() {
  return π;
}
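A slightly extended version of that changelog snippet, assuming GCC 10 or later (or another compiler that accepts UTF-8 identifiers). Because the UTF-8 and UCN spellings are two ways of writing the same character, `π` and `\u03C0` should name the same variable. `get_pi_via_ucn` is a hypothetical addition, not part of the changelog.

```cpp
// From the GCC 10 changelog: extended characters written directly in UTF-8.
static const int π = 3;

int get_naïve_pi() {
    return π;
}

// Hypothetical addition: the UCN spelling refers to the π declared above.
constexpr int get_pi_via_ucn() { return \u03C0; }
```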

answered Nov 12 '22 03:11

Daniel Wolf
