I'm interested in using Unicode characters (like α, i.e. \alpha) in function/variable names in my C++ program, which I will compile with clang++ on Linux. Does anyone know of a good guide or list of rules to follow to make sure that everything compiles fine and that I avoid linking errors?
Thanks!
Clang 3.0 does not support Unicode characters in identifiers. The latest trunk has partial support for this, and I believe someone is currently working to implement this fully.
As for when Clang does support them, take a look at C++11 (n3242) 2.11 [lex.name].
All characters in an identifier must match [a-zA-Z_0-9]
or the set of characters in E.1:
00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD,
60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD,
B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
The first character must match [a-zA-Z_]
or E.1 excluding E.2:
0300-036F, 1DC0-1DFF, 20D0-20FF, FE20-FE2F
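To make the two rules above concrete, here is a minimal sketch of a checker for a single code point. The names `Range`, `in_ranges`, `is_identifier_char` and `is_identifier_start` are made up for this illustration and are not part of Clang or any library; the tables are simply the E.1 and E.2 ranges listed above.

```cpp
#include <cstddef>

// Hypothetical helper type: an inclusive range of code points.
struct Range { char32_t lo, hi; };

// Annex E.1 ranges up to U+FFFD, copied from the list above.
constexpr Range kAllowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

// Annex E.2 ranges: allowed in an identifier, but not as its first character.
constexpr Range kNotInitial[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

template <std::size_t N>
bool in_ranges(const Range (&ranges)[N], char32_t cp) {
    for (const Range& r : ranges) {
        if (cp >= r.lo && cp <= r.hi) return true;
    }
    return false;
}

// True if cp is in the E.1 set.
bool in_e1(char32_t cp) {
    // Planes 1 through E each contribute xxxx0000-xxxxFFFD.
    if (cp >= 0x10000u && cp <= 0xEFFFDu) return (cp & 0xFFFFu) <= 0xFFFDu;
    return in_ranges(kAllowed, cp);
}

// True if cp may appear anywhere in an identifier.
bool is_identifier_char(char32_t cp) {
    bool basic = (cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z') ||
                 (cp >= U'0' && cp <= U'9') || cp == U'_';
    return basic || in_e1(cp);
}

// True if cp may be the first character of an identifier.
bool is_identifier_start(char32_t cp) {
    bool basic = (cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z') ||
                 cp == U'_';
    return basic || (in_e1(cp) && !in_ranges(kNotInitial, cp));
}
```

For example, α (U+03B1) falls in 0100-167F and is not in E.2, so it can start an identifier. A combining acute accent (U+0301) is also in 0100-167F, but it is in E.2 as well, so it can only appear after the first character.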
As for linking, we need to take a look at the C++ ABI you are using. In this case (Clang and Linux) it would be the Itanium C++ ABI.
And... after searching around forever, the only things I could find were on JNI and GCC internals. When Clang does implement this, it will use the same mangling as GCC. Either way, as long as all the code that uses Unicode identifiers is compiled with the same compiler, it will link correctly.
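As a small illustration, here is a sketch of a function with external linkage whose name and parameter contain α, written with the universal-character-name spelling `\u03B1` so the source file itself stays plain ASCII. This assumes a compiler that accepts universal character names in identifiers (Clang 3.0 does not, as noted above); the function name `scale_by_\u03B1` is made up for the example.

```cpp
// \u03B1 is U+03B1 (Greek small letter alpha), inside the 0100-167F range,
// so it is allowed both at the start of and inside an identifier.

// Declaration, as it might appear in a header shared between source files.
double scale_by_\u03B1(double x, double \u03B1);

// Definition. The declaration and the definition are mangled by the same
// compiler, so they resolve to the same symbol at link time.
double scale_by_\u03B1(double x, double \u03B1) {
    return x * \u03B1;
}
```

A caller would write `scale_by_\u03B1(2.0, 3.0)`. On a compiler that also accepts UTF-8 source directly, spelling the name with a literal α should refer to the same identifier.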