
llvm-clang: function/variable names containing Unicode characters

Tags:

unicode

clang

I'm interested in using Unicode characters (like \alpha) in function/variable names in my C++ program, which I will compile with clang++ on Linux. Does anyone know of a good guide or list of rules to follow so that everything compiles fine and I avoid linking errors?

Thanks!

anon asked Apr 03 '10 22:04



1 Answer

Clang 3.0 does not support Unicode characters in identifiers. The latest trunk has partial support for this, and I believe someone is currently working to implement this fully.

As for when Clang does support them, take a look at C++11 (n3242) 2.11 [lex.name].

All characters in an identifier must match [a-zA-Z_0-9] or the set of characters in E.1:

00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD,
  60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD,
  B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD

The first character must match [a-zA-Z_] or be in E.1 but not in E.2 (the combining characters):

0300-036F, 1DC0-1DFF, 20D0-20FF, FE20-FE2F
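To make the two rules concrete, here is a minimal sketch that checks a sequence of code points against the E.1 and E.2 ranges listed above. The names (Range, kAllowed, kCombining, is_identifier_char, is_identifier_start, is_valid_identifier, self_check) are all made up for this illustration, and it does not reject keywords or do anything else a real lexer would do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Inclusive code point range (hypothetical helper type).
struct Range { char32_t lo, hi; };

// E.1 ranges from the list above, BMP part only.
static const Range kAllowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

// E.2 ranges: allowed in an identifier, but not as its first character.
static const Range kCombining[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

template <std::size_t N>
bool in_ranges(char32_t c, const Range (&ranges)[N]) {
    for (const Range& r : ranges)
        if (c >= r.lo && c <= r.hi) return true;
    return false;
}

bool is_basic_nondigit(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_';
}

bool is_basic_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

// Rule 1: [a-zA-Z_0-9] or any character in E.1.
bool is_identifier_char(char32_t c) {
    if (is_basic_nondigit(c) || is_basic_digit(c)) return true;
    if (in_ranges(c, kAllowed)) return true;
    // Planes 1 through 14: 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD.
    return c >= 0x10000 && c <= 0xEFFFD && (c & 0xFFFF) <= 0xFFFD;
}

// Rule 2: [a-zA-Z_] or a character in E.1 that is not in E.2.
bool is_identifier_start(char32_t c) {
    if (is_basic_nondigit(c)) return true;
    if (is_basic_digit(c)) return false;
    return is_identifier_char(c) && !in_ranges(c, kCombining);
}

bool is_valid_identifier(const std::u32string& s) {
    if (s.empty() || !is_identifier_start(s[0])) return false;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (!is_identifier_char(s[i])) return false;
    return true;
}

// A few sanity checks; call this from your own main() if you like.
void self_check() {
    assert(is_valid_identifier(U"\u03B1_1"));   // Greek alpha, then "_1"
    assert(is_valid_identifier(U"a\u0301"));    // combining accent after 'a'
    assert(!is_valid_identifier(U"\u0301a"));   // combining accent first (E.2)
    assert(!is_valid_identifier(U"1abc"));      // digit first
    assert(!is_valid_identifier(U"a\u00D7b"));  // U+00D7 is not in E.1
}
```

The ranges are kept as a flat table so they can be compared line by line with the list above; the supplementary planes are handled arithmetically because they all follow the same xx0000-xxFFFD pattern.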

As for linking, we need to take a look at the C++ ABI you are using. In this case (Clang and Linux) it would be the Itanium C++ ABI.

And... after searching around forever, the only things I could find were on JNI and gcc internals. When Clang does implement this, it will use the same mangling as gcc. Either way, as long as all the code that uses Unicode identifiers is compiled with the same compiler, it will link correctly.
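If you just want to find out whether a particular compiler build accepts such names before worrying about linking, a tiny probe like the one below may help. It spells the identifier with a universal-character-name (\u03B1, Greek small alpha, which falls in the 0100-167F range above) so the source file itself stays plain ASCII. The names times_\u03B1 and probe are invented for this sketch, and it assumes a compiler version that already supports Unicode identifiers, which per the above Clang 3.0 does not.

```cpp
#include <cassert>

// \u03B1 is GREEK SMALL LETTER ALPHA, written as a universal-character-name.
static const int \u03B1 = 3;

// An identifier may mix basic and extended characters.
int times_\u03B1(int x) { return x * \u03B1; }

// Hypothetical helper: if this translation unit compiles at all,
// the compiler accepted the identifiers above.
void probe() {
    assert(\u03B1 == 3);
    assert(times_\u03B1(2) == 6);
}
```

If this fails to compile, the error should point straight at the \u03B1 token, which tells you the limitation is in the front end rather than in the linker.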

Michael Spencer answered Nov 17 '22 20:11
