 

Unicode/special characters in variable names in clang not allowed?


Clang (version 3.3 and later) supports Unicode characters in variable names: http://llvm.org/releases/3.3/tools/clang/docs/ReleaseNotes.html#major-new-features.

However, some special characters are still forbidden.

int main(){
    double α = 2.; // alpha, ok!
    double ∞ = 99999.; // infinity, error
}

giving:

error: non-ASCII characters are not allowed outside of literals and identifiers
        double ∞ = 99999.;

What is the fundamental difference between α (alpha) and ∞ (infinity) for clang? Is it that the former is Unicode while the latter is not Unicode but is not ASCII either?

Is there a workaround or an option to allow this set of characters in clang (or BTW in gcc)?

Notes:
1) ∞ is just an example; there are many characters that are potentially useful but also forbidden, like ∫ or ∂.
2) I am not asking whether it is a good idea; please take it as a technical question.
3) I am interested in the C++ compiler of clang 3.4 on Linux (gcc 4.8.3 doesn't support this). I am saving the source files with gedit using UTF-8 encoding and Unix/Linux line endings.
4) Prefixing the name with an ordinary first character doesn't help either: _∞


The answers point to a definite NO. Some ranges are indeed not allowed, nor will they be any time soon. To go one step further toward total craziness, the best alternative I found was to use characters that look effectively the same. (Now, this I might admit is not a good idea.) Such look-alikes can be found at http://shapecatcher.com/. The result (sorry if it hurts your eyes):

//  double ∞ = 99999.; // still error
//  double ⧞ = 99999.; // infinity negated, still error
    double ꝏ = 99999.; // letter oo
    double Ꝏ = 99999.; // letter OO
//  double ⧜ = 99999.; // incomplete infinity, still error

Other "alternative" dead ringers, for characters mentioned in the question, that are in the allowed range: ʃ, 𝜕𝝏𝞉𝟃.

asked Oct 30 '14 at 18:10 by alfC



1 Answer

The clang release notes say (emphasis mine):

This feature allows identifiers to contain certain Unicode characters, *as specified by the active language standard*;

This is covered in Annex E of the draft C++ standard; the allowed characters are as follows:

E.1 Ranges of characters allowed [charname.allowed]

00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF

0100-167F, 1681-180D, 180F-1FFF

200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F

2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF

3004-3007, 3021-302F, 3031-303F

3040-D7FF

F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD

10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD

The code point for infinity, U+221E, is not included in the list: it falls in the gap between 2070-218F and 2460-24FF.
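To make the table concrete, here is a rough sketch that transcribes the Annex E.1 ranges quoted above into a lookup and checks a UTF-8 identifier code point by code point. The names `Range`, `kAllowed`, `is_allowed`, `decode_utf8` and `chars_allowed_in_identifier` are hypothetical (not part of clang or any library), the decoder assumes well-formed UTF-8, and the check deliberately ignores the extra rules about which characters may *start* an identifier. It only mirrors the table; it is not how clang implements the check.

```cpp
#include <cstddef>
#include <string>

// Code-point ranges from C++11 Annex E.1 [charname.allowed], as quoted above.
// Single code points are written as {x, x}.
struct Range { char32_t lo, hi; };

constexpr Range kAllowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
    {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
    {0x40000, 0x4FFFD}, {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD},
    {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD}, {0x90000, 0x9FFFD},
    {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
    {0xD0000, 0xDFFFD}, {0xE0000, 0xEFFFD},
};

// True if a non-ASCII code point lies in one of the Annex E.1 ranges.
bool is_allowed(char32_t cp) {
    for (const Range& r : kAllowed)
        if (cp >= r.lo && cp <= r.hi) return true;
    return false;
}

// Decodes a UTF-8 string into code points (input assumed well-formed).
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length and payload bits carried by the lead byte.
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)      { len = 1; cp = lead; }
        else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }
        else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }
        else                  { len = 4; cp = lead & 0x07; }
        if (i + len > s.size()) break;  // truncated sequence: stop decoding
        for (std::size_t k = 1; k < len; ++k)  // append 6 bits per continuation byte
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// True if every character is an ASCII letter/digit/underscore or an
// Annex E.1 code point. Simplification: first-character rules are ignored.
bool chars_allowed_in_identifier(const std::string& utf8) {
    for (char32_t cp : decode_utf8(utf8)) {
        const bool ascii_ok = (cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z') ||
                              (cp >= U'0' && cp <= U'9') || cp == U'_';
        if (cp < 0x80 ? !ascii_ok : !is_allowed(cp)) return false;
    }
    return true;
}
```

With this, `is_allowed(0x03B1)` (α) is true and `is_allowed(0x221E)` (∞) is false. `"_∞"` is still rejected because every code point is checked on its own, which matches note 4 of the question, while the look-alike ꝏ (U+A74F) passes because it sits in 3040-D7FF.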

For reference, these are the code points above converted to Unicode characters (some of them may not display correctly in all browsers or fonts).

¨, ª, ­,

¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ

Ā-ᙿ, ᚁ-᠍, ᠏-῿ ​-‍, ‪-‮, ‿-⁀, ⁔,

⁠- ⁰-↏, ①-⓿, ❶-➓, Ⰰ-ⷿ, ⺀-⿿

〄-〇, 〡-〯, 〱-〿

぀-퟿ 豈-ﴽ, ﵀-﷏,

ﷰ-﹄, ﹇-�

𐀀-🿽, 𠀀-𯿽, 𰀀-𿿽, 񀀀-񏿽, 񐀀-񟿽, 񠀀-񯿽, 񰀀-񿿽, 򀀀-򏿽, 򐀀-򟿽, 򠀀-򯿽, 򰀀-򿿽, 󀀀-󏿽, 󐀀-󟿽, 󠀀-󯿽
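Turning the code points into the characters listed above amounts to encoding each one, for example as UTF-8 (the encoding the question's source files use). A minimal sketch, with hypothetical helper names `encode_utf8` and `render_range`, assuming the input is a valid Unicode scalar value (at most 0x10FFFF, not a surrogate):

```cpp
#include <string>

// Encodes one code point as UTF-8: 1 byte below 0x80, 2 below 0x800,
// 3 below 0x10000, otherwise 4. Input assumed to be a valid scalar value.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Renders a range such as 00C0-00D6 as "À-Ö" (a single code point as itself).
std::string render_range(char32_t lo, char32_t hi) {
    return lo == hi ? encode_utf8(lo) : encode_utf8(lo) + "-" + encode_utf8(hi);
}
```

For instance, `encode_utf8(0x221E)` yields the bytes E2 88 9E, which is exactly how ∞ is stored in a UTF-8 source file. Code points such as 00AD, 200B or 2060 are invisible or formatting characters, which is why some entries in the list appear empty.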

I could not find a comprehensive document covering the rationale for the chosen ranges, although N3146: Recommendations for extended identifier characters for C and C++ does provide some details on the influences.

answered Nov 02 '22 at 22:11 by Shafik Yaghmour
