This question has unicode text that may not display correctly in all browsers.
clang now (as of 3.3) supports Unicode characters in variable names: http://llvm.org/releases/3.3/tools/clang/docs/ReleaseNotes.html#major-new-features.
However, some special characters are still forbidden.
int main(){
    double α = 2.; // alpha, ok!
    double ∞ = 99999.; // infinity, error
}
giving:
error: non-ASCII characters are not allowed outside of literals and identifiers
double ∞ = 99999.;
What is the fundamental difference between α (alpha) and ∞ (infinity) for clang? Is it that the former is Unicode and the latter is not Unicode but at the same time is not ASCII?
Is there a workaround or an option to allow this set of characters in clang (or, BTW, in gcc)?
Notes:
1) ∞ is just an example; there are a lot of characters that are potentially useful but also forbidden, like ∫ or ∂.
2) I am not asking whether it is a good idea; please take it as a technical question.
3) I am interested in the C++ compiler of clang 3.4 on Linux (gcc 4.8.3 doesn't support this). I am saving the source files with gedit using UTF-8 encoding and Unix/Linux line endings.
4) Adding other normal first characters doesn't help: _∞
The answers point to a definite NO. Some ranges are indeed not allowed nor will they be soon. To move one step further to total craziness, the best alternative I found was to use characters that effectively look the same. (Now, this I might admit is not a good idea.) Those alternatives can be found here http://shapecatcher.com/. The result (sorry if it hurts your eyes):
// double ∞ = 99999.; // still error
// double ⧞ = 99999.; // infinity negated, still error
double ꝏ = 99999.; // letter oo
double Ꝏ = 99999.; // letter OO
// double ⧜ = 99999.; // incomplete infinity, still error
Other "alternative" dead ringers for the characters mentioned in the question that are in the allowed range: ʃ, 𝜕 𝝏 𝞉 𝟃.
So the clang document says (emphasis mine):
This feature allows identifiers to contain certain Unicode characters, as specified by the active language standard;
This is covered in the draft C++ standard Annex E, the characters allowed are as follows:
E.1 Ranges of characters allowed [charname.allowed]
00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
The code for infinity, 221E, is not included in the list: it falls in the gap between 2070-218F and 2460-24FF.
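To check other candidate characters against the list, the ranges can be transcribed into a small lookup. This is only a sketch: the names `CodePointRange`, `kAllowedRanges`, `in_annex_e_allowed` and `check_examples` are made up for illustration, the table is copied by hand from the E.1 list quoted above, and it says nothing about ASCII characters or about the separate rule for which characters may start an identifier.

```cpp
#include <cassert>

// One inclusive range of code points, [first, last].
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Ranges below U+10000, transcribed from the E.1 list quoted above.
constexpr CodePointRange kAllowedRanges[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

// Returns true if code point c falls in one of the E.1 ranges.
// Hypothetical helper; ASCII identifier characters are not handled here.
inline bool in_annex_e_allowed(char32_t c) {
    // 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD: planes 1 to 14,
    // each without its last two code points.
    const char32_t plane = c >> 16;
    if (plane >= 0x1 && plane <= 0xE) {
        return (c & 0xFFFF) <= 0xFFFD;
    }
    for (const CodePointRange& r : kAllowedRanges) {
        if (c >= r.first && c <= r.last) {
            return true;
        }
    }
    return false;
}

// Spot checks against the characters discussed in this post.
inline void check_examples() {
    assert(in_annex_e_allowed(0x03B1));   // α: inside 0100-167F
    assert(!in_annex_e_allowed(0x221E));  // ∞: between 218F and 2460
    assert(!in_annex_e_allowed(0x222B));  // ∫: same gap
    assert(!in_annex_e_allowed(0x2202));  // ∂: same gap
    assert(in_annex_e_allowed(0xA74F));   // ꝏ: inside 3040-D7FF
    assert(in_annex_e_allowed(0x1D715));  // 𝜕: inside 10000-1FFFD
}
```

Calling `check_examples()` from any `main` runs the spot checks; they agree with the compiler behaviour reported in the question (α and ꝏ accepted, ∞ rejected).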
For reference: these are the codes above converted to unicode characters (some of them may not display correctly in all browsers/available fonts).
¨, ª, ,
¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ
Ā-ᙿ, ᚁ-᠍, ᠏- -, -, ‿-⁀, ⁔,
- ⁰-, ①-⓿, ❶-➓, Ⰰ-ⷿ, ⺀-
〄-〇, 〡-〯, 〱-〿
- 豈-ﴽ, ﵀-﷏,
ﷰ-﹄, ﹇-�
𐀀-, 𠀀-, 𰀀-, -, -, -, -, -, -, -, -, -, -, -
I could not find an extensive document that covers the rationale for the ranges chosen, although N3146: Recommendations for extended identifier characters for C and C++ does provide some details on the influences.