 

Is there a way to make g++ compile this program with Unicode identifiers? [duplicate]

I am trying to use Unicode variable names in g++.

It does not appear to work.

Does g++ not support Unicode variable names at all, or is there some subset of Unicode that it does accept which I haven't tested?

Thanks!

asked Apr 21 '10 by anon


2 Answers

You have to specify the -fextended-identifiers flag when compiling. You also have to write the characters as \uXXXX or \UXXXXXXXX escapes (at least in gcc, these are interpreted as Unicode code points).

Identifiers (variable/class names etc.) in g++ can't contain raw UTF-8/UTF-16 bytes or any other encoding; they have to match this grammar:

identifier:
  nondigit
  identifier nondigit
  identifier digit

a nondigit is

nondigit: one of
  universalcharactername
  _ a b c d e f g h i j k l m n o p q r s t u v w x y z
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

and a universalcharactername is

universalcharactername:
  \UXXXXXXXX
  \uXXXX

Thus, if you save your source file as UTF-8, you cannot have a variable like:

int høyde = 10;

it has to be written as:

int h\u00F8yde = 10;

(which, in my opinion, defeats the whole purpose, so just stick with a-z)

answered Oct 10 '22 by nos


A one-line patch to the cpp preprocessor allows UTF-8 input. Details for gcc are given at

https://www.raspberrypi.org/forums/viewtopic.php?p=802657

however, since the preprocessor is shared, the same patch should work for g++ as well. In particular, the patch needed, as of gcc-5.2, is

diff -cNr gcc-5.2.0/libcpp/charset.c gcc-5.2.0-ejo/libcpp/charset.c
*** gcc-5.2.0/libcpp/charset.c  Mon Jan  5 04:33:28 2015
--- gcc-5.2.0-ejo/libcpp/charset.c  Wed Aug 12 14:34:23 2015
***************
*** 1711,1717 ****
    struct _cpp_strbuf to;
    unsigned char *buffer;

!   input_cset = init_iconv_desc (pfile, SOURCE_CHARSET, input_charset);
    if (input_cset.func == convert_no_conversion)
      {
        to.text = input;
--- 1711,1717 ----
    struct _cpp_strbuf to;
    unsigned char *buffer;

!   input_cset = init_iconv_desc (pfile, "C99", input_charset);
    if (input_cset.func == convert_no_conversion)
      {
        to.text = input;

Note that for the above patch to work, a recent version of iconv that supports C99 conversions needs to be installed. Run iconv --list to verify this; otherwise, you can install a new version of iconv along with gcc as described in the link above. Change the configure command to

$ ../gcc-5.2.0/configure -v --disable-multilib \
    --with-libiconv-prefix=/usr/local/gcc-5.2 \
    --prefix=/usr/local/gcc-5.2 \
    --enable-languages="c,c++"

if you are building for x86 and want to include the c++ compiler as well.

answered Oct 10 '22 by ejolson