What are your experiences with using Unicode in identifiers?

These days, more and more languages support Unicode, which is a good thing. But it also presents a danger. In the past there were problems distinguishing between 1 and l, and between 0 and O. Now we have a completely new range of similar-looking characters.

For example:

ì, î, ï, ı, ι, ί, ׀ ,أ ,آ, ỉ, ﺃ

With these, it is not that difficult to create some very hard-to-find bugs.
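
To make the danger concrete, here is a minimal Python 3 sketch (Python is only an assumption; the question does not name a language). It prints the code point and official Unicode name behind each look-alike, showing that every one is a distinct character. It then defines two variables, `i` and `ı` (U+0131, dotless i), which render almost identically but are unrelated names. The helper `describe` is hypothetical, written just for this illustration.

```python
import unicodedata

# The look-alike characters listed in the question.
LOOKALIKES = ["ì", "î", "ï", "ı", "ι", "ί", "׀", "أ", "آ", "ỉ", "ﺃ"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Two identifiers that look nearly the same but are different variables:
# 'i' is U+0069, '\u0131' is U+0131 LATIN SMALL LETTER DOTLESS I.
namespace: dict = {}
exec("i = 1\n\u0131 = 2", namespace)

if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch))
    # Prints "1 2": assigning to one name never touches the other.
    print(namespace["i"], namespace["\u0131"])
```

Python does apply NFKC normalization to identifiers (PEP 3131), so a few pairs, such as the isolated form ﺃ and the ordinary أ, collapse to the same name; most of the look-alikes above, including `ı` and `ι`, do not.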

At my work, we have decided to stick with ANSI characters for identifiers. Is anybody out there using Unicode identifiers, and what have your experiences been?
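
A policy like this is easy to enforce mechanically. As a sketch, again assuming Python source, the hypothetical function `non_ascii_identifiers` below uses the standard `tokenize` module to report every identifier containing a non-ASCII character. String literals and comments are deliberately ignored, so Unicode text in user-facing strings is still allowed.

```python
from __future__ import annotations

import io
import tokenize


def non_ascii_identifiers(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for each identifier that is not pure ASCII."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover identifiers and keywords; strings and comments
        # have their own token types and are therefore skipped.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


if __name__ == "__main__":
    # Line 2 uses a Cyrillic 'а' (U+0430) that looks exactly like Latin 'a'.
    sample = "total = 1\ntot\u0430l = 2\nprint('\u00ef')\n"
    print(non_ascii_identifiers(sample))
```

Run as a pre-commit hook or CI step, a check like this catches a stray look-alike before it becomes a hard-to-find bug.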

Asked Nov 16 '08 at 20:11 by Toon Krijthe

1 Answer

Besides the similar-character bugs you mention, there are technical issues that can arise when different editors are used: files saved with or without a BOM, different encodings mixed in the same file through copy and paste (which is only a problem when the file actually contains characters that cannot be encoded in ASCII), and so on. Apart from those, I find that it's not worth using Unicode characters in identifiers. English has become the lingua franca of development, and you should stick to it when writing code.

I find this particularly true for code that may be read by any developer anywhere in the world (open-source code, or code that is shipped along with the product).
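
The editor and encoding problems mentioned in this answer can be illustrated with a small Python sketch. It saves the same one-line source, containing the identifier `naïve`, in three ways different editors might (UTF-8, UTF-8 with a BOM, and Latin-1), then classifies the resulting bytes. The `classify` function is a hypothetical heuristic for illustration, not a real encoding detector.

```python
import codecs


def classify(raw: bytes) -> str:
    """Roughly classify the bytes of a source file (a heuristic only)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8 without BOM"
    except UnicodeDecodeError:
        return "not valid utf-8"


source_text = "naïve = 1\n"
# The same identifier, saved three different ways.
variants = {
    "utf-8": source_text.encode("utf-8"),
    "utf-8-sig": source_text.encode("utf-8-sig"),  # prepends a BOM
    "latin-1": source_text.encode("latin-1"),
}

if __name__ == "__main__":
    for name, raw in variants.items():
        print(f"{name:10} {raw!r:32} -> {classify(raw)}")
```

With an ASCII-only identifier such as `naive`, the UTF-8 and Latin-1 variants would be byte-for-byte identical, which is why these problems only surface once non-ASCII characters appear.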

Answered Oct 24 '22 by Vinko Vrsalovic