
Unicode identifiers (function names) for non-localization purposes advisable?

PHP allows Unicode identifiers for variables, functions, classes and constants. This was presumably intended for localized applications. Whether it's a good idea to code an API in anything but English is debatable, but some development settings clearly demand it.

 $Schüssel = new Müsli(T_FRÜCHTE);

But PHP allows more than just \p{L} for identifiers. You can use virtually any Unicode character outside the ASCII range (within ASCII, characters such as : are special, and \ is already used as an internal hack to support namespaces).
Anyway, you could do so, and I would even consider it a workable use for fun projects:

 throw new ಠ_ಠ("told you about the disk space before");

But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?

For example, I'm pondering this for embedding parameters into magic method names. In my case I only need to inject numeric parameters, so I could get away with just the underscore:

 $what->substr_0_50->ascii("text");
  // (Let's skip the evilness discussion this time. Not quite sure
  // yet if I really want it, but the conciseness might make sense.)
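To make the idea concrete, here is a minimal sketch of how such a name could be decoded in a magic __get handler. The class name Chain, the helper parseName, the terminal method apply and the one-entry whitelist are all hypothetical, chosen for illustration; this is not the asker's actual implementation.

```php
<?php
// Hypothetical sketch: Chain, parseName and apply are illustrative names only.
final class Chain
{
    // Only whitelisted functions may be called by name (assumption for safety).
    private const ALLOWED = ['substr'];

    /** @var array<int, array{0: string, 1: array<int, int>}> queued [function, args] pairs */
    private array $steps = [];

    // Split "substr_0_50" into ['substr', [0, 50]].
    public static function parseName(string $name): array
    {
        $parts = explode('_', $name);
        $func = array_shift($parts);
        return [$func, array_map('intval', $parts)];
    }

    // Property access such as ->substr_0_50 queues a step and returns $this.
    public function __get(string $name): self
    {
        [$func, $args] = self::parseName($name);
        if (!in_array($func, self::ALLOWED, true)) {
            throw new InvalidArgumentException("Unsupported step: $func");
        }
        $this->steps[] = [$func, $args];
        return $this;
    }

    // Run every queued step on the input, in order.
    public function apply(string $input): string
    {
        foreach ($this->steps as [$func, $args]) {
            $input = $func($input, ...$args);
        }
        return $input;
    }
}

echo (new Chain())->substr_0_5->apply("Hello, world"), "\n"; // prints "Hello"
```

Note that the underscore only works as a separator while neither the function name nor the parameters contain one, which is exactly the limitation that motivates looking for another character.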

But if I wanted to embed other textual parameters, I would need another Unicode character as the separator. That's harder to type, but perhaps there's one that would aid readability and convey the meaning ... ?

 ->substr✉0✉50->   // doesn't look good
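Whatever symbol is chosen, splitting on it is straightforward, since explode() works on byte sequences and does not care that the delimiter is multi-byte. A rough sketch, where splitVirtualName is a hypothetical helper and U+2709 merely stands in for whichever separator is picked:

```php
<?php
// Hypothetical helper: split a virtual method name on an arbitrary separator.
function splitVirtualName(string $name, string $separator): array
{
    $parts = explode($separator, $name);
    $method = array_shift($parts);   // first piece is the method name
    return [$method, $parts];        // remaining pieces stay strings
}

$sep = "\u{2709}"; // U+2709 ENVELOPE, written as an escape so the file encoding doesn't matter
[$method, $args] = splitVirtualName("pad{$sep}abc{$sep}3", $sep);
// $method is 'pad', $args is ['abc', '3']
```

Unlike the underscore variant, the parameters here can be arbitrary text, as long as they never contain the separator itself.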

So, the question in this case: Which symbol makes sense as a separator for mixed-in parameters in a virtual function name? Broader meta topic: Which uses of Unicode identifiers do you know about, or would you consider okayish?

mario asked Mar 18 '11 23:03



2 Answers

Which symbol makes sense as separator for mixed-in parameters in a virtual function name.

\u2639?

But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?

The biggest hurdle after font support is going to be picking a character that can actually be typed. Outside of a macro or copy/paste, Unicode characters are not spectacularly easy to enter. Forcing this upon others is very likely going to violate the "assume the people that work with your code after you are murderous psychopaths that know where you live" rule.

We use Unicode characters in only a few comments in our codebase, like

// Even though this is the end of the file and we should get an implicit exit, 
// if we don't actually expressly exit here, PHP segfaults.
// ♫ Oh, PHP, I love you. ♫

I think that falls into the "amusement and decorative" category. Or the "shoot self in head after slaughtering the php-internals team" category. Pick one.

Anyway, this is not a good idea because it's going to make your code hard to modify.

Charles answered Oct 16 '22 09:10


Just to make it clear: PHP does not support Unicode, and it doesn't support Unicode labels. To be more precise, PHP defines a LABEL as [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. As you can see, it allows only a small range of bytes apart from the typical alphanumerics and underscore. The fact that your Unicode labels are still accepted is only an artifact of PHP not having Unicode support. Your special characters are several bytes long in UTF-8, PHP treats each of those bytes as a separate character, and each of them matches the \x7f-\xff range mentioned above. (In UTF-8, every byte of a non-ASCII character lies in \x80-\xff, so this holds for any such character, not only the ones you tried.)
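This is easy to check by running the quoted LABEL pattern byte-wise (that is, without the /u modifier) against a few names. The isLabel helper below is a hypothetical name for illustration; the strings use \u{...} escapes (PHP 7.0+) so that the result does not depend on how the source file is saved.

```php
<?php
// The LABEL pattern quoted above, applied to raw bytes (no /u modifier).
const LABEL = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';

// Hypothetical helper: does $name match the LABEL pattern?
function isLabel(string $name): bool
{
    return preg_match(LABEL, $name) === 1;
}

var_dump(isLabel("Sch\u{00FC}ssel"));      // bool(true):  "ü" is the bytes C3 BC
var_dump(isLabel("\u{0CA0}_\u{0CA0}"));    // bool(true):  "ಠ" is the bytes E0 B2 A0
var_dump(isLabel("substr:0"));             // bool(false): ":" is plain ASCII 3A

echo bin2hex("\u{00FC}"), "\n";            // c3bc
echo bin2hex("\u{0CA0}"), "\n";            // e0b2a0
```

Every byte printed by bin2hex is 0x80 or above, which is why the names pass even though the pattern knows nothing about Unicode.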

Further reading on that topic: Exotic names for methods, constants, variables and fields - Bug or Feature?

NikiC answered Oct 16 '22 11:10