 

Is it wrong to use special characters in C# source code, such as "ñ"?

Recently, using C#, I declared a method parameter using the Latin character ñ, built (compiled) my entire solution, and it worked; I was able to execute my program. But I'm curious to know: is it wrong to use special characters such as Latin characters in source code written in C#? If it is wrong, why?

Besides the fact that it is more legible and universal to write code in English, are there any other reasons not to use special characters in C# source code?
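
For concreteness, here is a minimal sketch of what I did (the method and parameter names are invented for illustration):

using System;

class Demo
{
    // "año" uses the Latin letter ñ in a parameter name; this compiles.
    static int DaysInYears(int año)
    {
        return año * 365;
    }

    static void Main()
    {
        Console.WriteLine(DaysInYears(2)); // prints 730
    }
}

This builds and runs without complaint.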

asked Jan 12 '12 by Rubens Mariuzzo

2 Answers

Let me break this down into several questions.

Is it legal according to the specification to use non-Roman letters in C# identifiers, strings, and so on?

Yes, absolutely. Any character that the Unicode specification classifies as a letter is legal. See the specification for the exact details.
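
For example, all of the following are legal identifiers (the names are invented), including one written with a Unicode escape sequence:

class UnicodeIdentifiers
{
    static void Examples()
    {
        int ñandú = 1;   // Latin letter with tilde (class Ll)
        int αβγ = 2;     // Greek letters (class Ll)
        int имя = 3;     // Cyrillic letters (class Ll)
        int 名前 = 4;     // CJK ideographs (class Lo)
        int \u00F1 = 5;  // Unicode escape sequence; declares an identifier named ñ
    }
}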

Are there any technical issues regarding non-Roman letters in C# programs?

Yes, there are a few. As you are probably aware, you can both "statically" and "dynamically" link code into an application, and the compiler is an application. We've had problems in the past where the compiler had a statically-linked-in old version of the Unicode classification algorithm while the editor had a dynamically-linked-in current version, so the editor and the compiler could disagree on what counts as a legal letter, which caused user confusion. However, the accented Latin characters you mention have been in the Unicode standard for so long that they are unlikely to cause any sort of problem.

Moreover, a lot of people still use old-fashioned editors; I learned how to program at WATCOM back in the late 1980's and I still frequently use WATCOM VI as my editor. I can sometimes code faster in it than I can in Visual Studio because my fingers are just really good at it after 23 years of practice. (Though these days I use Visual Studio for almost everything.) Obviously an editor written in the 1980's is going to have a problem with Unicode.

Are there any non-technical issues regarding non-Roman letters in C# programs?

Obviously, yes. I personally would rather use Greek letters for generic type parameters, for instance:

class List<τ> : IEnumerable<τ> 

or when implementing mathematical code:

degrees = 180.0 * radians / π; 

But I resist the urge in deference to my coworkers who do not particularly want to be cutting and pasting, or learning arcane key combinations, just to edit my code.
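
For reference, here is a self-contained version of those fragments (with invented names) that actually compiles:

using System;
using System.Collections;
using System.Collections.Generic;

// A Greek letter as a generic type parameter, as in the fragment above.
class Bag<τ> : IEnumerable<τ>
{
    private readonly List<τ> items = new List<τ>();
    public void Add(τ item) { items.Add(item); }
    public IEnumerator<τ> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

class Trig
{
    // π as an identifier for the mathematical constant.
    const double π = Math.PI;

    static double ToDegrees(double radians)
    {
        return 180.0 * radians / π;
    }

    static void Main()
    {
        Console.WriteLine(ToDegrees(π / 2)); // prints 90
    }
}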

answered Sep 18 '22 by Eric Lippert


Added this first bit based on the comment:

This doesn't answer the question... The OP isn't asking whether it is allowed (obviously it is), but whether it's wrong – Thomas Levesque

Ok, let me address it more directly:

Is it wrong to use special characters such as Latin characters in source code written in C#? If it is wrong, why?

By the definition in the specification, it is not "wrong" (see below).

Besides the fact that it is more legible and universal to write code in English, are there any other reasons not to use special characters in C# source code?

Since you said "besides", I'm not going to address the legibility or "universality" topics (which is appropriate for a Stack Overflow question anyway). As for the other part, "are there any other reasons not to use special characters": setting aside the things you mentioned, I can't think of many. The only one that comes to mind is that, amazingly, some tools still have problems supporting Unicode today (mostly off-brand third-party tools). It MAY be that you use some wacky tool that doesn't handle Unicode correctly, or doesn't conform to the C# spec correctly, but I haven't come across any. So I'd say no (keeping in mind you specifically said I didn't have to address the legibility or universality topics).


From the C# ECMA Specification Page 70:

The rules for identifiers given in this subclause correspond exactly to those recommended by the Unicode Standard Annex 15 except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape sequences are permitted in identifiers, and the “@” character is allowed as a prefix to enable keywords to be used as identifiers.

identifier::
    available-identifier
    @ identifier-or-keyword

available-identifier::
    An identifier-or-keyword that is not a keyword

identifier-or-keyword::
    identifier-start-character identifier-part-characters(opt)

identifier-start-character::
    letter-character
    _ (the underscore character U+005F)

identifier-part-characters::
    identifier-part-character
    identifier-part-characters identifier-part-character

identifier-part-character::
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character

letter-character::
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl

The important bit there is how the spec defines a letter-character.

It specifically includes: A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl

The character you mention (ñ, U+00F1) belongs to the category "Ll" (Letter, Lowercase), which is specifically allowed by the specification in an identifier.
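
If you want to check the classification yourself, here is a small sketch using the framework's char.GetUnicodeCategory:

using System;
using System.Globalization;

class CategoryCheck
{
    static void Main()
    {
        // ñ (U+00F1) is classified as LowercaseLetter ("Ll"),
        // one of the letter-character classes allowed in identifiers.
        UnicodeCategory category = char.GetUnicodeCategory('ñ');
        Console.WriteLine(category); // prints LowercaseLetter
    }
}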

answered Sep 21 '22 by Steve