Different behaviour between javac 1.6 and javac 1.7 when handling special characters

First of all, thank you in advance. I've been banging my head against this issue for several days and have searched similar threads for a solution without success.

Our application generates Java classes, and some of them may contain special characters in the class name (and therefore in the file name), such as ZoneRéservée435.java, which forces the encoding to be UTF-8.

Up to Java 1.6, the Ant task:

<javac source="1.5" target="1.5" srcdir="${src.dir}" destdir="${classes.dir}" deprecation="on" debug="on" classpathref="classpath" fork="false" memoryMaximumSize="512m" encoding="UTF-8">

worked fine.

After moving to Java 1.7, the file name was no longer saved using the UTF-8 encoding, resulting in a file name similar to: ZoneRe?serve?e435.java

Looking around, I came to understand that I needed to set the environment variable LC_CTYPE to UTF-8. That solved the file name issue, but I still get a compilation error:

error: class ZoneRéservée435 is public, should be declared in a file named ZoneRéservée435.java

Although the two names look the same, they seem to be encoded in two different ways. The interesting part is that this difference in encoding already existed with Java 1.6, yet the code compiled fine.

Does anyone have any suggestions or ideas?

From what I understand, the encoding issue is related to the fact that the class is generated with the following:

 Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), Charset.forName("UTF-8")));
  • The code inside the file uses U+00E9 for the special character;
  • The file name uses e followed by U+0301.

Any suggestions on how to deal with this?

MaLLinok asked Nov 04 '22 10:11


1 Answer

It seems that your file system stores the letter é in its decomposed form (the two characters e and a combining acute accent, i.e. \u0065 followed by \u0301), while your code generator writes the composed form of é (which is \u00e9). This is a typical problem on Apple's HFS+ file system, which always stores file names in decomposed form.

What you can do to solve this problem is modify your application to decompose the class name that appears in the generated source file with java.text.Normalizer:

Normalizer.normalize(classname, Normalizer.Form.NFD)
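As a rough sketch of how that could look in the generator: the class name ClassNameNormalizer and the method decompose are made up for illustration, and the sample name is the one from the question. The idea is to run the class name through the same decomposition before writing it into the source file, so that it matches the file name byte for byte.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the asker's application.
final class ClassNameNormalizer {

    // Returns the class name in decomposed form (NFD), so that an accented
    // letter becomes a base letter followed by a combining mark.
    static String decompose(String className) {
        return Normalizer.normalize(className, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // Composed form, as the generator currently writes it (U+00E9).
        String composed = "ZoneR\u00e9serv\u00e9e435";
        String decomposed = decompose(composed);

        // The two strings render identically but are not equal.
        System.out.println(composed.equals(decomposed)); // false
        System.out.println(composed.length());           // 15
        System.out.println(decomposed.length());         // 17

        // Normalizing back to NFC recovers the composed form.
        String recomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.equals(recomposed)); // true
    }
}
```

The generator would then use the decomposed string wherever it emits the class name (declaration, constructors, references from other generated classes), assuming the file name on disk really is in decomposed form.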

See also: http://en.wikipedia.org/wiki/Unicode_equivalence

Stefan Ferstl answered Nov 10 '22 01:11