When I try to open a file whose name contains accented characters, Java reports that the file cannot be found, apparently because of a charset mismatch. I work with UTF-8 on a Linux system (/etc/locales sets UTF-8 as well). JBoss runs with -Dfile.encoding=UTF-8 and the environment variable JBOSS_ENCODING="UTF-8".
In a JSP I get the name of the file:
String fileName = element.getChildText("FileName");
out.println("File to be opened : " + fileName);
This displays:
File to be opened : aaaaaà.txt
But new File(fileName) does not work: even file.exists() returns false.
When I list the directory instead:
File[] files = dir.listFiles();
for (int i = 0; i < files.length; i++) {
    // Print the name as Java read it from the directory
    out.println(files[i].getName());
}
I get : aaaaaà .txt
Why does Java treat the file name on disk as ISO-8859-1 when it reads the directory and tries to open the file? Is it a JBoss setting? A Java setting? How can I force java.io.File to use UTF-8 as the charset for file names?
I've used other tools and the name is always read fine, using UTF-8.
(Note: I'm always talking about the name of the file, never its content; it could be an empty file.)
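To illustrate the kind of mismatch being asked about, here is a minimal sketch of what happens when the UTF-8 bytes of a name are decoded as ISO-8859-1. The class and method names (MojibakeDemo, misdecode) are made up for this example, and whether this is exactly what happens inside JBoss is an assumption, not something the output above proves.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Hypothetical helper: encode the name as UTF-8, then decode those
    // bytes as if they were ISO-8859-1 (one byte becomes one character).
    static String misdecode(String name) {
        byte[] utf8 = name.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        String shown = misdecode("aaaaa\u00E0.txt");
        // Print code points instead of glyphs so the result does not
        // depend on the terminal's own encoding.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The single character U+00E0 turns into the two characters U+00C3 and U+00A0 (a capital A with tilde followed by a no-break space), so the decoded name is one character longer than the real one and no longer matches it.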
I am trying to track down the problem. Here is what I already have:
There is Exists.java:
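import java.io.*;

public class Exists {
    public static void main(String[] args) {
        // The results are ignored; the calls only need to show up in strace.
        new File("aaa").exists();
        new File("aaa\u00E4").exists();        // U+00E4, a with diaeresis
        new File("aaa\u00C3\u00A4").exists();  // U+00C3 U+00A4, two separate characters
    }
}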
And there is java -version:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
Now to the interesting part:
$ strace -f -o strace.out java Exists && grep 'stat("aaa' strace.out
31942 stat("aaa", 0x41464950) = -1 ENOENT (No such file or directory)
31942 stat("aaa\303\244", 0x41464950) = -1 ENOENT (No such file or directory)
31942 stat("aaa\303\203\302\244", 0x41464950) = -1 ENOENT (No such file or directory)
The nice thing is that strace works at the byte level, not at the character level like Java. So everything is ok in this case: the names arrive at stat() as UTF-8 bytes. I have the environment variable LANG set to en_US.UTF-8; all of the LC_* variables are unset.
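To check the octal escapes in the trace by hand, the following sketch encodes the same two names as UTF-8 and prints the bytes roughly the way strace does. StraceBytes and escape are names invented for this example, and the escaping is simplified (strace prints some control characters differently).

```java
import java.nio.charset.Charset;

public class StraceBytes {
    // Hypothetical helper: encode s in the given charset and render the
    // bytes strace-style, printable ASCII as-is and the rest as \ooo octal.
    static String escape(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("aaa\u00E4", "UTF-8"));       // aaa\303\244
        System.out.println(escape("aaa\u00C3\u00A4", "UTF-8")); // aaa\303\203\302\244
    }
}
```

Both lines match the stat() arguments in the trace, which is what you would expect when the JVM encodes file names as UTF-8.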
Now tracking down the problem to a minimal working example:
$ strace -f -o strace.out env - LC_ALL=en_US.UTF-8 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
31968 stat("aaa", 0x41a75950) = -1 ENOENT (No such file or directory)
31968 stat("aaa\303\244", 0x41a75950) = -1 ENOENT (No such file or directory)
31968 stat("aaa\303\203\302\244", 0x41a75950) = -1 ENOENT (No such file or directory)
That still works. So let's try another encoding:
$ strace -f -o strace.out env - LANG=en_US.ISO-8859-1 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
32070 stat("aaa", 0x407a3950) = -1 ENOENT (No such file or directory)
32070 stat("aaa?", 0x407a3950) = -1 ENOENT (No such file or directory)
32070 stat("aaa??", 0x407a3950) = -1 ENOENT (No such file or directory)
So this doesn't work. One possible reason might be that I selected a locale that is not in the list printed by locale -a. But this shouldn't be the reason for Java to convert the letters to question marks.
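The question marks look like what Java produces when it encodes characters into a charset that cannot represent them. The sketch below reproduces that with US-ASCII. That the JVM actually falls back to an ASCII-only charset when the locale is unknown is my assumption here; the trace is consistent with it but does not prove it. AsciiFallback and encodeAsAscii are names invented for the example.

```java
import java.nio.charset.Charset;

public class AsciiFallback {
    // Hypothetical helper: encode the name as US-ASCII and turn the bytes
    // back into a string so the result can be printed. String.getBytes
    // replaces every unmappable character with the charset's replacement
    // byte, which is '?' for US-ASCII.
    static String encodeAsAscii(String name) {
        Charset ascii = Charset.forName("US-ASCII");
        return new String(name.getBytes(ascii), ascii);
    }

    public static void main(String[] args) {
        System.out.println(encodeAsAscii("aaa"));             // aaa
        System.out.println(encodeAsAscii("aaa\u00E4"));       // aaa?
        System.out.println(encodeAsAscii("aaa\u00C3\u00A4")); // aaa??
    }
}
```

One question mark per non-ASCII character is exactly the pattern in the third trace.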
As soon as LANG points to a non-existing locale, the setting of the sun.jnu.encoding property doesn't have any effect anymore. So I'm out of ideas now.
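If you want to see what the JVM picked up in each of these runs, a small diagnostic like the following may help. It is only a sketch: EncodingInfo and describe are names I made up, and sun.jnu.encoding is an implementation-specific property that may be absent (printed as null) on some JVMs.

```java
import java.nio.charset.Charset;

public class EncodingInfo {
    // Hypothetical helper: collect the locale-related environment
    // variables and the encoding settings the JVM reports.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("LANG=").append(System.getenv("LANG")).append('\n');
        sb.append("LC_ALL=").append(System.getenv("LC_ALL")).append('\n');
        sb.append("file.encoding=")
          .append(System.getProperty("file.encoding")).append('\n');
        sb.append("sun.jnu.encoding=")
          .append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("defaultCharset=")
          .append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it the same way as Exists above, for example with env - LANG=en_US.ISO-8859-1 in front of the java command, shows which values change between the working and the failing case.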