
Encoding of file names in Java

I am running a small Java application on an embedded Linux platform. After replacing the Java VM JamVM with OpenJDK, file names with special characters are not stored correctly. Special characters like umlauts are replaced by question marks.

Here is my test code:

import java.io.File;
import java.io.IOException;

public class FilenameEncoding
{

        public static void main (String[] args) {
                String name = "umlaute-äöü";
                System.out.println("\nname = " + name);
                System.out.print("name in Bytes: ");
                for (byte b : name.getBytes()) {
                        System.out.print(Integer.toHexString(b & 255) + " ");
                }
                System.out.println();

                try {
                        File f = new File(name);
                        f.createNewFile();
                } catch (IOException e) {
                        e.printStackTrace();
                }
        }

}

Running it gives the following output:

name = umlaute-???
name in Bytes: 75 6d 6c 61 75 74 65 2d 3f 3f 3f

and a file called umlaute-??? is created.
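For reference, the 3f bytes above are what String.getBytes produces when the target charset cannot represent a character: each unmappable character is replaced by the charset's replacement byte, which is '?' (0x3f) for US-ASCII. The sketch below makes the charset explicit instead of relying on the platform default. The class name UmlautBytes and the helper hex are made up for illustration, and the umlauts are written as \u escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original test program.
class UmlautBytes {

    // Encodes s with the given charset and returns the bytes as
    // space-separated lowercase hex, like the output of the test program.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc"; // the three umlauts from the file name

        // US-ASCII cannot represent them, so each becomes '?' (0x3f).
        System.out.println(hex(umlauts, StandardCharsets.US_ASCII)); // prints "3f 3f 3f"

        // UTF-8 encodes each of them as two bytes.
        System.out.println(hex(umlauts, StandardCharsets.UTF_8)); // prints "c3 a4 c3 b6 c3 bc"
    }
}
```

If the default charset of the VM is ASCII, name.getBytes() without an argument behaves like the first call, which matches the output shown above.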

Setting the properties file.encoding and sun.jnu.encoding to UTF-8 gives the correct strings in the terminal, but the created file is still named umlaute-???.

Running the VM with strace, I can see the system call

open("umlaute-???", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0666) = 4

This shows that the problem is not a file system issue but lies in the VM.

How can the encoding of the file name be set?

Roland Brand asked Apr 11 '12 12:04



2 Answers

If you are using Eclipse, you can go to Window->Preferences->General->Workspace and select the "Text file encoding" option you want from the pull-down menu. By changing mine around, I was able to recreate your problem (and to fix it again by changing back).

If you are not, you can add an environment variable in Windows (System Properties->Environment Variables, then select New... under System variables). The name should be JAVA_TOOL_OPTIONS (without quotes) and the value should be set to -Dfile.encoding=UTF8 (or whichever encoding gets yours to work).

I found the answer through this post, by the way: Setting the default Java character encoding?

Linux Solutions

- (Permanent) Running env | grep LANG in the terminal will return one or two entries showing which encoding Linux is currently set up with. You can then set LANG to UTF8 (yours might be set to ASCII) in the /etc/sysconfig/i18n file (I tested this on Fedora, 2.6.40). Basically, I switched from UTF8 (where I had odd characters) to ASCII (where I had question marks) and back.

- (Per JVM run, but may not fix the problem) You can start the JVM with the encoding you want using java -Dfile.encoding=**** FilenameEncoding. Here is the output from the two ways:

[youssef@JoeLaptop bin]$ java -Dfile.encoding=UTF8 FilenameEncoding

name = umlaute-הצ�
name in Bytes: 75 6d 6c 61 75 74 65 2d d7 94 d7 a6 ef bf bd 
UTF-8
UTF8

[youssef@JoeLaptop bin]$ java FilenameEncoding

name = umlaute-???????
name in Bytes: 75 6d 6c 61 75 74 65 2d 3f 3f 3f 3f 3f 3f 3f 
US-ASCII
ASCII
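The two trailing lines in each run look like charset names printed by an extended version of the test program that is not shown here. A minimal sketch of how such a report might be printed follows. The class name CharsetReport is made up, and sun.jnu.encoding is an implementation-specific property, so it may be missing on some VMs.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not the code that produced the output above.
class CharsetReport {

    // Formats one "label = value" line, marking values that are not set.
    static String line(String label, String value) {
        return label + " = " + (value == null ? "(not set)" : value) + "\n";
    }

    // Collects the settings that influence how the VM encodes text and file names.
    static String report() {
        return line("default charset", Charset.defaultCharset().name())
             + line("file.encoding", System.getProperty("file.encoding"))
             + line("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"))
             + line("LANG", System.getenv("LANG"))
             + line("LC_ALL", System.getenv("LC_ALL"));
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it once plainly and once with -Dfile.encoding=UTF8 shows which of the values the switch actually changes on a given VM.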

Here is a reference for the Linux part: http://www.cyberciti.biz/faq/set-environment-variable-linux/

And here is one about -Dfile.encoding: Setting the default Java character encoding?

Youssef G. answered Oct 20 '22 04:10


I know it's an old question, but I had the same problem. None of the mentioned solutions worked for me, but the following solved it:

  • Source encoding set to UTF-8 (project.build.sourceEncoding set to UTF-8 in the Maven properties)
  • JVM arguments: -Dfile.encoding=utf8 and -Dsun.jnu.encoding=utf8
  • Using java.nio.file.Path instead of java.io.File
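The third point could look roughly like the sketch below. The class name NioFilename and the helper create are made up for illustration. As far as I can tell, one visible difference on Linux is that java.nio.file throws an InvalidPathException when a name cannot be encoded in the platform's file-name encoding, rather than silently writing question marks, so the sketch catches that case. Whether it fixes the problem on a given VM still depends on the locale and the settings above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical example class for the java.nio.file approach.
class NioFilename {

    // Creates an empty file called name inside dir. Returns the new path,
    // or null if the name cannot be represented in the file-name encoding
    // the VM is using.
    static Path create(Path dir, String name) throws IOException {
        try {
            return Files.createFile(dir.resolve(name));
        } catch (InvalidPathException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory keeps the experiment out of the working directory.
        Path dir = Files.createTempDirectory("filename-encoding");

        // \u escapes avoid any dependence on the source file encoding.
        Path created = create(dir, "umlaute-\u00e4\u00f6\u00fc");

        if (created == null) {
            System.out.println("name is not representable in the current file-name encoding");
        } else {
            System.out.println("created " + created);
        }
    }
}
```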

Stefan A answered Oct 20 '22 05:10