
ISO 8859-1 Encoding of files printed in Java program

I am writing a program that implements a file structure; the program prints out a product file based on that structure. Product names include the letters Æ, Ø and Å. These letters are not displayed correctly in the output file. I use:

    PrintWriter printer = new PrintWriter(new FileOutputStream(new File("products.txt")));

ISO 8859-1 or Windows ANSI (CP 1252) are the character sets that the implementation requires.

user265767 asked Sep 08 '11 00:09


1 Answer

There are two possibilities:

  • Java is using the wrong encoding when outputting the file.
  • The file is actually correct, and whatever you are using to display the file is using the wrong encoding.
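To tell the two cases apart, it can help to look at the raw bytes instead of trusting a viewer. The following is a minimal sketch, not part of the original program: the class name `EncodingCheck` and the helper `toHex` are made up for illustration. It prints the default charset Java has picked and the byte sequences that Æ, Ø and Å should produce in ISO-8859-1 and in UTF-8, which you can compare against a hex dump of your output file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    /** Formats bytes as space-separated uppercase hex (hypothetical helper). */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The charset a PrintWriter uses when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Unicode escapes for Æ, Ø, Å, so the result does not depend on
        // the encoding of this source file.
        String sample = "\u00C6\u00D8\u00C5";

        // ISO-8859-1 uses one byte per letter: C6 D8 C5
        System.out.println("ISO-8859-1: " + toHex(sample.getBytes(StandardCharsets.ISO_8859_1)));

        // UTF-8 uses two bytes per letter: C3 86 C3 98 C3 85
        System.out.println("UTF-8:      " + toHex(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the file already contains C6, D8 and C5 for those letters, Java wrote ISO-8859-1 correctly and the problem is most likely in whatever displays the file.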

Assuming that the problem is the first one, the root cause is that Java has figured out that the default encoding for the platform is something other than the one you want / expect. There are three ways to solve this:

  • Figure out why Java has got the default locale and encoding "wrong" and remedy that. It will be something to do with your operating system's locale settings ...

  • Read this FAQ for details on how you can override the default locale settings at the command line.

  • Use a PrintWriter constructor that specifies the encoding explicitly so that your application doesn't rely on the default encoding. For example:

    PrintWriter pw = new PrintWriter("filename", "ISO-8859-1");
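As a rough illustration of the third option, here is a small self-contained sketch. The class name `Latin1Output` and the method `encodeLine` are hypothetical, and it writes to an in-memory buffer rather than a file so that the resulting bytes can be inspected directly. For a real file, the `PrintWriter(fileName, charsetName)` constructor shown above should behave the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Latin1Output {

    /**
     * Prints one line through a PrintWriter that encodes as ISO-8859-1
     * and returns the bytes that were produced.
     */
    static byte[] encodeLine(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The charset is given explicitly, so the platform default is never consulted.
        try (PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(buffer, StandardCharsets.ISO_8859_1))) {
            pw.print(line);
            pw.print('\n'); // fixed line terminator, independent of the platform
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes for Æ, Ø, Å
        byte[] bytes = encodeLine("\u00C6\u00D8\u00C5");
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Each of the three letters should come out as a single byte (C6, D8, C5), followed by the newline byte 0A.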
    

In response to this comment:

Don’t PrintWriters all have the bug that you can’t know you had an error with them?

  • It is not a bug, it is a design feature.
  • You can find out if there was an error. You just can't find out what it was.
  • If you don't like it, you can use Writer instead.

They won’t raise an exception or even return failure if you try to shove a codepoint at them that can’t fit in the designated encoding.

Neither will a regular Writer I believe ... unless you specifically construct it to do this. The normal behaviour is to replace any unmappable codepoint with a specific character, though this is not specified in the javadocs (IIRC).
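To make the difference concrete, here is a hedged sketch comparing the two constructions. The class and method names (`StrictWriterDemo`, `writeLenient`, `writeStrict`) are invented for this example. It relies on `OutputStreamWriter` using the supplied `CharsetEncoder` as configured, with the error actions set to `REPORT` explicitly rather than relying on defaults.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictWriterDemo {

    /** Charset passed directly: unmappable characters are silently replaced. */
    static byte[] writeLenient(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Encoder configured to report errors: returns false if the text cannot be encoded. */
    static boolean writeStrict(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out,
                StandardCharsets.ISO_8859_1.newEncoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT))) {
            w.write(text);
        } catch (CharacterCodingException e) {
            // Thrown for characters that ISO-8859-1 cannot represent.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign is not part of ISO-8859-1
        System.out.println("lenient byte: " + (char) writeLenient(euro)[0]);
        System.out.println("strict accepted euro sign: " + writeStrict(euro));
        System.out.println("strict accepted \u00C6: " + writeStrict("\u00C6"));
    }
}
```

With the lenient writer the euro sign ends up as a `?` byte in the output; with the strict one the write fails with an exception that the caller can handle.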

Do they even tell you if the filesystem fills up? I seem to recall that they don’t.

That is true. However:

  • For the kind of file you typically write using a PrintWriter this is not a critical issue.

  • If it is a critical issue AND you still want to use PrintWriter, you can always call checkError() (IIRC) to find out if there was an error.
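A minimal sketch of the `checkError()` approach follows. `CheckErrorDemo`, `FailingStream` and `printAndCheck` are hypothetical names; `FailingStream` only simulates a full disk by failing every write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

public class CheckErrorDemo {

    /** Hypothetical stream that fails on every write, standing in for a full filesystem. */
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated: no space left on device");
        }
    }

    /** Prints a line and reports whether the PrintWriter recorded an error. */
    static boolean printAndCheck(OutputStream target) {
        PrintWriter pw = new PrintWriter(target);
        pw.println("some product line"); // never throws, even if the write fails
        // checkError() flushes the writer and returns true if any
        // IOException occurred so far; the exception itself is not available.
        return pw.checkError();
    }

    public static void main(String[] args) {
        System.out.println("failing stream: " + printAndCheck(new FailingStream()));
        System.out.println("working stream: " + printAndCheck(new ByteArrayOutputStream()));
    }
}
```

This tells you that something went wrong but not what, which is the limitation mentioned earlier.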

I always end up writing out my OutputStreamWriter constructor with the explicit Charset.forName("UTF-8").newEncoder() second argument. It’s kind of tedious, so perhaps there’s a better way.

I dunno.

Stephen C answered Sep 22 '22 22:09