
Write a file in UTF-8 using FileWriter (Java)?

I have the following code; however, I want it to write the output as a UTF-8 file so that it handles foreign characters. Is there a way of doing this? Does it need an extra parameter?

I would really appreciate your help with this. Thanks.

    try {
        BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
        writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
        while ((line = reader.readLine()) != null) {
            //If the line starts with a tab then we just want to add a movie
            //using the current actor's name.
            if (line.length() == 0)
                continue;
            else if (line.charAt(0) == '\t') {
                readMovieLine2(0, line, surname.toString(), forename.toString());
            } //Else we've reached a new actor
            else {
                readActorName(line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
user1280970 asked Mar 24 '12 15:03




2 Answers

Safe Encoding Constructors

Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate constructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.

For file I/O, always pass the fancy encoder argument (or, for InputStreamReader, the matching decoder) as the second argument to both OutputStreamWriter and InputStreamReader:

  Charset.forName("UTF-8").newEncoder() 

There are other even fancier possibilities, but none of the three simpler possibilities work for exception handling. These do:

    OutputStreamWriter char_output = new OutputStreamWriter(
        new FileOutputStream("some_output.utf8"),
        Charset.forName("UTF-8").newEncoder()
    );
    InputStreamReader char_input = new InputStreamReader(
        new FileInputStream("some_input.utf8"),
        Charset.forName("UTF-8").newDecoder()
    );
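To see the difference on the reading side, here is a minimal sketch that decodes the same malformed bytes twice: once through the Charset constructor and once through the decoder constructor. The class name, the method names, and the sample bytes are made up for illustration; only the InputStreamReader constructors come from the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class StrictDecodingDemo {
    // 'a', 0xFF, 'b': the byte 0xFF can never occur in well-formed UTF-8.
    static final byte[] BAD_UTF8 = { 'a', (byte) 0xFF, 'b' };

    // Drains a reader into a String and closes it.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Charset constructor: the bad byte is silently replaced with U+FFFD.
    public static String readLenient() {
        try {
            return readAll(new InputStreamReader(
                new ByteArrayInputStream(BAD_UTF8), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decoder constructor: the bad byte raises a CharacterCodingException.
    public static boolean strictReadFails() {
        try {
            readAll(new InputStreamReader(
                new ByteArrayInputStream(BAD_UTF8), StandardCharsets.UTF_8.newDecoder()));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lenient read replaced the bad byte: "
            + readLenient().equals("a\uFFFDb"));
        System.out.println("strict read threw: " + strictReadFails());
    }
}
```

The lenient reader hands back text containing a replacement character with no sign that anything went wrong, which is the silent failure the answer warns about.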

As for running with

 $ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere 

The problem is that this will not use the full encoder argument form for the character streams, and so you will again miss encoding problems.

Longer Example

Here’s a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream all to UTF-8 character streams with full exception handling:

    // this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
    Process slave_process = Runtime.getRuntime().exec("perl -CS script args");

    // fetch his stdin byte stream...
    OutputStream __bytes_into_his_stdin = slave_process.getOutputStream();

    // and make a character stream with exceptions on encoding errors
    OutputStreamWriter chars_into_his_stdin = new OutputStreamWriter(
        __bytes_into_his_stdin,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newEncoder()
    );

    // fetch his stdout byte stream...
    InputStream __bytes_from_his_stdout = slave_process.getInputStream();

    // and make a character stream with exceptions on encoding errors
    InputStreamReader chars_from_his_stdout = new InputStreamReader(
        __bytes_from_his_stdout,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );

    // fetch his stderr byte stream...
    InputStream __bytes_from_his_stderr = slave_process.getErrorStream();

    // and make a character stream with exceptions on encoding errors
    InputStreamReader chars_from_his_stderr = new InputStreamReader(
        __bytes_from_his_stderr,
        /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
    );

Now you have three character streams that all raise exceptions on encoding errors, respectively called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr.

This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.
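The same contrast can be sketched on the writing side. The example below writes a string containing an unpaired surrogate (which has no UTF-8 encoding) into an in-memory stream, first through the Charset constructor and then through the encoder constructor. The class and method names and the sample string are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class StrictEncodingDemo {
    // A lone high surrogate between 'a' and 'b' cannot be encoded as UTF-8.
    static final String BAD_TEXT = "a\uD800b";

    // Charset constructor: the unencodable char is silently replaced.
    public static String writeLenient() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(BAD_TEXT);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Encoder constructor: the unencodable char raises an exception
    // during write or close, both of which are inside the try block.
    public static boolean strictWriteFails() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8.newEncoder())) {
            w.write(BAD_TEXT);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lenient write produced: " + writeLenient());
        System.out.println("strict write threw: " + strictWriteFails());
    }
}
```

With the lenient writer the damaged character ends up in the output as a substitute byte, so the data loss only becomes visible when someone reads the file back.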

Just don’t get me started about PrintStreams eating exceptions.

tchrist answered Sep 19 '22 18:09



Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding. Instead, use

new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)

and

new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
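Applied to the buffered reader and writer from the question, this might look like the sketch below. The helper names openReader, openWriter, and roundTrip are hypothetical, and the temporary file merely stands in for the question's actresses.list and actressesFormatted.csv paths. As an aside, on Java 11 and later FileWriter and FileReader also have constructors that accept a Charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
public class Utf8FileDemo {
    // Drop-in replacement for new BufferedReader(new FileReader(file)).
    public static BufferedReader openReader(File file) throws IOException {
        return new BufferedReader(
            new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    // Drop-in replacement for new BufferedWriter(new FileWriter(file)).
    public static BufferedWriter openWriter(File file) throws IOException {
        return new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8));
    }

    // Writes one line to a temporary file as UTF-8 and reads it back.
    public static String roundTrip(String line) {
        try {
            File tmp = File.createTempFile("utf8-demo", ".txt");
            try {
                try (BufferedWriter writer = openWriter(tmp)) {
                    writer.write(line);
                    writer.newLine();
                }
                try (BufferedReader reader = openReader(tmp)) {
                    return reader.readLine();
                }
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String name = "Zo\u00EB Salda\u00F1a";
        System.out.println("round trip preserved the name: " + name.equals(roundTrip(name)));
    }
}
```

Using try-with-resources also closes the writer, which flushes the buffer; the code in the question never closes its writer, so the tail of the output could be lost regardless of encoding.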

Michael Borgwardt answered Sep 18 '22 18:09
