
How do I read and write this file in UTF-8?

I was getting an error io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence

The solution is to read and write the file in UTF-8.

My code is:

InputStream input = null;
OutputStream output = null;
OutputStreamWriter bufferedWriter = new OutputStreamWriter( output, "UTF8");
input = new URL(url).openStream();
output = new FileOutputStream("DirectionResponse.xml");
byte[] buffer = new byte[1024];
for (int length = 0; (length = input.read(buffer)) > 0;) {
   output.write(buffer, 0, length);
}
BufferedReader br = new BufferedReader(new FileReader("DirectionResponse.xml" ));
FileWriter fstream = new FileWriter("ppre_DirectionResponse.xml");
BufferedWriter out = new BufferedWriter(fstream);

I'm reading a URL and writing its content to a file, DirectionResponse.xml. Then I read DirectionResponse.xml and write the same content to ppre_DirectionResponse.xml for processing.

How do I change this so that reading and writing is done in UTF-8?

asked Nov 12 '12 20:11 by Gaurav Wadhwani





1 Answer

First, you need to call output.close() (or at least output.flush()) before you reopen the file for input. That's probably the main cause of your problems.
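As a minimal sketch of that first point, the download loop could be wrapped in try-with-resources so the output stream is closed before anything reopens the file. The class name CopyThenClose and the method copyToFile are hypothetical names chosen for illustration, not part of the original code:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, for illustration only.
class CopyThenClose {
    // Copies all bytes from input into the named file. try-with-resources
    // closes both streams when the block exits, so the file is complete
    // on disk before the caller reopens it for reading.
    static void copyToFile(InputStream input, String fileName) throws IOException {
        try (InputStream in = input;
             OutputStream output = new FileOutputStream(fileName)) {
            byte[] buffer = new byte[1024];
            // read() returns -1 at end of stream
            for (int length; (length = in.read(buffer)) != -1;) {
                output.write(buffer, 0, length);
            }
        }
    }
}
```

With the code from the question, the call would look something like copyToFile(new URL(url).openStream(), "DirectionResponse.xml"). The bytes are copied unchanged here, so no character encoding is involved at this step.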

Then, you shouldn't use FileReader or FileWriter for this, because they always use the platform-default encoding (which is often not UTF-8). From the docs for FileReader:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

You have the same problem when using a FileWriter. Replace this:

BufferedReader br = new BufferedReader(new FileReader("DirectionResponse.xml" ));

with something like this:

BufferedReader br = new BufferedReader(new InputStreamReader(
    new FileInputStream("DirectionResponse.xml"), "UTF-8"));

and similarly for fstream.
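Putting both replacements together, a sketch of the second step might look like the following. It assumes Java 7 or later (for StandardCharsets and try-with-resources); the class name Utf8Copy and the method copyAsUtf8 are hypothetical, and the line-by-line copy stands in for whatever processing the real code does:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Copy {
    // Reads inName as UTF-8 and writes its lines to outName as UTF-8,
    // naming the charset explicitly on both the reader and the writer
    // instead of relying on the platform default.
    static void copyAsUtf8(String inName, String outName) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                 new FileInputStream(inName), StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream(outName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                out.write(line);
                out.newLine(); // writes the platform line separator
            }
        }
    }
}
```

For the files in the question, the call would be copyAsUtf8("DirectionResponse.xml", "ppre_DirectionResponse.xml"). Passing the StandardCharsets.UTF_8 constant rather than the string "UTF-8" avoids having to handle UnsupportedEncodingException.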

answered Nov 07 '22 08:11 by Ted Hopp
