
Java read utf-8 encoded file, character by character

Tags: java

I have a file saved as UTF-8 (written by my own application, in fact). How do I read it character by character?

File file = new File(folder+name);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
DataInputStream dis = new DataInputStream(bis);

The two options seem to be:

char c = (char) dis.readByte();
char c = dis.readChar();
  • The first option works only as long as the file contains nothing but ASCII characters, i.e. plain English text.
  • The second option reads the first and second bytes of the file as a single character.

The original file is being written as follows:

File file = File.createTempFile("file", "txt");
FileWriter fstream = new FileWriter(file);
BufferedWriter out = new BufferedWriter(fstream);

corydoras asked Mar 21 '26 06:03


2 Answers

You don't want a DataInputStream; that's for reading raw binary data. Use an InputStreamReader, which lets you specify the encoding of the input (UTF-8 in your case).
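
A minimal sketch of that approach, assuming the file really is UTF-8. The class name Utf8CharReader, the method readCharByChar, and the temp file used in main are made up for illustration; only InputStreamReader, BufferedReader, and StandardCharsets come from the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8CharReader {

    // Hypothetical helper: decodes the file as UTF-8 and collects it one char at a time.
    static String readCharByChar(Path path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8))) {
            int c;
            // read() returns one decoded char as an int, or -1 at end of input
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative temp file containing some non-ASCII text
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.write(tmp, "h\u00e9llo \u4e16\u754c".getBytes(StandardCharsets.UTF_8));
            String text = readCharByChar(tmp);
            System.out.println(text.length() + " chars read"); // prints "8 chars read"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that read() hands back UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two consecutive chars (a surrogate pair).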

dmazzoni answered Mar 22 '26 20:03


You should be aware that in the Java world you use streams (InputStream/OutputStream) to process bytes, and readers/writers (Reader/Writer) to process characters. The two are not interchangeable, so choose the one that matches the data you have.
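
To make the byte/character distinction concrete, here is a small sketch that pushes text through a Writer and then counts the bytes that reached the underlying stream. The class name BytesVersusChars and the method encodeUtf8 are hypothetical; the encoding is stated explicitly rather than left to the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {

    // Hypothetical helper: writes characters through a Writer and
    // returns the raw bytes the underlying OutputStream received.
    static byte[] encodeUtf8(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer flushes any buffered characters
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve"; // "naïve": five characters
        byte[] encoded = encodeUtf8(text);
        // The ï takes two bytes in UTF-8, so the counts differ
        System.out.println(text.length() + " chars, " + encoded.length + " bytes");
        // prints "5 chars, 6 bytes"
    }
}
```

Reading such data one byte at a time would split the two-byte character in half, which is why a Reader with the matching charset is the right tool for text.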

Have a look at http://java.sun.com/docs/books/tutorial/i18n/text/stream.html to see how to work with characters in a byte-oriented world.

The Sun Java Tutorial is a highly recommended learning resource.

Thorbjørn Ravn Andersen answered Mar 22 '26 21:03