I'm trying to read UTF-8 from a text file and do some tokenization, but I'm having issues with the encoding:
FileInputStream fis = null;
try {
    fis = new FileInputStream(fName);
} catch (FileNotFoundException ex) {
    //...
}
DataInputStream myInput = new DataInputStream(fis);
String thisLine;
try {
    while ((thisLine = myInput.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(thisLine, ";");
        while (st.hasMoreElements()) {
            // do something with st.nextToken();
        }
    }
} catch (Exception e) {
    //...
}
and DataInputStream doesn't have any parameters to set the encoding!
Let me quote the Javadoc for this method.
DataInputStream.readLine()
Deprecated. This method does not properly convert bytes to characters. As of JDK 1.1, the preferred way to read lines of text is via the BufferedReader.readLine() method. Programs that use the DataInputStream class to read lines can be converted to use the BufferedReader class by replacing code of the form:
DataInputStream d = new DataInputStream(in);
with:
BufferedReader d
= new BufferedReader(new InputStreamReader(in));
BTW: JDK 1.1 came out in Feb 1997 so this shouldn't be new to you.
Just think how much time everyone would have saved if you had read the Javadoc. ;)
You can use InputStreamReader:
BufferedReader br = new BufferedReader(new InputStreamReader(source, charset));
String line;
while ((line = br.readLine()) != null) { /* ... */ }
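Putting it together with your tokenizing loop, here's a minimal sketch (the class and method names `Utf8Lines` and `readTokens` are just placeholders for illustration; it assumes the file is UTF-8 and tokens are separated by semicolons, as in your code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class Utf8Lines {
    // Read all ';'-separated tokens from a UTF-8 text file.
    static List<String> readTokens(Path file) throws IOException {
        List<String> tokens = new ArrayList<>();
        // InputStreamReader is where the byte-to-character decoding happens,
        // so this is where the charset must be specified.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line, ";");
                while (st.hasMoreTokens()) {
                    tokens.add(st.nextToken());
                }
            }
        }
        return tokens;
    }
}
```

The try-with-resources block also closes the stream for you, which the original code never did.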
You can also try Scanner, which lets you specify a charset too, though I haven't tested it for your exact case.
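For completeness, a Scanner-based sketch (again, `ScannerTokens`/`readTokens` are placeholder names; the delimiter pattern splits on semicolons and line breaks in one pass, which may or may not match your exact needs):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokens {
    // Read ';'-separated tokens from a UTF-8 file using Scanner.
    static List<String> readTokens(Path file) throws IOException {
        List<String> tokens = new ArrayList<>();
        // The Scanner(Path, String charsetName) constructor handles decoding.
        try (Scanner sc = new Scanner(file, StandardCharsets.UTF_8.name())) {
            // Treat runs of semicolons and newlines as token separators.
            sc.useDelimiter("[;\\r\\n]+");
            while (sc.hasNext()) {
                tokens.add(sc.next());
            }
        }
        return tokens;
    }
}
```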