Yes, you need to specify the encoding of the file you want to read.
Yes, this means that you have to know the encoding of the file you want to read.
No, there is no general way to guess the encoding of any given "plain text" file.
The one-arguments constructors of FileReader
always use the platform default encoding which is generally a bad idea.
Since Java 11 FileReader
has also gained constructors that accept an encoding: new FileReader(file, charset)
and new FileReader(fileName, charset)
.
In earlier versions of java, you need to use new InputStreamReader(
new FileInputStream(pathToFile)
, <encoding>)
.
FileReader
uses Java's platform default encoding, which depends on the system settings of the computer it's running on and is generally the most popular encoding among users in that locale.
If this "best guess" is not correct then you have to specify the encoding explicitly. Unfortunately, FileReader
does not allow this (major oversight in the API). Instead, you have to use new InputStreamReader(new FileInputStream(filePath), encoding)
and ideally get the encoding from metadata about the file.
For Java 7+ doc you can use this:
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
Here are all Charsets doc
For example if your file is in CP1252, use this method
Charset.forName("windows-1252");
Here is other canonical names for Java encodings both for IO and NIO doc
If you do not know with exactly encoding you have got in a file, you may use some third-party libs like this tool from Google this which works fairly neat.
Since Java 11 you may use that:
public FileReader(String fileName, Charset charset) throws IOException;
FileInputStream with InputStreamReader is better than directly using FileReader, because the latter doesn't allow you to specify encoding charset.
Here is an example using BufferedReader, FileInputStream and InputStreamReader together, so that you could read lines from a file.
List<String> words = new ArrayList<>();
List<String> meanings = new ArrayList<>();
public void readAll( ) throws IOException{
String fileName = "College_Grade4.txt";
String charset = "UTF-8";
BufferedReader reader = new BufferedReader(
new InputStreamReader(
new FileInputStream(fileName), charset));
String line;
while ((line = reader.readLine()) != null) {
line = line.trim();
if( line.length() == 0 ) continue;
int idx = line.indexOf("\t");
words.add( line.substring(0, idx ));
meanings.add( line.substring(idx+1));
}
reader.close();
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With