
All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"?

I'm creating a simple wordcount program in Java that reads through a directory's text-based files.

However, I keep on getting the error:

java.nio.charset.MalformedInputException: Input length = 1

from this line of code:

BufferedReader reader = Files.newBufferedReader(file,Charset.forName("UTF-8"));

I know I probably get this because I used a Charset that didn't include some of the characters in the text files, some of which included characters of other languages. But I want to include those characters.

I later read in the JavaDocs that the Charset is optional and only used for more efficient reading of the files, so I changed the code to:

BufferedReader reader = Files.newBufferedReader(file);

But some files still throw the MalformedInputException. I don't know why.

I was wondering if there is an all-inclusive Charset that will allow me to read text files with many different types of characters?

Thanks.

Jonathan Lam asked Oct 08 '14 23:10


2 Answers

You probably want to have a list of supported encodings. For each file, try each encoding in turn, maybe starting with UTF-8. Every time you catch the MalformedInputException, try the next encoding.
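A minimal sketch of that fallback loop, under a few assumptions: the class name CharsetFallback, the method names, and the candidate list are all hypothetical, and each file is small enough to read into memory. It decodes the raw bytes with a strict decoder rather than catching the exception mid-read, so a failed attempt never leaves a half-consumed reader behind. Note that ISO-8859-1 maps every byte to some character and therefore never fails, so it only makes sense as the last candidate.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class CharsetFallback {
    // Assumed candidate list, in priority order. ISO-8859-1 accepts any byte
    // sequence, so it acts as a catch-all and must come last.
    static final List<Charset> CANDIDATES =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    // Returns the text from the first charset that decodes every byte cleanly.
    static String decode(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                return cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes))
                        .toString();
            } catch (CharacterCodingException e) {
                // This charset does not fit the data; try the next one.
            }
        }
        throw new IllegalArgumentException("No candidate charset could decode the input");
    }

    // Convenience wrapper: read the whole file, then try each candidate.
    static String decodeFile(Path file) throws IOException {
        return decode(Files.readAllBytes(file), CANDIDATES);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            String text = decodeFile(Paths.get(name));
            System.out.println(name + ": " + text.length() + " chars");
        }
    }
}
```

A clean decode only shows that the bytes are valid in that charset, not that the charset is the one the file was written in, so the order of the candidate list matters.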

Dawood ibn Kareem answered Oct 23 '22 01:10


Creating a BufferedReader with Files.newBufferedReader:

Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);

When the application runs, reading from this reader may throw the following exception:

java.nio.charset.MalformedInputException: Input length = 1

But

new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));

works well.

The difference is that the former uses the CharsetDecoder default action:

The default action for malformed-input and unmappable-character errors is to report them.

while the latter uses the REPLACE action, which substitutes the decoder's replacement string (U+FFFD by default) for the bad input:

cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
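
To get the same lenient behavior while still choosing the charset explicitly, one option is to build the decoder yourself and pass it to InputStreamReader. This is only a sketch: the class name LenientReaders and the method name newLenientReader are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LenientReaders {
    // Wraps a stream in a reader whose decoder replaces bad input instead of throwing.
    static BufferedReader newLenientReader(InputStream in, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    // Same as above, but opens the file first.
    static BufferedReader newLenientReader(Path file, Charset cs) throws IOException {
        return newLenientReader(Files.newInputStream(file), cs);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LenientReaders <file>");
            return;
        }
        try (BufferedReader reader = newLenientReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Keep in mind that REPLACE hides the problem rather than fixing it: bytes that are not valid in the chosen charset become U+FFFD, so a word count over such a file will include words containing replacement characters.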

Xin Wang answered Oct 23 '22 01:10