How to make InputStreamReader fail on data that is invalid for the encoding?

I have some bytes that should be UTF-8 encoded, but that may contain text in ISO-8859-1 encoding if the user did not configure their text editor correctly.

I read the file with an InputStreamReader:

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8"));

But whenever the text contains umlauts like "ä" stored in ISO-8859-1, which are invalid as UTF-8, the InputStreamReader does not complain but silently inserts placeholder characters.

Is there a simple way to make this throw an exception on invalid input?

Daniel asked Feb 05 '13 07:02


2 Answers

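// REPORT makes the decoder signal an error instead of substituting a placeholder.
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
// Reads through this reader now throw on bytes that are not valid UTF-8.
InputStreamReader reader = new InputStreamReader(
    new FileInputStream(file), decoder);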
Mikhail Vladimirov answered Oct 29 '22 03:10
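
To try out the approach from the answer above without a file on disk, here is a minimal sketch that feeds ISO-8859-1 bytes through a reader built on a REPORT-configured decoder. The class name StrictUtf8Demo and the helpers readStrict and isValidUtf8 are made up for illustration, and the in-memory byte array stands in for the FileInputStream from the question. With REPORT set, the read should fail with a CharacterCodingException (a subclass of IOException) rather than return placeholder characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Demo {

    // Hypothetical helper: decodes the whole stream as UTF-8 and
    // throws a CharacterCodingException on the first invalid byte sequence.
    public static String readStrict(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    // Hypothetical convenience wrapper: true if the bytes decode cleanly as UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            readStrict(new ByteArrayInputStream(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // the decoder reported malformed or unmappable input
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        // "Käse" encoded once correctly and once in the wrong encoding.
        byte[] utf8 = "K\u00e4se".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "K\u00e4se".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```

Because the exception only surfaces while reading, callers that want to reject a bad file up front would need to read it to the end before acting on any of its contents.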


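Simply add .newDecoder(). A decoder created this way reports malformed input by default, whereas the InputStreamReader constructor that takes a Charset replaces it with placeholder characters: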

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8").newDecoder());
Esailija answered Oct 29 '22 01:10
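
The difference between the two constructors can be checked directly. The sketch below is illustrative only: the class name DecoderDefaultsDemo and its helpers are hypothetical. It queries the error actions of a fresh decoder and, for contrast, reads the same ISO-8859-1 bytes through the Charset-based constructor, where the invalid byte should come back as the replacement character U+FFFD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderDefaultsDemo {

    // Hypothetical helper: reads via the Charset overload, which does not
    // throw on invalid bytes but substitutes the replacement character.
    public static String readLenient(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return text.toString();
    }

    // Hypothetical helper: true if a fresh UTF-8 decoder is set to REPORT for both error kinds.
    public static boolean freshDecoderReports() {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.malformedInputAction() == CodingErrorAction.REPORT
                && decoder.unmappableCharacterAction() == CodingErrorAction.REPORT;
    }

    public static void main(String[] args) {
        System.out.println("fresh decoder reports errors: " + freshDecoderReports());

        // 'K', 0xE4 (ISO-8859-1 "ä", not valid UTF-8 here), 's', 'e'
        byte[] latin1 = {0x4B, (byte) 0xE4, 0x73, 0x65};
        String lenient = readLenient(latin1);
        System.out.println("contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));
    }
}
```

Setting the actions explicitly, as in the first answer, has the same effect and documents the intent in the code itself.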