I have some bytes which should be UTF-8 encoded, but which may contain text in ISO8859-1 encoding if the user somehow didn't manage to use their text editor the right way.
I read the file with an InputStreamReader:
InputStreamReader reader = new InputStreamReader(
new FileInputStream(file), Charset.forName("UTF-8"));
But every time the user uses umlauts like "ä", which are invalid UTF-8 when stored in ISO8859-1, the InputStreamReader does not complain but adds placeholder characters.
Is there a simple way to make this throw an exception on invalid input?
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
InputStreamReader reader = new InputStreamReader(
new FileInputStream(file), decoder);
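To see the decoder-based reader fail on bad input, here is a minimal sketch that feeds it in-memory bytes instead of a file. The class name StrictUtf8Demo, the helper readStrict, and the sample string are assumptions made for illustration; the decoder setup is the same as above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {

    // Hypothetical helper: decodes the bytes as UTF-8 and throws a
    // CharacterCodingException (an IOException) if they are not valid UTF-8.
    static String readStrict(byte[] bytes) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), decoder)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "hällo" written with an escape so the source file encoding does not matter.
        String sample = "h\u00e4llo";

        // Valid UTF-8 decodes normally.
        System.out.println(readStrict(sample.getBytes(StandardCharsets.UTF_8)));

        // The same text stored as ISO-8859-1 is rejected instead of being patched up.
        try {
            readStrict(sample.getBytes(StandardCharsets.ISO_8859_1));
        } catch (CharacterCodingException e) {
            System.out.println("Input is not valid UTF-8");
        }
    }
}
```

Because the exception is a subclass of IOException, existing code that already handles IOException around the read will see it without further changes.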
Simply add .newDecoder():
InputStreamReader reader = new InputStreamReader(
new FileInputStream(file), Charset.forName("UTF-8").newDecoder());
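This shorter form works because a freshly created CharsetDecoder is documented to report malformed-input and unmappable-character errors by default, whereas the Charset-based InputStreamReader constructor substitutes replacement characters. A small sketch to check both behaviours follows; the class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderDefaultsDemo {

    // True if a new UTF-8 decoder is already configured to REPORT errors.
    static boolean reportsByDefault() {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        return decoder.malformedInputAction() == CodingErrorAction.REPORT
                && decoder.unmappableCharacterAction() == CodingErrorAction.REPORT;
    }

    // Reads via the Charset-based constructor, which inserts
    // placeholder characters instead of throwing.
    static String readLenient(byte[] bytes) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("REPORT by default: " + reportsByDefault());

        // "hällo" stored as ISO-8859-1, i.e. invalid UTF-8.
        byte[] latin1 = "h\u00e4llo".getBytes(StandardCharsets.ISO_8859_1);
        boolean replaced = readLenient(latin1).indexOf('\uFFFD') >= 0;
        System.out.println("Lenient read contains U+FFFD: " + replaced);
    }
}
```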