I'm working on a 1 GB JSON text file which I'm trying to parse using Java. However, the parser fails when it reaches the character 'ñ', throwing this exception:
Exception Invalid UTF-8 start byte 0x96
I've tried to remove the character using sed and perl, but it seems that they cannot read the character and thus the file remains unchanged. I'd like to remove the character from the whole file or replace it with any other character or string so that the parsing works.
Your file is not encoded in UTF-8.
You should find out the actual encoding and use it to read the file with an InputStreamReader. Then, if needed, save it as UTF-8 (for example, with an OutputStreamWriter).
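As a rough sketch of that approach, the class below streams the file through an InputStreamReader in the source encoding and writes it back out as UTF-8 with an OutputStreamWriter. The class name `Transcode`, the method name `transcode`, and the use of windows-1252 in the usage message are assumptions for illustration only; the real encoding of your file has to be determined first. Copying in chunks keeps memory use flat, which matters for a 1 GB file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
public class Transcode {

    // Reads `in` using `sourceCharset` and writes the same text to `out` as UTF-8.
    static void transcode(Path in, Charset sourceCharset, Path out) throws IOException {
        try (Reader reader = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(in), sourceCharset));
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            // Copy chunk by chunk so the whole file is never held in memory.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            // Example charset name only; replace it with the file's real encoding.
            System.err.println("usage: java Transcode <input> <charset, e.g. windows-1252> <output>");
            return;
        }
        transcode(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

The resulting UTF-8 file can then be passed to the JSON parser unchanged. If the source charset is guessed wrongly, the copy will usually still succeed but the non-ASCII characters will come out wrong, so it is worth checking a few of them afterwards.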
If you don't know the encoding, I suggest you test with a few probable encodings: see Charsets.
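One possible way to test candidate encodings is to decode the file strictly with each one and see which ones get through without an error. The sketch below does that with a CharsetDecoder set to report malformed input instead of silently replacing it. The class name `CharsetProbe`, its method names, and the candidate list are assumptions chosen for illustration. Note that a clean decode only shows that an encoding is possible, not that it is the right one: ISO-8859-1, for instance, accepts every byte sequence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class CharsetProbe {

    // Returns true if every byte of the file decodes without error in `charset`.
    static boolean decodesCleanly(Path file, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (InputStream in = Files.newInputStream(file);
             Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[8192];
            // Read to the end; the decoded text itself is discarded.
            while (reader.read(buffer) != -1) {
                // nothing to do
            }
            return true;
        } catch (CharacterCodingException e) {
            // Thrown on the first byte sequence that is invalid in this charset.
            return false;
        }
    }

    // Filters the candidates down to those that decode the file cleanly.
    static List<Charset> plausibleCharsets(Path file, List<Charset> candidates) throws IOException {
        List<Charset> plausible = new ArrayList<>();
        for (Charset candidate : candidates) {
            if (decodesCleanly(file, candidate)) {
                plausible.add(candidate);
            }
        }
        return plausible;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CharsetProbe <file>");
            return;
        }
        // Example candidates only; adjust to where the file came from.
        List<Charset> candidates = Arrays.asList(
                StandardCharsets.UTF_8,
                Charset.forName("windows-1252"),
                StandardCharsets.ISO_8859_1);
        System.out.println(plausibleCharsets(Paths.get(args[0]), candidates));
    }
}
```

When several candidates survive, decoding a short excerpt around a known non-ASCII character with each of them and reading the result is usually the quickest way to pick the right one.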