Efficiently removing the UTF byte order mark (BOM) [duplicate]

I am looking for an efficient solution to the following problem:

org.xml.sax.SAXParseException: Content is not allowed in prolog

The problem is skipping (or removing) the first 3 BOM bytes, if present, before unmarshalling the file with JAXB.

I can get it to work by checking the first three bytes, writing everything after them to a new file, and then using the new file. However, this seems horribly inefficient.

I have also tried advancing the file pointer past the 3 bytes when a BOM is present (and verified the pointer position afterwards). However, when I pass the InputStream to JAXB it still throws the same exception; my guess is that the file pointer is being reset.

Does anyone have any ideas for this?

Thanks

Jay Mie asked Jun 09 '26 21:06


1 Answer

Use an InputStream decorator that strips the BOM, such as BOMInputStream from Apache Commons IO.
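If adding Commons IO is not an option, the same decorator idea can be sketched with only the JDK. The following is a minimal illustration, not the Commons IO implementation: the class name BomSkipper and the method skipUtf8Bom are hypothetical, and it assumes the file is UTF-8, so only the 3-byte BOM EF BB BF is checked. It wraps the stream in a PushbackInputStream, peeks at the first three bytes, and pushes them back when they are not a BOM, so nothing is copied to a new file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library.
public final class BomSkipper {
    private BomSkipper() {}

    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to 3 bytes; a single read() call may return fewer than requested.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before 3 bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: push the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(xml));
        System.out.println((char) in.read());
    }
}
```

The returned stream, rather than the original one, is what would be handed to the JAXB unmarshaller. With Commons IO on the classpath, wrapping the stream in a BOMInputStream plays the same role.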

Dev answered Jun 11 '26 12:06