I have a simple XML file on my hard drive. When I open it with Notepad++ this is what I see:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>
... more stuff here ...
</content>
But when I read it using a FileInputStream
I get:
?<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>...
I'm using JAXB to parse XML files and it throws a "content not allowed in prolog" exception because of that "?" sign.
What is this extra "?" sign? Why is it there and how do I get rid of it?
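That extra character is a byte order mark (BOM), the special Unicode character U+FEFF, which lets the XML parser know what the byte order (little endian or big endian) of the bytes in the file is. In a UTF-8 file it is stored as the three bytes EF BB BF and acts only as an encoding signature, since UTF-8 has no byte order to distinguish.
Normally, your XML parser should be able to understand this. (If it doesn't, I would regard that as a bug in the XML parser.)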
As a workaround, make sure that the program that produces this XML leaves off the BOM.
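If the producing program can't be changed, the BOM can also be skipped on the reading side before the stream reaches the parser. The following is only a minimal sketch, assuming the file is UTF-8; the names `BomSkipper` and `skipUtf8Bom` are made up for illustration and are not part of JAXB or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF) if one is present.
final class BomSkipper {
    private static final int BOM_LENGTH = 3;

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // The pushback buffer lets us put the bytes back if they are not a BOM.
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];

        // read() may return fewer bytes than requested, so loop until
        // we have three bytes or hit end of stream.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean hasBom = n == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: push back whatever was read so the caller sees it.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream could then be handed to the unmarshaller in place of the raw `FileInputStream`, for example `unmarshaller.unmarshal(BomSkipper.skipUtf8Bom(new FileInputStream(file)))`. This only handles the UTF-8 BOM; UTF-16 or UTF-32 files would need their own checks.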
Check the encoding of the file. I've seen a similar thing: the file looked fine when opened in most editors, but it turned out to be encoded as UTF-8 without BOM (or with, I can't recall off the top of my head). Notepad++ should be able to switch between the two.