
Java reads a weird character at the beginning of the file which doesn't exist

Tags: java, xml, jaxb

I have a simple XML file on my hard drive. When I open it with Notepad++ this is what I see:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>
... more stuff here ...
</content>

But when I read it using a FileInputStream I get:

?<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>...

I'm using JAXB to parse XML files, and it throws a "content not allowed in prolog" exception because of that "?" sign.

What is this extra "?" sign? Why is it there, and how do I get rid of it?

samz asked Dec 27 '22 06:12



2 Answers

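That extra character is a byte order mark (BOM), the Unicode character U+FEFF placed at the very start of the file. In UTF-16 it lets the XML parser know what the byte order (little endian or big endian) of the bytes in the file is. In UTF-8 it is encoded as the three bytes EF BB BF and only marks the file as UTF-8. Editors such as Notepad++ do not display it, which is why the file looks fine there.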

Normally, your XML parser should be able to understand this. (If it doesn't, I would regard that as a bug in the XML parser.)

As a workaround, make sure that the program that produces this XML leaves off the BOM.
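If you can't change the program that writes the file, another option is to skip the BOM yourself before handing the stream to the parser. Below is a minimal sketch of that idea, assuming the file is UTF-8. `BomSkipper` and `skipUtf8Bom` are hypothetical names made up for this example; they are not part of JAXB or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of JAXB or the JDK.
final class BomSkipper {

    // U+FEFF encoded as UTF-8.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Returns a stream positioned just after a leading UTF-8 BOM.
    // If there is no BOM, the bytes that were peeked at are pushed back.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream
            }
            n += r;
        }

        boolean hasBom = n == UTF8_BOM.length;
        for (int i = 0; hasBom && i < UTF8_BOM.length; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM, so give the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // Build an in-memory "file": UTF-8 BOM followed by a tiny document.
        byte[] xml = "<content/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = clean.read()) != -1) {
            out.write(b);
        }
        // prints <content/>
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the question's setup, the stream returned by `skipUtf8Bom` would be what gets passed to JAXB in place of the raw FileInputStream.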

Jesper answered Feb 02 '23 12:02



Check the encoding of the file. I've seen a similar thing: the file looked fine when opened in most editors, but it turned out to be encoded as UTF-8 without BOM (or with, I can't recall off the top of my head). Notepad++ should let you switch between the two.
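If you'd rather check this from Java than from an editor, you can look at the first few bytes of the file. This is only a rough sketch: `BomCheck` and `describeBom` are made-up names for this example, and it only recognises the UTF-8 and UTF-16 marks (it does not try to tell a UTF-32 BOM apart from a UTF-16 one).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the JDK.
final class BomCheck {

    // Reports which byte order mark, if any, the data starts with.
    static String describeBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8 BOM";
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return "UTF-16 BE BOM";
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return "UTF-16 LE BOM";
        }
        return "no BOM";
    }

    // Compares the leading bytes of data with prefix, treating bytes as unsigned.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        // Reads the whole file for simplicity; only the first bytes are inspected.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describeBom(data));
    }
}
```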

Daniel Morritt answered Feb 02 '23 12:02
