
Unicode(0xb) error while parsing an XML file using Stax

While parsing an XML file, StAX produces an error:

Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document.

The offending XML line contains a special character that displays as "VI". It is not an alphabetic character: if you copy and paste it into Notepad, it shows up as a symbol. I tried parsing the file with StAX and got the error above.


Please can somebody give me a solution for this?

Thanks in advance.

user1809124 asked Jan 07 '13 08:01



2 Answers

0xB (vertical tab) is not a valid character in XML. The only valid characters below ASCII 32 (0x20, space) are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).

In short, what you are trying to parse is NOT XML.
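To find out where the bad character sits before going back to whoever produced the file, a small check against the XML 1.0 character ranges can help. The following is a minimal sketch, not part of StAX: the class name `XmlChars` and both method names are made up for illustration, and the ranges are the XML 1.0 `Char` ranges (0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF).

```java
public final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char ranges. */
    public static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first invalid code point, or -1 if there is none. */
    public static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXml10Char(cp)) {
                return i;
            }
            // Advance by 2 chars for supplementary code points, 1 otherwise.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Calling `XmlChars.firstInvalidIndex(line)` on the problematic line should point at the vertical tab, which makes it easier to report the exact position to the supplier of the data.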

dty answered Nov 02 '22 23:11


This error occurs whenever an invalid XML character appears in the XML. If you open the file in Notepad++, such characters show up as VT, SOH, FF and so on; these are all invalid XML characters. I am using XML version 1.0, and I validate text data before storing it in the database with this pattern:

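import java.util.regex.Pattern;
// Strip everything outside the XML 1.0 Char ranges; \x{...} (Java 7+) covers U+10000 to U+10FFFF.
Pattern p = Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
retunContent = p.matcher(retunContent).replaceAll("");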

This ensures that no invalid characters make it into the XML.
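If the file already exists and the producer cannot be fixed, the same idea can be applied while reading, before StAX ever sees the data. The sketch below is one possible workaround, not a StAX feature: `SanitizingStaxDemo`, `XmlSanitizingReader` and `textOf` are hypothetical names, and the filter works per `char`, so it drops control characters such as 0xB but deliberately leaves surrogate halves alone. Note that silently dropping characters changes the data, so this is a stopgap rather than a substitute for fixing the source.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class SanitizingStaxDemo {
    private SanitizingStaxDemo() {}

    /** Drops chars that XML 1.0 forbids; surrogate halves pass through unchanged. */
    static final class XmlSanitizingReader extends FilterReader {
        XmlSanitizingReader(Reader in) {
            super(in);
        }

        private static boolean allowed(char c) {
            return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
        }

        @Override
        public int read() throws IOException {
            int c;
            do {
                c = super.read();
            } while (c != -1 && !allowed((char) c));
            return c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            while (true) {
                int n = super.read(buf, off, len);
                if (n <= 0) {
                    return n; // end of stream
                }
                int kept = off;
                for (int i = off; i < off + n; i++) {
                    if (allowed(buf[i])) {
                        buf[kept++] = buf[i]; // compact valid chars to the front
                    }
                }
                if (kept > off) {
                    return kept - off;
                }
                // The whole chunk was invalid; read the next one.
            }
        }
    }

    /** Parses the XML through the filter and returns the concatenated text content. */
    public static String textOf(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new XmlSanitizingReader(new StringReader(xml)));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is still not well-formed XML", e);
        }
    }
}
```

For a file on disk, the same wrapper can be placed around a `Reader` opened with the file's actual encoding; the `StringReader` here only keeps the example self-contained.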

Komal answered Nov 03 '22 00:11