 

An invalid XML character (Unicode: 0xc) was found

Parsing an XML file using the Java DOM parser results in:

[Fatal Error] os__flag_8c.xml:103:135: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
asked Apr 21 '11 10:04 by Ashish





2 Answers

There are a few characters that are disallowed in XML documents, even when you wrap the data in CDATA blocks.
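A quick way to check the CDATA claim, assuming the JDK's built-in JAXP parser (the one in the question's stack trace). The class name `CdataCheck` and the helper `parses` are made up for this sketch; it only reports whether a snippet is well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CdataCheck {
    // Returns true if the JDK's built-in DOM parser accepts the XML text.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error] ..." to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an invalid XML character
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<r>a\fb</r>"));             // false: 0xC in element content
        System.out.println(parses("<r><![CDATA[a\fb]]></r>")); // false: CDATA does not exempt it
        System.out.println(parses("<r><![CDATA[a\tb]]></r>")); // true: TAB is an allowed character
    }
}
```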

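If you generated the document yourself, you will need to strip or replace these characters when writing it. Escaping does not help here: XML 1.0 also forbids character references such as &#xC; (only XML 1.1 allows them). If you have received an erroneous document, you should strip these characters before trying to parse it.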
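A minimal sketch of stripping before parsing, assuming the document can be handed to the parser as a character stream. `XmlCharFilterReader` and `StripBeforeParse` are hypothetical names, not part of the JDK. The reader drops forbidden characters as the parser pulls them, so the file never has to be held in a single String. Two assumptions to keep in mind: the decoded input is well-formed UTF-16 (surrogate chars are passed through untouched), and because the parser receives characters rather than bytes, you pick the decoding yourself (for a file, for example `Files.newBufferedReader(path, StandardCharsets.UTF_8)`) and any encoding declaration in the XML is ignored.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StripBeforeParse {
    // Hypothetical helper: a Reader that silently drops chars XML 1.0 forbids.
    // Simplification: surrogate chars are passed through, so only well-formed
    // UTF-16 input is filtered exactly according to the Char production.
    static class XmlCharFilterReader extends FilterReader {
        XmlCharFilterReader(Reader in) {
            super(in);
        }

        private static boolean keep(char c) {
            return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c != 0xFFFE && c != 0xFFFF);
        }

        @Override
        public int read() throws IOException {
            int c;
            do {
                c = super.read();
            } while (c != -1 && !keep((char) c));
            return c;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            while (true) {
                int n = super.read(cbuf, off, len);
                if (n <= 0) return n; // EOF (-1) or len == 0
                int kept = off;
                for (int i = off; i < off + n; i++) {
                    if (keep(cbuf[i])) cbuf[kept++] = cbuf[i]; // compact in place
                }
                if (kept > off) return kept - off;
                // Every char in this chunk was dropped: read the next chunk.
            }
        }
    }

    static Document parse(Reader source) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new XmlCharFilterReader(source)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(new StringReader("<doc>page one\fpage two</doc>"));
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "page onepage two"
    }
}
```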

See dolmen's answer in the thread "Invalid Characters in XML", where he links to the relevant section of the XML specification: http://www.w3.org/TR/xml/#charsets

Basically, all characters below 0x20 are disallowed except 0x9 (TAB), 0xA (LF) and 0xD (CR). The 0xC in your error is a form feed, so it falls in the forbidden range.
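The rule above is part of the full `Char` production in the linked spec section, which also covers the upper ranges. Written out as a small Java predicate (`isXmlChar` and `XmlCharRanges` are hypothetical names for this sketch), it makes the allowed control characters easy to list:

```java
public class XmlCharRanges {
    // XML 1.0 Char production, from https://www.w3.org/TR/xml/#charsets:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // List the control characters below 0x20 that are still allowed.
        for (int cp = 0; cp < 0x20; cp++) {
            if (isXmlChar(cp)) {
                System.out.printf("0x%X%n", cp); // prints 0x9, 0xA and 0xD on separate lines
            }
        }
        // The form feed from the question is not among them.
        System.out.println("0xC allowed? " + isXmlChar(0xC)); // prints "0xC allowed? false"
    }
}
```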

answered Sep 26 '22 05:09 by jishi



public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // nothing to strip
    StringBuilder out = new StringBuilder(in.length()); // holds the output
    // Iterate by code point, not by char: a char never exceeds 0xFFFF, so the
    // 0x10000-0x10FFFF range is only reachable by decoding surrogate pairs.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
        i += Character.charCount(current); // advance 1 or 2 chars
    }
    return out.toString();
}
answered Sep 25 '22 05:09 by Dima
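The character filter from the answer above can also be sketched with a single regular expression. This is a swapped-in technique, not the answer's code, and `RegexStrip` and `stripInvalidXmlChars` are hypothetical names. It relies on `java.util.regex` matching whole code points, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) are tested as one character and kept.

```java
import java.util.regex.Pattern;

public class RegexStrip {
    // Matches every code point outside the XML 1.0 Char production.
    // \x{...} with more than four hex digits needs Java 7 or later.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    static String stripInvalidXmlChars(String in) {
        return in == null ? "" : INVALID_XML_CHARS.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "page one\fpage two \uD83D\uDE00"; // form feed plus an emoji (surrogate pair)
        String cleaned = stripInvalidXmlChars(dirty);
        System.out.println("Form feed removed: " + (cleaned.indexOf('\f') < 0)); // true
        System.out.println("Emoji kept: " + cleaned.endsWith("\uD83D\uDE00"));   // true
    }
}
```

Compiling the Pattern once in a static field avoids recompiling it on every call, which `String.replaceAll` would do.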
