
XML Exception: Invalid Character(s)

I am working on a small project that receives XML data in string form from a long-running application. I am trying to load this string into an XDocument (System.Xml.Linq.XDocument), then do some XML magic on it and create an xlsx file as a report on the data.

Occasionally I receive data that contains invalid XML characters, and when I try to parse the string into an XDocument, I get this error:

[System.Xml.XmlException] Message: '?', hexadecimal value 0x1C, is an invalid character.

Since I have no control over the remote application, I could receive ANY kind of character.

I am aware that XML lets you write characters as numeric character references, such as &#x1C;.

If at all possible I would SERIOUSLY like to keep ALL the data. If not, then so be it.


I have thought about editing the response string programmatically and then re-parsing it whenever an exception is thrown, but I have tried a few methods and none of them have been successful.

Thank you for your thoughts.

The code looks something like this:

using System.IO;
using System.Xml;
using System.Xml.Linq;

TextReader tr;
XDocument doc;

string response; // XML string received from the server
// ...
tr = new StringReader(response);

try
{
    doc = XDocument.Load(tr);
}
catch (XmlException e)
{
    // handle here?
}
Meiscooldude asked May 12 '09 19:05



3 Answers

You can use an XmlReader and set the XmlReaderSettings.CheckCharacters property to false. This lets you read the XML file despite the invalid characters. From there you can pass it to an XmlDocument or XDocument object.

You can read a little more about it in my blog.

To load the data into a System.Xml.Linq.XDocument, it looks something like this:

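using System.Xml;
using System.Xml.Linq;

XDocument xDocument = null;
// Turn off the check that rejects characters outside the XML 1.0 range.
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
// filename is the path to the XML file; XmlReader.Create also has a TextReader overload.
using (XmlReader xmlReader = XmlReader.Create(filename, xmlReaderSettings))
{
    xmlReader.MoveToContent();              // skip the declaration and leading whitespace
    xDocument = XDocument.Load(xmlReader);
}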

More information can be found here.
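Since the question holds the XML in a string (`response`) rather than a file, a minimal variation of the snippet above might wrap the string in a StringReader and use the TextReader overload of XmlReader.Create. `LenientXml` and `ParseLeniently` are made-up names for illustration. One hedge: the XmlReaderSettings.CheckCharacters documentation appears to describe the relaxation as applying to character references such as &#x1C;, so raw control characters in the text may still be rejected; the usage note below therefore uses a reference rather than a raw 0x1C.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LenientXml
{
    // Hypothetical helper: loads an in-memory XML string (such as `response`
    // from the question) with XmlReaderSettings.CheckCharacters turned off.
    public static XDocument ParseLeniently(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            // Same steps as the file-based version in the answer.
            xmlReader.MoveToContent();
            return XDocument.Load(xmlReader);
        }
    }
}
```

With this reader, a document such as `<root>a&#x1C;b</root>` should load, and the root element's text should then contain the original U+001C character.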

paulselles answered Sep 28 '22 08:09


XML can handle just about any character, but there are ranges, such as most control codes, that it won't accept.
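To make those ranges concrete: the XML 1.0 specification's Char production allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. A small sketch of that check, which could be used to locate the offending character before deciding what to do with it, might look like this. `XmlChars`, `IsLegalXml10Char` and `IndexOfIllegalChar` are hypothetical names; for single UTF-16 characters, .NET's XmlConvert.IsXmlChar performs the same test.

```csharp
public static class XmlChars
{
    // XML 1.0 Char production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsLegalXml10Char(int codePoint) =>
        codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD ||
        (codePoint >= 0x20 && codePoint <= 0xD7FF) ||
        (codePoint >= 0xE000 && codePoint <= 0xFFFD) ||
        (codePoint >= 0x10000 && codePoint <= 0x10FFFF);

    // Returns the UTF-16 index of the first character XML 1.0 does not allow,
    // or -1 if the whole string is legal.
    public static int IndexOfIllegalChar(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // A valid surrogate pair encodes U+10000..U+10FFFF, which is always legal.
            if (char.IsSurrogatePair(s, i))
            {
                i++;
                continue;
            }
            // Lone surrogates (U+D800..U+DFFF) fall outside every range and are reported.
            if (!IsLegalXml10Char(s[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

For the question's error, `IndexOfIllegalChar` would point at the 0x1C character, since 0x1C is below #x20 and is not one of the three allowed whitespace characters.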

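Your best bet, if you can't get them to fix their output, is to sanitize the raw data you're receiving. You need to replace illegal characters with the character reference format you noted. (Be aware that XML 1.0 also forbids references to most control characters, so a strict XML 1.0 parser will still reject &#x1C;; only XML 1.1 allows it.)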

(You can't even resort to CDATA, as there is no way to escape these characters there.)
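Building on the suggestion above, a sketch that escapes each illegal character as a numeric character reference and then loads the result with CheckCharacters = false might look like this. If the reader accepts the references as its documentation suggests, the original characters come back in the XDocument and no data is lost. `XmlRescue`, `EscapeInvalidXmlChars` and `ParseKeepingAllData` are hypothetical names, and this combines the answers here rather than reproducing either poster's code.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlRescue
{
    // Replaces each character XML 1.0 forbids with a numeric character
    // reference (e.g. U+001C becomes "&#x1C;"), leaving everything else as-is.
    // Limitation: inside CDATA sections and comments references are not
    // expanded, so the text "&#x1C;" would stay literal there.
    public static string EscapeInvalidXmlChars(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsSurrogatePair(s, i))
            {
                sb.Append(c).Append(s[i + 1]);   // supplementary characters are legal
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else
            {
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: escape first, then load with CheckCharacters = false
    // so the reader accepts the references and restores the original characters.
    public static XDocument ParseKeepingAllData(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var reader = XmlReader.Create(new StringReader(EscapeInvalidXmlChars(xml)), settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Escaping rather than deleting is what keeps ALL the data, as the question asks; the cost is that anything downstream must also tolerate these characters (writing them back out as strict XML 1.0 will not work).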

great_llama answered Sep 28 '22 08:09


Would something like what is described in this blog post help?

Basically, he creates a sanitizing XML stream.
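The blog post's code isn't reproduced here, but as a rough sketch of the same idea (a TextReader wrapper rather than a Stream, and not the post's implementation), one could filter out illegal characters before XmlReader ever sees them. `SanitizingTextReader` is a hypothetical name. Note that, unlike escaping, this drops the offending characters.

```csharp
using System.IO;
using System.Xml;

// Hypothetical class: wraps any TextReader and silently drops characters
// that XML 1.0 does not allow, so XmlReader/XDocument never see them.
public sealed class SanitizingTextReader : TextReader
{
    private readonly TextReader _inner;

    public SanitizingTextReader(TextReader inner)
    {
        _inner = inner;
    }

    // The base Read(char[], int, int) and ReadToEnd call this method, so
    // overriding it is enough for XmlReader. Peek keeps the base behaviour
    // (returns -1, i.e. peeking is not supported).
    public override int Read()
    {
        int c;
        while ((c = _inner.Read()) != -1 && !IsAllowed((char)c))
        {
            // skip the illegal character and keep reading
        }
        return c;
    }

    // Surrogate halves pass through so supplementary characters survive
    // (a lone surrogate will still be rejected later by XmlReader);
    // everything else is checked against the XML 1.0 Char range.
    private static bool IsAllowed(char c) =>
        char.IsSurrogate(c) || XmlConvert.IsXmlChar(c);

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the question's code this would mean loading with `XDocument.Load(new SanitizingTextReader(new StringReader(response)))`. Reading one character at a time through the base class is slow for very large inputs; overriding `Read(char[], int, int)` would be the next step if that matters.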

Richard Morgan answered Sep 28 '22 07:09