Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dealing with invalid XML hexadecimal characters

Tags:

I'm trying to send an XML document over the wire but receiving the following exception:

"MY LONG EMAIL STRING" was specified for the 'Body' element. ---> System.ArgumentException: '', hexadecimal value 0x02, is an invalid character.
   at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
   at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
   at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
   at System.Xml.XmlRawWriter.WriteValue(String value)
   at System.Xml.XmlWellFormedWriter.WriteValue(String value)
   at Microsoft.Exchange.WebServices.Data.EwsServiceXmlWriter.WriteValue(String value, String name)
   --- End of inner exception stack trace ---

I don't have any control over what I attempt to send because the string is gathered from an email. How can I encode my string so it's valid XML while keeping the illegal characters?

I'd like to keep the original characters one way or another.

like image 693
gcso Avatar asked Nov 17 '11 16:11

gcso


People also ask

What characters are not valid in XML?

The only illegal characters are & , < and > (as well as " or ' in attributes, depending on which character is used to delimit the attribute value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed' ). They're escaped using XML entities, in this case you want &amp; for & .

How do I find an invalid character in XML?

If you're unable to identify this character visually, then you can use a text editor such as TextPad to view your source file. Within the application, use the Find function and select "hex" and search for the character mentioned. Removing these characters from your source file resolve the invalid XML character issue.

What is hexadecimal value 0x00?

Hexadecimal value 0x00 is a invalid character, line 1, position nnnnn. But its not consistent. Sometimes some 'blank' data will work. The 'faulty' data works on some PCs, but not others. In the database, the data is always a blank string.

Why & is not allowed in XML?

Note that the ampersand (&) and less-than (<) characters are not permitted in XML attribute values. Since XFDL computes appear in a compute attribute, these must be escaped with character or entity references (e.g. the entity references &amp; for the ampersand and &lt; for the less-than character).


1 Answers

The following code removes XML invalid characters from a string and returns a new string without them:

public static string CleanInvalidXmlChars(string text) 
{ 
     // From xml spec valid chars: 
     // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]     
     // any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. 
     string re = @"[^\x09\x0A\x0D\x20-\xD7FF\xE000-\xFFFD\x10000-x10FFFF]"; 
     return Regex.Replace(text, re, ""); 
}
like image 54
mathifonseca Avatar answered Oct 13 '22 22:10

mathifonseca