I have a string that contains invalid XML characters. How can I escape (or remove) invalid XML characters before I parse the string?
The only illegal characters are & , < and > (as well as " or ' in attributes, depending on which character is used to delimit the attribute value: attr="must use " here, ' is allowed" and attr='must use ' here, " is allowed' ). They're escaped using XML entities, in this case you want & for & .
XML escape characters There are only five: " " ' ' < < > > & & Escaping characters depends on where the special character is used. The examples can be validated at the W3C Markup Validation Service.
If you're unable to identify this character visually, then you can use a text editor such as TextPad to view your source file. Within the application, use the Find function and select "hex" and search for the character mentioned. Removing these characters from your source file resolve the invalid XML character issue.
As the way to remove invalid XML characters I suggest you to use XmlConvert.IsXmlChar method. It was added since .NET Framework 4 and is presented in Silverlight too. Here is the small sample:
void Main() { string content = "\v\f\0"; Console.WriteLine(IsValidXmlString(content)); // False content = RemoveInvalidXmlChars(content); Console.WriteLine(IsValidXmlString(content)); // True } static string RemoveInvalidXmlChars(string text) { var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray(); return new string(validXmlChars); } static bool IsValidXmlString(string text) { try { XmlConvert.VerifyXmlChars(text); return true; } catch { return false; } }
And as the way to escape invalid XML characters I suggest you to use XmlConvert.EncodeName method. Here is the small sample:
void Main() { const string content = "\v\f\0"; Console.WriteLine(IsValidXmlString(content)); // False string encoded = XmlConvert.EncodeName(content); Console.WriteLine(IsValidXmlString(encoded)); // True string decoded = XmlConvert.DecodeName(encoded); Console.WriteLine(content == decoded); // True } static bool IsValidXmlString(string text) { try { XmlConvert.VerifyXmlChars(text); return true; } catch { return false; } }
Update: It should be mentioned that the encoding operation produces a string with a length which is greater or equal than a length of a source string. It might be important when you store a encoded string in a database in a string column with length limitation and validate source string length in your app to fit data column limitation.
Use SecurityElement.Escape
using System; using System.Security; class Sample { static void Main() { string text = "Escape characters : < > & \" \'"; string xmlText = SecurityElement.Escape(text); //output: //Escape characters : < > & " ' Console.WriteLine(xmlText); } }
If you are writing xml, just use the classes provided by the framework to create the xml. You won't have to bother with escaping or anything.
Console.Write(new XElement("Data", "< > &"));
Will output
<Data>< > &</Data>
If you need to read an XML file that is malformed, do not use regular expression. Instead, use the Html Agility Pack.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With