
Escape invalid XML characters in C#

I have a string that contains invalid XML characters. How can I escape (or remove) invalid XML characters before I parse the string?

Alireza Noori asked Nov 30 '11 18:11


People also ask

What characters are invalid in XML?

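In XML markup, the characters that must be escaped are & and < (and > in some contexts), as well as " or ' inside attribute values, depending on which character delimits the value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed'. They are escaped with XML entities; for example, & becomes &amp;. Separately, XML 1.0 forbids most control characters (such as U+0000 or U+000B) outright. Those cannot be escaped with an entity and have to be removed or encoded some other way.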

How do you escape an XML character?

XML has only five predefined escape sequences:

" &quot;
' &apos;
< &lt;
> &gt;
& &amp;

Which characters need escaping depends on where the special character is used. The examples can be validated at the W3C Markup Validation Service.

How do I find an invalid character in XML?

If you cannot identify the character visually, open the source file in a text editor such as TextPad. Use the Find function, select "hex", and search for the character named in the error. Removing these characters from the source file resolves the invalid XML character issue.


3 Answers

To remove invalid XML characters, I suggest using the XmlConvert.IsXmlChar method. It was added in .NET Framework 4 and is also available in Silverlight. Here is a small sample:

using System;
using System.Linq;
using System.Xml;

class Program
{
    static void Main()
    {
        string content = "\v\f\0";
        Console.WriteLine(IsValidXmlString(content)); // False

        content = RemoveInvalidXmlChars(content);
        Console.WriteLine(IsValidXmlString(content)); // True
    }

    // Keeps only the characters that XmlConvert.IsXmlChar accepts.
    static string RemoveInvalidXmlChars(string text)
    {
        var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
        return new string(validXmlChars);
    }

    // Returns false if VerifyXmlChars throws because of an invalid character.
    static bool IsValidXmlString(string text)
    {
        try
        {
            XmlConvert.VerifyXmlChars(text);
            return true;
        }
        catch
        {
            return false;
        }
    }
}
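The LINQ filter above tests one UTF-16 code unit at a time. As far as I can tell, that means characters outside the Basic Multilingual Plane (emoji, for example), which are stored as surrogate pairs, may be dropped along with the genuinely invalid ones. The following is a minimal sketch of a variant that keeps well-formed surrogate pairs; the class name `XmlSanitizer` is hypothetical and not part of the answer.

```csharp
using System;
using System.Text;
using System.Xml;

static class XmlSanitizer
{
    // Hypothetical helper: drops characters XML 1.0 forbids,
    // but keeps well-formed surrogate pairs (U+10000 and above).
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (char.IsHighSurrogate(ch) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                // A valid pair encodes a supplementary character, which XML allows.
                sb.Append(ch).Append(text[i + 1]);
                i++; // skip the low surrogate we just copied
            }
            else if (!char.IsSurrogate(ch) && XmlConvert.IsXmlChar(ch))
            {
                sb.Append(ch);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string cleaned = RemoveInvalidXmlChars("a\0b\v");
        Console.WriteLine(cleaned); // ab
    }
}
```

The explicit loop is a little longer than the LINQ version, but it lets the code look ahead one character, which a per-character predicate cannot do.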

To escape invalid XML characters, I suggest using the XmlConvert.EncodeName method. Here is a small sample:

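using System;
using System.Xml;

class Program
{
    static void Main()
    {
        const string content = "\v\f\0";
        Console.WriteLine(IsValidXmlString(content)); // False

        // EncodeName replaces each invalid character with an _xHHHH_ escape.
        string encoded = XmlConvert.EncodeName(content);
        Console.WriteLine(IsValidXmlString(encoded)); // True

        // DecodeName reverses the encoding.
        string decoded = XmlConvert.DecodeName(encoded);
        Console.WriteLine(content == decoded); // True
    }

    // Returns false if VerifyXmlChars throws because of an invalid character.
    static bool IsValidXmlString(string text)
    {
        try
        {
            XmlConvert.VerifyXmlChars(text);
            return true;
        }
        catch
        {
            return false;
        }
    }
}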

Update: Note that the encoded string is always at least as long as the source string. This matters when you store the encoded string in a length-limited database column but validate the length of the source string in your app: a source string that fits the column may no longer fit once encoded.
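To make the length growth concrete, here is a small sketch. It also shows a side effect worth knowing about: because EncodeName is designed for XML names, it encodes characters that are perfectly valid in XML text but not in a name, such as a space. The class name `EncodeNameDemo` is only for illustration.

```csharp
using System;
using System.Xml;

static class EncodeNameDemo
{
    public static void Main()
    {
        string source = "Order Details";

        // The space is legal in XML text, but not in an XML name, so it is encoded too.
        string encoded = XmlConvert.EncodeName(source);

        Console.WriteLine(encoded);                          // Order_x0020_Details
        Console.WriteLine(source.Length);                    // 13
        Console.WriteLine(encoded.Length);                   // 19
        Console.WriteLine(XmlConvert.DecodeName(encoded));   // Order Details
    }
}
```

If the column limit applies to the stored value, checking the length of the encoded string rather than the source string avoids the mismatch.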

Igor Kustov answered Oct 26 '22 05:10


Use SecurityElement.Escape

using System;
using System.Security;

class Sample
{
    static void Main()
    {
        string text = "Escape characters : < > & \" \'";
        string xmlText = SecurityElement.Escape(text);

        // output:
        // Escape characters : &lt; &gt; &amp; &quot; &apos;
        Console.WriteLine(xmlText);
    }
}
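SecurityElement.Escape only replaces the five markup characters; it leaves control characters such as \0 untouched. If the input may contain both kinds of problem, one option is to filter first and escape second. This is a rough sketch, and the helper name `ToXmlText` is hypothetical. Note that the simple per-character filter used here would also drop surrogate pairs.

```csharp
using System;
using System.Linq;
using System.Security;
using System.Xml;

static class XmlTextHelper
{
    // Hypothetical helper: removes characters XML forbids, then escapes markup characters.
    public static string ToXmlText(string text)
    {
        var allowed = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
        return SecurityElement.Escape(new string(allowed));
    }

    public static void Main()
    {
        // The \0 is removed; < and & are escaped.
        Console.WriteLine(ToXmlText("a < b\0 & c")); // a &lt; b &amp; c
    }
}
```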
BLUEPIXY answered Oct 26 '22 05:10


If you are writing XML, just use the classes the framework provides to create it. They escape markup characters for you.

Console.Write(new XElement("Data", "< > &"));

This outputs:

<Data>&lt; &gt; &amp;</Data>
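The same classes undo the escaping when reading, so the application never needs to handle entities itself. Here is a short round-trip sketch; the helper name `RoundTrip` is made up for the example.

```csharp
using System;
using System.Xml.Linq;

static class XElementRoundTrip
{
    // Hypothetical helper: writes a value into an element, serializes it, and parses it back.
    public static string RoundTrip(string value)
    {
        var element = new XElement("Data", value);
        string xml = element.ToString();        // markup characters are escaped on output
        XElement parsed = XElement.Parse(xml);  // and unescaped again on input
        return parsed.Value;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("< > &") == "< > &"); // True
    }
}
```

This covers markup characters only. Characters that XML 1.0 does not allow at all, such as \0, are a separate problem and, as far as I know, are rejected by the writer by default rather than silently escaped.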

If you need to read a malformed XML file, do not use regular expressions. Instead, use the Html Agility Pack.

Pierre-Alain Vigeant answered Oct 26 '22 03:10