
Parsing XML which contains illegal characters

Tags: c#, xml

A message I receive from a server contains tags, and the data I need is inside those tags.

I try to parse the payload as XML, but the parser throws illegal-character exceptions.

I also tried HttpUtility and Security Utility to escape the illegal characters. The problem is that they also escape < and >, which are needed to parse the XML.

My question is: how do I parse XML when the data inside it contains characters that are illegal in XML (for example, a bare & that should be &amp;)?

Thanks.

Example:

<item><code>1234</code><title>voi hoody & polo shirt + Mckenzie jumper</title><description>Good condition size small - medium, text me if interested</description></item>
mitchellt asked Jan 10 '23 19:01


1 Answer

If & is the only invalid character, you can use a regex to replace it with &amp;. The regex uses a negative lookahead so that entities and character references that are already escaped (&amp;, &quot;, &#111;, etc.) are not replaced a second time.

Regex can be as follows:

&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)

Sample code:

// Requires: using System.Text.RegularExpressions; using System.Xml.Linq;
string content = @"<item><code>1234 &amp; test</code><title>voi hoody & polo shirt + Mckenzie jumper&other stuff</title><description>Good condition size small - medium, text me if interested</description></item>";
// Escape only the ampersands that do not already start an entity or character reference.
content = Regex.Replace(content, @"&(?!(?:lt|gt|amp|apos|quot|#\d+|#x[a-f\d]+);)", "&amp;", RegexOptions.IgnoreCase);
XElement xItem = XElement.Parse(content);
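
For reuse, the same idea could be wrapped in a small helper. The following is only a sketch: the class name LenientXml and its methods EscapeBareAmpersands and Parse are hypothetical names chosen for illustration, not part of any library. It also assumes that a bare & is the only problem in the payload. Unlike the sample above, it matches entity names case-sensitively, because XML's predefined entities are lowercase and a parser would reject something like &AMP; anyway.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (illustrative names): escapes bare ampersands, then parses.
public static class LenientXml
{
    // Matches an '&' that does not begin one of the five predefined XML entities
    // or a decimal/hexadecimal character reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:lt|gt|amp|apos|quot|#[0-9]+|#x[0-9a-fA-F]+);)",
        RegexOptions.CultureInvariant);

    // Returns the input with every bare '&' replaced by "&amp;".
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Escapes bare ampersands and parses the result.
    // Still throws XmlException if the text is malformed in any other way.
    public static XElement Parse(string xml)
    {
        return XElement.Parse(EscapeBareAmpersands(xml));
    }
}

public static class Program
{
    public static void Main()
    {
        // Payload taken from the question's example (shortened).
        string raw = "<item><code>1234</code><title>voi hoody & polo shirt + Mckenzie jumper</title></item>";
        XElement item = LenientXml.Parse(raw);
        // The parsed value contains the original '&' again.
        Console.WriteLine(item.Element("title").Value);
    }
}
```

Note that this only repairs ampersands. A stray < or > inside the data, or a control character that XML does not allow, would still make XElement.Parse throw.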
Ulugbek Umirov answered Jan 19 '23 16:01