
Remove illegal characters from a string of XML

Tags: c#, xml

I have a string containing some XML. For example:

<foo>
    <bar>this is < than this</bar>
</foo>

and I need to remove the illegal characters from it before I load it into an XmlDocument.

Any thoughts?

Thanks in advance

asked Nov 17 '25 at 05:11 by mat-mcloughlin


1 Answer

"I have a string containing some XML."

No, you don't. You have some XML-like text that is not well-formed. Once it's all glued together like that, finding the special characters is hard work. You could try looking for "< " or " >", but those sequences can also appear in legitimate markup, and a stray < need not have a space next to it. My advice is to go back a step and look at where that string came from. Change that code so it escapes special characters (writing < as &lt;, & as &amp;, and so on) as it builds the XML.
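Fixing the problem at the source can look like the sketch below. It assumes the code that produces the string can be switched to LINQ to XML (`System.Xml.Linq`); `XmlBuildExample` and `BuildFoo` are hypothetical names used only for illustration. Because `XElement` escapes text content itself, the caller never has to think about special characters.

```csharp
using System.Xml.Linq;

public static class XmlBuildExample
{
    // Hypothetical helper: builds <foo><bar>...</bar></foo> from an arbitrary text value.
    // XElement escapes special characters in text content ('<' is written as "&lt;",
    // '&' as "&amp;"), so the result is well-formed whatever barText contains.
    public static string BuildFoo(string barText)
    {
        var doc = new XElement("foo", new XElement("bar", barText));

        // DisableFormatting keeps the output on a single line without indentation.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, `BuildFoo("this is < than this")` returns `<foo><bar>this is &lt; than this</bar></foo>`, which `XmlDocument.LoadXml` accepts. `XmlWriter` or the `XmlDocument` DOM API would escape text content the same way.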

In the absence of any other options, I would probably ignore XML tools for the moment (they'll throw an exception as soon as you give them the string) and keep a running count of opening and closing special characters (odd/even for quotes). Once you've encountered a <, you aren't allowed another one until you meet a >, for example. Assuming nobody ever puts a < in a tag name, meeting a second < means you need to back up and escape the first one. Likewise, once you've encountered a >, you can't have another one until the next <. And so on.

Unfortunately, a literal < is not allowed inside attribute values either, so I don't know what you'll do with <foo p1="a<a">, but at least you could fix <foo>a<A</foo>. My sympathies.
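The counting approach above can be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not a robust parser: `XmlRepair` and `EscapeStrayBrackets` are hypothetical names, it assumes no tag name ever contains `<`, and it only tracks `<` and `>`. Quotes, attributes, comments, CDATA sections and stray `&` characters are not handled.

```csharp
using System.Text;

public static class XmlRepair
{
    // Hypothetical helper: escapes '<' and '>' characters that cannot belong to a tag,
    // using the open/close counting heuristic. Assumes tag names never contain '<'.
    // Does NOT handle attributes, quotes, comments, CDATA, or stray '&'.
    public static string EscapeStrayBrackets(string text)
    {
        var sb = new StringBuilder(text.Length);
        int openIndex = -1; // index in sb of the '<' that opened the current tag, or -1

        foreach (char c in text)
        {
            if (c == '<')
            {
                if (openIndex >= 0)
                {
                    // Second '<' before any '>': the earlier one must have been
                    // stray text, so back up and escape it.
                    sb.Remove(openIndex, 1).Insert(openIndex, "&lt;");
                }
                openIndex = sb.Length;
                sb.Append('<');
            }
            else if (c == '>')
            {
                if (openIndex >= 0)
                {
                    sb.Append('>');    // closes the current tag
                    openIndex = -1;
                }
                else
                {
                    sb.Append("&gt;"); // '>' with no open tag; escaping it is harmless
                }
            }
            else
            {
                sb.Append(c);
            }
        }

        // A '<' that was never closed by a '>' cannot have been a tag either.
        if (openIndex >= 0)
        {
            sb.Remove(openIndex, 1).Insert(openIndex, "&lt;");
        }
        return sb.ToString();
    }
}
```

On the question's example, the `<` in `this is < than this` becomes `&lt;` and the result loads into an `XmlDocument`; `<foo>a<A</foo>` becomes `<foo>a&lt;A</foo>`. It remains a heuristic: a stray `<` followed later by a stray `>` (as in `1 < 2 > 0`) looks like a tag and is left alone, and `<foo p1="a<a">` is still mangled, as the answer warns.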

answered Nov 18 '25 at 21:11 by Kate Gregory


