
Removing Invalid Characters from XML Name Tag - RegEx C#

Tags:

c#

regex

xml

I have a string with xml data that I pulled from a web service. The data is ugly and has some invalid chars in the Name tags of the xml. For example, I may see something like:

<Author>Scott the Coder</Author><Address#>My address</Address#>

The # in the Address name field is invalid. I am looking for a regular expression that will remove all the invalid chars from the name tags BUT leave all the chars in the Value section of the xml. In other words, I want to use RegEx to remove chars only from the opening and closing name tags. Everything else should remain the same.

I don't have all the invalid chars yet, but this will get me started: #{}&()

Is it possible to do what I am trying to do?

Scott asked Jan 24 '11 04:01


1 Answer

If your intention is only to check whether a name is valid for an XML node, I suggest taking a look at the XmlConvert class, especially its VerifyName and VerifyNCName methods.

Also note that the same class lets you accept any text as a node name through the EncodeName and EncodeLocalName methods, which escape the invalid characters instead of rejecting the name.
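To make the two options concrete, here is a minimal sketch. The class name XmlNameChecks and the helpers IsValidName and ToSafeName are hypothetical names chosen for illustration; only the XmlConvert calls come from the framework.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlNameChecks
{
    // Returns true when the candidate is a legal XML name.
    // XmlConvert.VerifyName throws instead of returning false, so wrap it.
    public static bool IsValidName(string candidate)
    {
        try
        {
            XmlConvert.VerifyName(candidate);
            return true;
        }
        catch (XmlException)
        {
            return false; // contains a character that is not allowed in a name
        }
        catch (ArgumentNullException)
        {
            return false; // null or empty input
        }
    }

    // Escapes invalid characters as _xHHHH_ so that any text becomes a legal local name.
    public static string ToSafeName(string candidate)
    {
        return XmlConvert.EncodeLocalName(candidate);
    }
}
```

With this sketch, IsValidName("Author") returns true, IsValidName("Address#") returns false, and ToSafeName("Address#") returns Address_x0023_. XmlConvert.DecodeName turns the escaped form back into the original text if you need it later.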

Using those methods will be far easier, safer, and faster than using a regular expression.
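One caveat: the string in the question is not well-formed, so an XML parser will refuse to load it, and you may still need a regular expression just to locate the tag names. A rough sketch of that combination follows, assuming .NET 4.0 or later for XmlConvert.IsNCNameChar. The class TagNameCleaner and the method CleanTagNames are hypothetical names, and the pattern is an assumption that only covers simple element tags; it skips comments, processing instructions, and CDATA, and it does not repair a name that starts with an invalid first character such as a digit.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class TagNameCleaner
{
    // "<" or "</", then the tag name: everything up to whitespace, "/" or ">".
    // The first name character may not be "!" or "?", which skips <!-- ... --> and <?xml ... ?>.
    private static readonly Regex TagStart =
        new Regex(@"(?<open></?)(?<name>[^\s<>/!?][^\s<>/]*)");

    // Drops characters that are not allowed in an XML name from tag names only.
    // Element content is never matched, so values keep characters such as "#".
    public static string CleanTagNames(string xml)
    {
        return TagStart.Replace(xml, m =>
        {
            string cleaned = new string(m.Groups["name"].Value
                .Where(c => c == ':' || XmlConvert.IsNCNameChar(c)) // keep ":" for prefixes
                .ToArray());

            // If nothing survives, leave the tag untouched rather than emit an empty name.
            return cleaned.Length == 0 ? m.Value : m.Groups["open"].Value + cleaned;
        });
    }
}
```

Applied to the sample from the question, CleanTagNames returns <Author>Scott the Coder</Author><Address>My address</Address>. Letting XmlConvert decide which characters are legal avoids maintaining a hand-written list like #{}&().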

Sam B answered Sep 17 '22 18:09