Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression to parse xml in .net

I have the following function that I am using to remove the characters \04 and nulls from my xmlString but I can't find what do I need to change to avoid removing the \ from my ending tags. This is what I get when I run this function

<ARR>20080625<ARR><DEP>20110606<DEP><PCIID>626783<PCIID><NOPAX>1<NOPAX><TG><TG><HASPREV>FALSE<HASPREV><HASSUCC>FALSE<HASSUCC>

Can anybody help me find out what do I need to change in my expression to keep the ending tag as </tag>

Private Function CleanInput(ByVal inputXML As String) As String
    ' Note - This will perform better if you compile the Regex and use a reference to it.
    ' That assumes it will still be memory-resident the next time it is invoked.
    ' Replace invalid characters with empty strings.
    Return Regex.Replace(inputXML, "[^><\w\.@-]", "")
End Function
like image 241
Tony Avatar asked Mar 24 '10 16:03

Tony


People also ask

Can you parse XML with regex?

The syntax of XML is simple enough that it is possible to parse an XML document into a list of its markup and text items using a single regular expression.

Can you use regex in C#?

In C#, Regular Expression is a pattern which is used to parse and check whether the given input text is matching with the given pattern or not. In C#, Regular Expressions are generally termed as C# Regex. The . Net Framework provides a regular expression engine that allows the pattern matching.

What regex does .NET use?

In . NET, regular expression patterns are defined by a special syntax or language, which is compatible with Perl 5 regular expressions and adds some additional features such as right-to-left matching. For more information, see Regular Expression Language - Quick Reference.

What is XML parser in asp net?

XML parser is a software library or a package that provides interface for client applications to work with XML documents. It checks for proper format of the XML document and may also validate the XML documents.


1 Answers

Private Function CleanInput(ByVal inputXML As String) As String
    Return Regex.Replace(inputXML, "[^/><\w\.@-]", "")
    ' --------------------------------^
End Function

But since your target is only removing the \04 and \00's it's safer to restrict the replacement on them only.

Private Function CleanInput(ByVal inputXML As String) As String
    Return Regex.Replace(inputXML, "[\4\0]", "")
End Function
like image 126
kennytm Avatar answered Oct 14 '22 09:10

kennytm