
Regex to parse out html from CDATA with C#

Tags: c#, regex, cdata

I would like to parse out any HTML data that is returned wrapped in CDATA.

For example: <![CDATA[<table><tr><td>Approved</td></tr></table>]]>

Thanks!

Little Larry Sellers asked Dec 07 '22 07:12


1 Answer

The expression to handle your example would be:

\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>

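The named group "text" will contain your HTML. Note that [^\]]* stops at the first "]" character, so this pattern only matches when the HTML itself contains no "]".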

The C# code you need is:

using System.Text.RegularExpressions;

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>", options);
string input = @"<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";

// Match once and check Success, rather than calling IsMatch and then Match
Match match = regex.Match(input);
if (match.Success)
{
    // The named group "text" holds the HTML inside the CDATA section
    string htmlText = match.Groups["text"].Value;
}

The "input" variable is there only to hold the sample input you provided.
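
If the HTML may contain a "]" character (for example in an attribute value), or if the input holds more than one CDATA section, a lazy match up to the first "]]>" may be a better fit. The following is only a sketch under those assumptions: the class name CdataExtractor and the method ExtractAll are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class CdataExtractor
{
    // ".*?" is lazy, so each match ends at the first "]]>".
    // RegexOptions.Singleline lets '.' match newlines inside the HTML.
    private static readonly Regex CdataRegex =
        new Regex(@"<!\[CDATA\[(?<text>.*?)\]\]>", RegexOptions.Singleline);

    // Returns the contents of every CDATA section found in the input,
    // in the order they appear.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        foreach (Match match in CdataRegex.Matches(input))
        {
            results.Add(match.Groups["text"].Value);
        }
        return results;
    }
}
```

Calling CdataExtractor.ExtractAll(input) on the sample input should give a list with one entry, the table markup. For anything beyond simple extraction, loading the document with an XML parser is usually more robust than a regex.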

Ron Harlev answered Dec 28 '22 19:12