Does anyone have a simple, efficient way of checking that a string doesn't contain HTML? Basically, I want to check that certain fields only contain plain text. I thought about looking for the < character, but that can easily be used in plain text. Another way might be to create a new System.Xml.Linq.XElement using:
XElement.Parse("<wrapper>" + MyString + "</wrapper>")
and check that the XElement contains no child elements, but this seems a little heavyweight for what I need.
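For reference, the XElement idea from the question could be sketched roughly as follows. The class and method names are hypothetical, and treating unparseable input as "not plain text" is an assumption: a bare < or & in ordinary text, or sloppy HTML such as an unclosed <br>, will also end up in the catch branch.

```csharp
using System.Xml;
using System.Xml.Linq;

static class PlainTextCheck
{
    // Hypothetical helper: true when the input, wrapped in a dummy root
    // element, parses as XML and produces no child elements.
    public static bool ContainsNoElements(string input)
    {
        try
        {
            XElement wrapper = XElement.Parse("<wrapper>" + input + "</wrapper>");
            return !wrapper.HasElements;
        }
        catch (XmlException)
        {
            // Not well-formed XML. This may be malformed markup, or plain
            // text containing a bare '<' or '&'. Assumption: reject it.
            return false;
        }
    }
}
```

Because plain text such as "a < b" is rejected by this sketch, it is stricter than the question really asks for, which is part of why the approach feels heavyweight.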
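We use the regular expression <\/?[a-z][\s\S]*> to check whether a string contains any HTML tags. Note that [\s\S]* is greedy, and that [a-z] only matches lowercase tag names unless the match is made case-insensitive.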
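A minimal C# sketch of using that pattern might look like this. The class and method names are made up for illustration, and RegexOptions.IgnoreCase is an addition of mine so that uppercase tags such as <B> are caught too.

```csharp
using System.Text.RegularExpressions;

static class MarkupCheck
{
    // Pattern taken from the answer above. IgnoreCase is an assumption
    // added here so that uppercase tag names also match.
    static readonly Regex AnyTag =
        new Regex(@"<\/?[a-z][\s\S]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the input appears to contain a tag.
    public static bool LooksLikeHtml(string input)
    {
        return AnyTag.IsMatch(input);
    }
}
```

Text like "1 < 2" is not flagged, because the < must be followed directly by a letter or by / and a letter. Text like "x<y and z>w" is flagged, so false positives are still possible.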
The World Wide Web Consortium provides a simple online tool (https://validator.w3.org/) that automatically checks your HTML code and points out any problems it finds, such as missing closing tags or missing quotes around attributes.
A valid HTML tag must satisfy the following conditions: It should start with an opening angle bracket (<). It may then contain double-quoted or single-quoted strings. It should not contain an unpaired double quote, an unpaired single quote, or a closing angle bracket (>) outside of a quoted string.
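One possible reading of those conditions as a regular expression is sketched below. The pattern and the names are my own assumption, not something given above. It only checks the bracket and quoting rules, so it says nothing about whether the tag name or attributes are real HTML.

```csharp
using System.Text.RegularExpressions;

static class TagSyntax
{
    // Assumed pattern: '<', then any mix of double-quoted strings,
    // single-quoted strings, or characters other than quotes and '>',
    // then the closing '>'.
    static readonly Regex WholeTag =
        new Regex(@"^<(""[^""]*""|'[^']*'|[^'"">])*>$");

    // Hypothetical helper: true when the entire input is one tag that
    // satisfies the quoting conditions.
    public static bool IsWellQuotedTag(string input)
    {
        return WholeTag.IsMatch(input);
    }
}
```

With this sketch, <a href="x>y"> is accepted because the > sits inside a quoted string, while <a href="x> is rejected because the double quote is never closed.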
The following will match any matching pair of opening and closing tags, e.g. <b>this</b>:
Regex tagRegex = new Regex(@"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>");
The following will match any single tag, e.g. <b> (it doesn't have to be closed):
Regex tagRegex = new Regex(@"<[^>]+>");
You can then use it like so:
bool hasTags = tagRegex.IsMatch(myString);
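To compare the two patterns side by side, here is a small self-contained sketch. The class and method names are hypothetical; the patterns themselves are copied unchanged from above.

```csharp
using System.Text.RegularExpressions;

static class TagDemo
{
    // Matches an opening tag followed later by its matching closing tag.
    static readonly Regex PairedTags =
        new Regex(@"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>");

    // Matches any single tag-like run: '<', one or more non-'>' chars, '>'.
    static readonly Regex SingleTag = new Regex(@"<[^>]+>");

    public static bool HasPairedTags(string input)
    {
        return PairedTags.IsMatch(input);
    }

    public static bool HasAnyTag(string input)
    {
        return SingleTag.IsMatch(input);
    }
}
```

For "<b>this</b>" both methods return true. For "a <b> c" only HasAnyTag returns true. For "1 < 2" both return false. Be aware that HasAnyTag also returns true for plain text such as "if x<y and y>z", and that the paired pattern will not match across line breaks unless RegexOptions.Singleline is supplied, since . does not match a newline by default.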