
How to validate that a string doesn't contain HTML using C#

Does anyone have a simple, efficient way of checking that a string doesn't contain HTML? Basically, I want to check that certain fields only contain plain text. I thought about looking for the < character, but that can easily be used in plain text. Another way might be to create a new System.Xml.Linq.XElement using:

XElement.Parse("<wrapper>" + MyString + "</wrapper>") 

and check that the XElement contains no child elements, but this seems a little heavyweight for what I need.
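
For reference, a minimal sketch of the XElement idea described above. The class and method names (XmlWrapperCheck, TryHasChildElements) are hypothetical, and reporting a parse failure separately from "no child elements" is an assumption about how the check might be used:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlWrapperCheck
{
    // Hypothetical helper: wraps the input in a dummy root element and parses it as XML.
    // Returns false if the wrapped input is not well-formed XML (for example a bare "<"
    // or an unclosed tag such as "<br>"), in which case hasElements is not meaningful.
    public static bool TryHasChildElements(string input, out bool hasElements)
    {
        try
        {
            XElement wrapper = XElement.Parse("<wrapper>" + input + "</wrapper>");
            hasElements = wrapper.HasElements; // true if any child element was parsed
            return true;
        }
        catch (XmlException)
        {
            hasElements = false;
            return false;
        }
    }
}
```

Note that XElement.Parse throws an XmlException on input that is not well-formed XML, so plain text such as "a < b" fails to parse rather than being reported as free of markup.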

Ben Mills asked Oct 15 '08 13:10


1 Answer

The following will match any matched pair of tags, e.g. <b>this</b>:

Regex tagRegex = new Regex(@"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>"); 

The following will match any single tag, e.g. <b> (it doesn't have to be closed):

Regex tagRegex = new Regex(@"<[^>]+>"); 

You can then use it like so:

bool hasTags = tagRegex.IsMatch(myString); 
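
Putting the two patterns together, a minimal self-contained sketch might look like the following. The class and method names (HtmlTagCheck, HasPairedTags, HasAnyTag) are hypothetical; the patterns themselves are the ones shown above, unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTagCheck
{
    // Matches an opening tag followed later by a closing tag with the same name,
    // e.g. <b>this</b>. The backreference \1 ties the closing tag to the opening one.
    private static readonly Regex PairedTagRegex =
        new Regex(@"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>");

    // Matches any single tag-like run: "<", one or more non-">" characters, then ">".
    private static readonly Regex AnyTagRegex = new Regex(@"<[^>]+>");

    // Hypothetical helper: true if the string contains a matched open/close pair.
    public static bool HasPairedTags(string input) => PairedTagRegex.IsMatch(input);

    // Hypothetical helper: true if the string contains anything shaped like a tag.
    public static bool HasAnyTag(string input) => AnyTagRegex.IsMatch(input);
}
```

Two caveats are worth keeping in mind. The single-tag pattern also matches plain text such as "1 < 2 and 3 > 2", because everything between the < and the > is accepted as a tag body. And since . does not match a newline by default in .NET, the paired pattern will not match tags whose content spans several lines unless RegexOptions.Singleline is passed to the Regex constructor.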
ICR answered Oct 04 '22 18:10