How to clean HTML tags using C#

For example, given this HTML:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>title</title>
    </head>
    <body>
        <a href="aaa.asp?id=1"> I want to get this text </a>
        <div>
            <h1>this is my want!!</h1>
            <b>this is my want!!!</b>
        </div>
    </body>
    </html>

the result should be:

    I want to get this text this is my want!! this is my want!!!
guaike asked Jun 24 '09 13:06

2 Answers

Use the HTML Agility Pack:

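    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    // InnerText returns the text content of <body> and its descendants, without the tags
    string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;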
Marc Gravell answered Oct 10 '22 04:10
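The question's input happens to be well-formed XHTML, so the same idea as the HTML Agility Pack answer (select <body>, keep only its text) can also be sketched with the built-in System.Xml.Linq instead of a third-party package. This is a swapped-in technique, not part of the answer above, and XhtmlBodyText and Extract are hypothetical names. It assumes strictly well-formed XHTML: XDocument.Parse throws an XmlException on typical real-world HTML (an unclosed <br>, an undeclared entity such as &nbsp;, and so on), which is exactly where a real HTML parser like the HTML Agility Pack is needed.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XhtmlBodyText
{
    // Returns the text inside <body>, with tags removed and whitespace collapsed.
    // Assumption: the input is well-formed XHTML; XDocument.Parse throws an
    // XmlException for tag-soup HTML or undeclared entities like &nbsp;.
    public static string Extract(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // Find <body> by local name so it works with or without the XHTML namespace.
        var body = doc.Descendants().FirstOrDefault(e => e.Name.LocalName == "body");
        if (body == null)
        {
            return string.Empty;
        }

        // Join the individual text nodes with spaces so that text from adjacent
        // elements such as </h1><b> is not glued together. XText.Value is already
        // entity-decoded (&amp; becomes &).
        string joined = string.Join(" ", body.DescendantNodes().OfType<XText>().Select(t => t.Value));

        // Collapse runs of whitespace (spaces, tabs, newlines) into a single space.
        return Regex.Replace(joined, @"\s+", " ").Trim();
    }
}
```

For the question's example, Extract returns "I want to get this text this is my want!! this is my want!!!"; the text of <title> is skipped because only <body> is searched. Joining text nodes with spaces keeps "want!!" and "this" apart, at the cost of splitting a word that is itself broken up by inline markup ("wa<b>n</b>t" becomes "wa n t").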


Use this function, which removes every tag with a regular expression:

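    using System.Text.RegularExpressions;

    public string Strip(string text)
    {
        // Lazily match from '<' to the nearest '>'; (.|\n) lets a tag span several lines
        return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
    }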
diegodsp answered Oct 10 '22 03:10
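A self-contained sketch that builds on the regular expression in the answer above, with two assumed additions that are not part of that answer: each tag is replaced by a space and runs of whitespace are then collapsed, and character entities such as &amp; are decoded with System.Net.WebUtility.HtmlDecode. HtmlText and Strip are hypothetical names used only for illustration.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same pattern as the answer above: a lazy match from '<' to the nearest '>',
    // where (.|\n) lets a tag span several lines.
    private static readonly Regex Tag = new Regex(@"<(.|\n)*?>");

    // Any run of whitespace characters, including newlines.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Strip(string html)
    {
        // Assumption: replace each tag with a space (not string.Empty) so that
        // text from adjacent elements such as </h1><b> does not run together.
        string noTags = Tag.Replace(html, " ");

        // Decode entities only after the tags are gone, so that an escaped
        // "&lt;b&gt;" in the text is not mistaken for a tag.
        string decoded = WebUtility.HtmlDecode(noTags);

        return Whitespace.Replace(decoded, " ").Trim();
    }
}
```

Because the pattern works on the whole string, the text of <title> is kept: for the question's example the result is "title I want to get this text this is my want!! this is my want!!!". Replacing tags with a space keeps adjacent blocks apart but also splits a word broken up by inline markup, so "wa<b>n</b>t" becomes "wa n t". The pattern also does not understand comments, the contents of <script> and <style> blocks, or a ">" inside an attribute value; for such input a parser like the HTML Agility Pack is the safer choice.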