
Remove only some HTML tags in C#

Tags:

html

c#

I have a string:

string html = "<DIV><B> xpto </B></DIV>";

I need to remove the <DIV> and </DIV> tags so that the result is: <B> xpto </B>

Only the <DIV> and </DIV> tags should be removed, not every HTML tag; the <B> xpto </B> part must be kept.

r-magalhaes asked Dec 12 '22 21:12


2 Answers

Use HtmlAgilityPack:

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html is the string from the question

// "//div" is an XPath that selects div nodes anywhere in the HTML (null if none match)
var divs = doc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var item in divs)
    {
        string content = item.InnerHtml; // your div content
    }
}

If you only want the B tags:

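var bNodes = doc.DocumentNode.SelectNodes("//b"); // tag names are lowercase in HtmlAgilityPack
if (bNodes != null)
{
    foreach (var item in bNodes)
    {
        string tag = item.OuterHtml; // your B tag and its content
    }
}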
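If the markup happens to be well-formed XML (as the question's string is), the div tags can also be unwrapped without HtmlAgilityPack by using LINQ to XML (System.Xml.Linq) from the .NET base library. This is a swapped-in technique, not what the answer above uses, and XElement.Parse throws an XmlException on HTML that is not well-formed (for example an unclosed <br>). The names XmlDivUnwrapper, UnwrapDivs and the temporary <root> wrapper are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch only: LINQ to XML instead of HtmlAgilityPack, so the input
// must be well-formed XML (every tag closed, attributes quoted).
public static class XmlDivUnwrapper
{
    public static string UnwrapDivs(string fragment)
    {
        // Hypothetical <root> wrapper so a fragment with several top-level
        // nodes still parses as a single XML element.
        var root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // XML names are case-sensitive, so compare "div" ignoring case.
        // Process innermost divs first: ReplaceWith copies child nodes that are
        // still attached, so unwrapping an outer div first would leave inner ones behind.
        var divs = root.Descendants()
            .Where(e => string.Equals(e.Name.LocalName, "div", StringComparison.OrdinalIgnoreCase))
            .Reverse()
            .ToList();

        foreach (var div in divs)
        {
            // Replace the div with its own children, keeping them in place.
            div.ReplaceWith(div.Nodes());
        }

        // Serialize what is left inside <root> without adding indentation.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For the question's string, XmlDivUnwrapper.UnwrapDivs(html) returns <B> xpto </B>.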
Anirudha answered Dec 28 '22 07:12


If you only need to remove div tags, a regular expression is enough. This pattern matches opening, closing and self-closing div tags, including any attributes they have.

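using System.Text.RegularExpressions;

var html =
  "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";

// Matches <div ...>, </div> and <div .../>, with or without attributes
var pattern = @"(\</?DIV(.*?)/?\>)";

// Replace any match with an empty string; Regex.Replace returns a new string
var result = Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);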

Result

<B> xpto </B><b>Other text test
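The same regex idea can be wrapped in a reusable helper, sketched below. The class and method names (DivTagStripper, RemoveDivTags) are hypothetical, and the pattern is a slight variation on the one above: the \b after div keeps it from also matching longer tag names such as <divider>, and [^>]* stops at the first >, so it assumes no attribute value contains a literal >.

```csharp
using System.Text.RegularExpressions;

public static class DivTagStripper
{
    // Matches <div>, <div attr='...'>, <div .../> and </div>, case-insensitively.
    // The \b stops "div" from matching the start of longer tag names like <divider>.
    private static readonly Regex DivTag =
        new Regex(@"</?div\b[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Removes only the div tags; their content and all other tags are kept.
    public static string RemoveDivTags(string html) => DivTag.Replace(html, string.Empty);
}
```

Calling DivTagStripper.RemoveDivTags on the question's string gives <B> xpto </B>. Building the Regex once in a static field avoids re-parsing the pattern on every call.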
ΩmegaMan answered Dec 28 '22 08:12