
HtmlAgilityPack is not removing all HTML tags. How can I solve this efficiently?

I am using the following method to strip all HTML from a string:

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // InnerText returns the text content of the document without the markup
    return doc.DocumentNode.InnerText;
}

But it seems to ignore the following tag: […]

So the returned string is basically:

> A hungry thief who stole a rack of pork ribs from a grocery store has
> been sentenced to spend 50 years in prison. Willie Smith Ward felt the
> full force of the law after being convicted of the crime in Waco,
> Texas, on Wednesday. The 43-year-old may feel slightly aggrieved over
> the severity of the […]

How can I make sure that these kinds of tags get stripped?

Any kind of help is appreciated, thanks.

Obsivus asked Jun 01 '13 17:06



1 Answer

Try HttpUtility.HtmlDecode (from the System.Web namespace) on the extracted text:

using System.Web;

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // InnerText can still contain HTML entities, so decode them before returning
    return HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
}

HtmlDecode will convert […] to […]
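If you would rather not reference System.Web, a minimal sketch using System.Net.WebUtility.HtmlDecode (part of the base class library) could look like the following. The class name HtmlText and the method DecodeEntities are hypothetical. Replacing non-breaking spaces with ordinary spaces is an optional extra step I am assuming you want, because &nbsp; decodes to U+00A0 rather than a plain space.

```csharp
using System.Net;

// Hypothetical helper: decodes entities left in text extracted via InnerText.
public static class HtmlText
{
    // Decodes entities such as &amp;, &#39; and &nbsp; using WebUtility.HtmlDecode,
    // then (assumed optional step) turns U+00A0 non-breaking spaces into plain spaces.
    public static string DecodeEntities(string text)
    {
        if (string.IsNullOrEmpty(text)) return "";
        string decoded = WebUtility.HtmlDecode(text);
        return decoded.Replace('\u00A0', ' ');
    }
}
```

You would call it on the result of doc.DocumentNode.InnerText; for example, HtmlText.DecodeEntities("Smith&#39;s ribs &amp; pork") returns "Smith's ribs & pork".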

Damith answered Oct 02 '22 18:10
