 

HtmlAgilityPack and HtmlDecode

I am currently using HtmlAgilityPack in a console application to scrape a website. Because the returned HTML is encoded (it contains character references like &#39;), I have to decode it before saving the content to my database.

Is there a way to decode the returned HTML with HtmlAgilityPack itself, without using HttpUtility.HtmlDecode? I want to avoid adding a System.Web reference to my console application if possible.

Thomas Avatar asked Jul 12 '11 14:07



1 Answer

The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:

    /// <summary>
    /// Replace known entities by characters.
    /// </summary>
    /// <param name="text">The source text.</param>
    /// <returns>The result text.</returns>
    public static string DeEntitize(string text)

It supports well-known entities (like &nbsp;) and encoded characters such as &#039; as well.
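
As a rough illustration of how this could be used in a console application, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced by the project; the sample markup and the variable names are made up for the example and are not taken from the question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical scraped fragment containing encoded characters.
        string html = "<p>Tom&#39;s &amp; Jerry&#39;s</p>";

        // Parse the fragment with the Html Agility Pack.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Take the text of the paragraph node; it may still contain entities.
        HtmlNode paragraph = doc.DocumentNode.SelectSingleNode("//p");
        string rawText = paragraph.InnerText;

        // Replace known entities and numeric character references with
        // the characters they stand for. No System.Web reference is needed.
        string decoded = HtmlEntity.DeEntitize(rawText);

        Console.WriteLine(decoded);
    }
}
```

DeEntitize takes a plain string, so it can also be applied to text that did not come from a parsed node, for example an attribute value read before saving to the database.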

Simon Mourier Avatar answered Sep 21 '22 17:09
