I am currently using HtmlAgilityPack in a console application to scrape a website. Since the HTML is encoded (it returns encoded characters like &#39;), I have to decode it before I save the content to my database.
Is there a way to decode the returned html using HtmlAgilityPack without having to use HttpUtility.HtmlDecode? I want to avoid adding System.Web to my console application if possible.
The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:
/// <summary>
/// Replace known entities by characters.
/// </summary>
/// <param name="text">The source text.</param>
/// <returns>The result text.</returns>
public static string DeEntitize(string text)
It supports well-known entities (like &nbsp;) and encoded characters such as &#39; as well.
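As a minimal sketch of how this might look in a console application, the example below loads a small HTML fragment, reads a node's text, and passes it through HtmlEntity.DeEntitize. The sample markup and variable names are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced and that InnerText returns the text with its entities still encoded, as described in the question.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Hypothetical scraped fragment containing an encoded apostrophe and ampersand.
        string html = "<p>Tom&#39;s shop &amp; bar</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text of the document as parsed; entities are assumed to be left encoded here.
        string raw = doc.DocumentNode.InnerText;

        // Replace known entities and numeric character references with characters.
        string decoded = HtmlEntity.DeEntitize(raw);

        Console.WriteLine(raw);
        Console.WriteLine(decoded);
    }
}
```

No reference to System.Web is needed for this, since HtmlEntity ships with the Html Agility Pack itself. The decoded string can then be saved to the database as-is.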