
HtmlAgilityPack select only inner text Node

Tags: html, c#, linq

This is a sample of my HTML input; it is part of a bigger HTML file.

string html = "<html> <p>Ingredients:</p> </html>";

I want to retrieve only the node whose own inner text is "Ingredients:". The text may appear in a p, div, strong, or other element.

My C# code to achieve this using HtmlAgilityPack and LINQ is:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

List<HtmlNode> ingredientList = doc.DocumentNode.Descendants().Where
                        (x => x.InnerText.Contains("Ingredients:")).ToList();

This code returns 3 nodes:

<html> node
<p> node
#text node

I want to retrieve only:

<p> node
jayawant.karale asked Oct 29 '25 23:10


1 Answer

If your platform supports XPath, i.e. HtmlAgilityPack's SelectNodes() method is available, you can use an XPath expression to select every element that has a direct child text node containing the keyword:

List<HtmlNode> ingredientList = doc.DocumentNode
                                   .SelectNodes("//*[text()[contains(.,'Ingredients:')]]")
                                   .ToList();
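
If SelectNodes() is not available on your platform, the same "direct child text node" check can probably be written with plain LINQ. The following is only a sketch: the class name IngredientFinder and the method FindElementsWithDirectText are hypothetical names introduced for illustration and are not part of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class IngredientFinder
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns every element
    // that has at least one direct child text node containing the keyword.
    public static List<HtmlNode> FindElementsWithDirectText(string html, string keyword)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()
                  // Keep elements only; this drops the #text node itself.
                  .Where(n => n.NodeType == HtmlNodeType.Element
                           // Inspect direct children only, so ancestors such as
                           // <html> do not match through nested text.
                           && n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text
                                                 && c.InnerText.Contains(keyword)))
                  .ToList();
    }
}
```

Calling IngredientFinder.FindElementsWithDirectText(html, "Ingredients:") on the sample input should leave only the p element, because the html element's direct text children are whitespace. Separately, SelectNodes() returns null rather than an empty collection when nothing matches, so it is worth checking the result for null before calling ToList() on it.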
har07 answered Oct 31 '25 13:10


