
Html Agility Pack, search through site for a specified string of words

I'm using the Html Agility Pack for this task. I have a URL, and my program should read the content of the HTML page at that URL. If it finds a given line of text (e.g. "John had three apples"), it should change a label's text to "Found it".

I tried to do it with the XPath contains() function, but I guess it only checks for one word.

var nodeBFT = doc.DocumentNode.SelectNodes("//*[contains(text(), 'John had three apples')]");

if (nodeBFT != null && nodeBFT.Count != 0)
    myLabel.Text = "Found it";

EDIT: Rest of my code, now with ako's attempt:

if (CheckIfValidUrl(v)) // foreach var v in a list..., checks if the URL works
{
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(v);

    try
    {
        if (doc.DocumentNode.InnerHtml.ToString().Contains("string of words"))
        {
            mylabel.Text = v;
        }
    ...

hungariandude asked Nov 20 '15 19:11

2 Answers

One option is to use . instead of text(). Passing text() to contains() the way you did works only when the searched text is in the first text node that is a direct child of the current element, because XPath 1.0 converts a node-set argument to a string by taking only its first node:

doc.DocumentNode.SelectNodes("//*[contains(., 'John had three apples')]");

On the other hand, contains(., '...') evaluates the entire text content of the current element, descendants included, concatenated into one string. So, just a heads up, the above XPath will also treat an element like the following as a match:

<span>John had <br/>three <strong>apples</strong></span>

If you need the XPath to match only when the entire keyword is contained in a single text node, so that the above case counts as a no-match, you can try this:

doc.DocumentNode.SelectNodes("//*[text()[contains(., 'John had three apples')]]");
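
To compare the three XPath variants above side by side, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced. The ContainsDemo class, the CountMatches helper, and the first HTML fragment are hypothetical and made up for illustration. The second fragment is the span example from above.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class ContainsDemo
{
    // Hypothetical helper: counts the elements an XPath selects in an HTML fragment.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Made-up fragment: the phrase sits in the div's second direct text node.
        string laterTextNode =
            "<div>Note: <b>important</b> John had three apples</div>";
        // Fragment from the answer: the phrase is split across several nodes.
        string splitAcrossNodes =
            "<span>John had <br/>three <strong>apples</strong></span>";

        string[] xpaths =
        {
            "//*[contains(text(), 'John had three apples')]",   // first text node only
            "//*[contains(., 'John had three apples')]",        // whole concatenated text
            "//*[text()[contains(., 'John had three apples')]]" // any single text node
        };

        foreach (string xpath in xpaths)
        {
            Console.WriteLine(xpath);
            Console.WriteLine("  later text node:    " + CountMatches(laterTextNode, xpath));
            Console.WriteLine("  split across nodes: " + CountMatches(splitAcrossNodes, xpath));
        }
    }
}
```

Going by the XPath semantics described above, the first expression should match neither fragment, the second should match both, and the third should match only the div fragment. Note that on a full page contains(., '...') also matches every ancestor of the matching element (html, body, and so on), which is fine if you only check whether the result is non-empty.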

If none of the above works for you, please post a minimal HTML snippet that contains the keyword but returns no match, so we can examine what causes that behavior and how to fix it.

har07 answered Oct 18 '22 20:10


Use this:

// InnerHtml is already a string, so no ToString() call is needed
if (doc.DocumentNode.InnerHtml.Contains("John had three apples"))
    myLabel.Text = "Found it";

ako answered Oct 18 '22 22:10