How to loop through all nodes without specifying node names

I am trying to get the InnerText of every node in an HtmlDocument, for any HTML document.

I have done some research, but I haven't found a way to go through all the parent and child nodes in the entire document without having to specify the node names.

I need this because I will be working with many different HTML documents, so hard-coding node names is not an option for me.
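One way to visit every node without knowing any tag names is to walk the tree generically through ChildNodes. The sketch below assumes HtmlAgilityPack (the library whose HtmlDocument and HtmlNode types appear in the answer) and uses an explicit stack instead of recursion; the AllNodes helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: returns root and every node beneath it, in document order.
// No tag names are needed; the tree is walked purely through ChildNodes.
static List<HtmlNode> AllNodes(HtmlNode root)
{
    var result = new List<HtmlNode>();
    var stack = new Stack<HtmlNode>();
    stack.Push(root);

    while (stack.Count > 0)
    {
        HtmlNode node = stack.Pop();
        result.Add(node);

        // Push children in reverse so the first child is popped (visited) first.
        for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            stack.Push(node.ChildNodes[i]);
        }
    }
    return result;
}

var doc = new HtmlDocument();
doc.LoadHtml("<div><p>One</p><p>Two</p></div>");

// Print each node's name (e.g. "div", "p", "#text") and its text.
foreach (HtmlNode node in AllNodes(doc.DocumentNode))
{
    Console.WriteLine($"{node.Name}: {node.InnerText}");
}
```

Pushing children in reverse keeps the output in document order, and the explicit stack avoids very deep recursion on heavily nested pages.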

asked Jan 07 '23 10:01 by Photonic



1 Answer

I figured it out. It was simple once I understood how to use these functions: walk ChildNodes recursively, descend into any node that has children, and print the InnerText of each leaf node.

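using System;
using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
// MyIO.bingPathToAppDir is a project-specific helper, not part of HtmlAgilityPack
htmlDoc.Load(MyIO.bingPathToAppDir("Test data/testHTML.html"));

// Start at the document root. checkNode visits every descendant,
// so no tag names need to be known in advance, and text directly
// under the root is not skipped.
checkNode(htmlDoc.DocumentNode);

// Recursively visit the children of `node`: descend into nodes that
// have children, and print the text of leaf nodes (usually text nodes).
static void checkNode(HtmlNode node)
{
    foreach (HtmlNode n in node.ChildNodes)
    {
        if (n.HasChildNodes)
        {
            checkNode(n);
        }
        else if (!string.IsNullOrWhiteSpace(n.InnerText))
        {
            // Skip whitespace-only text between tags
            Console.WriteLine(n.InnerText);
        }
    }
}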
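If only the text is needed, HtmlAgilityPack's Descendants() already enumerates every node below a given node, so the recursion can be replaced with a LINQ query. This is a sketch rather than a drop-in replacement: GetAllText is a hypothetical helper name, and skipping comments, whitespace, and the contents of script/style elements is an assumption about what "all the innerText" should include.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: collects the text of every text node in document order.
// Descendants() walks all nodes under DocumentNode, so no tag names are needed.
static List<string> GetAllText(HtmlDocument doc) =>
    doc.DocumentNode
       .Descendants()
       .Where(n => n.NodeType == HtmlNodeType.Text)                 // text nodes only (no comments)
       .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style") // skip code/CSS
       .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())      // decode entities like &amp;
       .Where(t => t.Length > 0)                                     // drop whitespace-only text
       .ToList();

var doc = new HtmlDocument();
doc.LoadHtml("<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>");

foreach (string text in GetAllText(doc))
{
    Console.WriteLine(text);
}
```

Filtering on HtmlNodeType.Text instead of "has no children" means empty elements such as br or img, and comment nodes, never reach the output.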
answered Feb 01 '23 10:02 by Photonic
