
Getting meta tag attribute with HTML Agility Pack using XPATH

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
<TITLE>Microsoft Corporation</TITLE>
<META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" />
<META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
<META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
<META NAME="MS.LOCALE" CONTENT="EN-US" />
<META NAME="CATEGORY" CONTENT="home page" />

Given the HTML above, what XPath expression do I need to get the value of the CONTENT attribute of the CATEGORY meta tag using HTML Agility Pack?

Eugene asked Jul 12 '10 20:07


3 Answers

For a long time HtmlAgilityPack didn't have the ability to directly query an attribute value. You had to loop over the list of meta nodes. Here's one way:

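var doc = new HtmlDocument();
doc.LoadHtml(htmlString);

// SelectNodes returns null (not an empty list) when nothing matches.
var list = doc.DocumentNode.SelectNodes("//meta");
if (list != null)
{
    foreach (var node in list)
    {
        // Falls back to "" when this meta tag has no content attribute.
        string content = node.GetAttributeValue("content", "");
    }
}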

However, it looks like there is an experimental XPath release that lets you select the attribute directly:

doc.DocumentNode.SelectNodes("//meta/@content")

With that release, this returns a list of HtmlAttribute objects.

Rohit Agarwal answered Nov 15 '22 05:11


Thank you for the quick response Rohit Agarwal (I saw it answered only a few hours after I asked, but haven't been able to test it until today).

I originally implemented your suggestion as follows (in VB.NET):

Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

Dim list = doc.DocumentNode.SelectNodes("//meta")

' SelectNodes returns Nothing when there are no matches.
If list IsNot Nothing Then
    For Each node As HtmlNode In list
        Dim metaname As String = node.GetAttributeValue("name", String.Empty)
        If metaname = "title" Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf branches go here
        End If
    Next
End If

However, I've found that the XPath //meta[@name='title'] gives me the same result without the loop:

Dim result As String = webClient.DownloadString(url)

Dim doc As New HtmlDocument()
doc.LoadHtml(result)

' Assumes a matching meta tag exists; SelectNodes returns Nothing otherwise.
title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)
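
For the CATEGORY tag from the original question, the same predicate approach might look like the C# sketch below. The class and method names (MetaTagReader, GetCategory) are hypothetical and only for illustration. It uses SelectSingleNode with a null check instead of indexing into SelectNodes, so a page without the tag yields an empty string rather than an exception. As far as I know, HTML Agility Pack exposes element and attribute names in lowercase, which is why //meta and @name match the upper-case markup; the attribute value comparison itself stays case-sensitive.

```csharp
using HtmlAgilityPack;

public static class MetaTagReader
{
    // Hypothetical helper: returns the content of <meta name="CATEGORY">,
    // or an empty string if the page has no such tag.
    public static string GetCategory(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The predicate [@name='CATEGORY'] keeps only meta elements whose
        // name attribute has exactly this value.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='CATEGORY']");

        // SelectSingleNode returns null when nothing matches.
        return node == null
            ? string.Empty
            : node.GetAttributeValue("content", string.Empty);
    }
}
```

Calling MetaTagReader.GetCategory(htmlString) on the HTML from the question should give "home page".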

Thanks for putting me on the right track=D

Eugene answered Nov 15 '22 06:11


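If you just want to display the title, description, and keywords from the meta tags, use:

// Assumes doc is an HtmlDocument that has already been loaded.
var metaTags = doc.DocumentNode.SelectNodes("//meta");

// Panel has no InnerHtml property, so this uses HtmlGenericControl
// (System.Web.UI.HtmlControls), created once outside the loop.
var divPage = new HtmlGenericControl("div");

if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
        {
            divPage.InnerHtml += "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}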

If you also want to read the Open Graph (og:) tags from the page, add this check inside the same foreach loop, after the first if block (img is assumed to be an Image control on the page):

if (tag.Attributes["property"] != null && tag.Attributes["content"] != null)
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}
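
If the goal is only to collect the values, not to render them, the two checks above could be folded into one pass that gathers every name/property and content pair into a dictionary. This is a minimal sketch; MetaTagCollector and Collect are hypothetical names, and keeping only the first occurrence of a duplicated key is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MetaTagCollector
{
    // Hypothetical helper: maps each meta tag's name (or property) to its content.
    public static Dictionary<string, string> Collect(string html)
    {
        // Case-insensitive keys, so "KEYWORDS" and "keywords" are the same entry.
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Only meta elements that have content plus either a name or a property.
        var nodes = doc.DocumentNode.SelectNodes("//meta[@content and (@name or @property)]");
        if (nodes == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            // Prefer name; fall back to property (used by og: tags).
            string key = node.GetAttributeValue("name", string.Empty);
            if (key.Length == 0)
            {
                key = node.GetAttributeValue("property", string.Empty);
            }

            // Keep the first occurrence of each key and skip empty keys.
            if (key.Length > 0 && !result.ContainsKey(key))
            {
                result[key] = node.GetAttributeValue("content", string.Empty);
            }
        }

        return result;
    }
}
```

With this in place, a lookup such as Collect(htmlString)["og:image"] or Collect(htmlString)["description"] replaces the per-tag if statements; use TryGetValue when the tag may be absent.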

This code has worked well for me. :)

Sunil Acharya answered Nov 15 '22 07:11