Obtaining text from HTML on WP7 using HtmlAgilityPack

I'm trying to extract text from HTML using HtmlAgilityPack on Windows Phone 7. I added HtmlAgilityPack to my project successfully and tried the following code to extract the body text:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags=true;

// filePath is a path to a file containing the html
htmlDoc.Load(filePath);

// Use:  htmlDoc.LoadHtml(htmlString);  to load from a string

// ParseErrors is an ArrayList containing any errors from the Load statement
if (htmlDoc.ParseErrors!=null && htmlDoc.ParseErrors.Count>0)
{
    // Handle any parse errors as required
}
else
{
    if (htmlDoc.DocumentNode != null)
    {
        HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

When I build the project, I receive the following error:

Error 1 The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml.XPath, Version=2.0.5.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. D:\test\test\MainPage.xaml.cs 58

I should add that I already added a reference to System.Xml and I still get this error. Can you please help me figure out what the issue is? Thanks.

Kartos asked Jun 05 '26 17:06


2 Answers

Thanks. I figured out that I had to add a reference to the System.Xml.XPath assembly from the Silverlight 4.0 folder, which is located under the Microsoft SDKs parent folder.

Kartos answered Jun 08 '26 05:06


With HAP (HtmlAgilityPack) on the phone, you'll have to use LINQ (Linq2Xml-style queries) instead of XPath to find things in the parsed HTML. You might also have to build the phone version (HAPPhone) from the source.

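public void Hap()
{
    // Load the page asynchronously; OnCallback is invoked when loading completes.
    HtmlWeb.LoadAsync("http://www.page.com", OnCallback);
}

private void OnCallback(object s, HtmlDocumentLoadCompleted htmlDocumentLoadCompleted)
{
    var htmlDocument = htmlDocumentLoadCompleted.Document;

    // All <select> elements in the document
    var test = htmlDocument.DocumentNode.Descendants("select").ToList();

    // The <select> whose id is "stateDropdown".
    // GetAttributeValue avoids a NullReferenceException on elements that have no id attribute.
    var stateDropdown = (from h in htmlDocument.DocumentNode.Descendants("select")
                         where h.GetAttributeValue("id", "") == "stateDropdown"
                         select h).FirstOrDefault();

    // Its child nodes, or null when no matching <select> exists
    var test2 = stateDropdown != null ? stateDropdown.ChildNodes.ToList() : null;
}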
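
Applied to the body-text case from the question, the same LINQ approach might look like the sketch below. It assumes the HTML is already available as a string; the class name BodyTextExtractor and the method GetBodyText are hypothetical helpers made up for illustration, not part of HtmlAgilityPack. Only Descendants, InnerText, LoadHtml and HtmlEntity.DeEntitize come from the library.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class BodyTextExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the text inside <body>,
    // or null when the parsed document has no <body> element.
    public static string GetBodyText(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.LoadHtml(html);

        // Descendants() walks the parsed node tree directly, so no XPath call is needed
        // (this replaces SelectSingleNode("//body") from the question).
        HtmlNode bodyNode = htmlDoc.DocumentNode.Descendants("body").FirstOrDefault();
        if (bodyNode == null)
        {
            return null;
        }

        // InnerText concatenates the text of all nodes under <body>;
        // DeEntitize converts entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(bodyNode.InnerText).Trim();
    }
}
```

Note that InnerText may also include the contents of script and style elements under the body, so you may want to filter those nodes out before reading the text.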
Derek Beattie answered Jun 08 '26 05:06