I am trying to build a web crawler using ABOT in C#. I have looked at many examples and added the ABOT web crawler to my project, but all I get is log output, not the HTML of the crawled pages. I only need the HTML itself, because it is the input for HTML Agility Pack. How can I get the HTML output from the ABOT web crawler in C#? Thanks.
This is explained on the ABOT quickstart page:
//Create an instance of the crawler and subscribe to the PageCrawlCompleted event
PoliteWebCrawler crawler = new PoliteWebCrawler();
crawler.PageCrawlCompleted += crawler_ProcessPageCrawlCompleted;

//The event handler method
void crawler_ProcessPageCrawlCompleted(object sender, PageCrawlCompletedArgs e)
{
    CrawledPage crawledPage = e.CrawledPage;
    if (crawledPage.WebException != null || crawledPage.HttpWebResponse.StatusCode != HttpStatusCode.OK)
        Console.WriteLine("Crawl of page failed {0}", crawledPage.Uri.AbsoluteUri);
    else
        Console.WriteLine("Crawl of page succeeded {0}", crawledPage.Uri.AbsoluteUri);

    //crawledPage.Content.Text //raw html
    //crawledPage.HtmlDocument //lazy loaded html agility pack object (HtmlAgilityPack.HtmlDocument)
    //crawledPage.CSDocument //lazy loaded cs query object (CsQuery.Cq)
}
void crawler_ProcessPageCrawlCompleted(object sender, PageCrawlCompletedArgs e)
{
    CrawledPage crawledPage = e.CrawledPage;
    string html = crawledPage.Content.Text; // raw HTML of the crawled page
}
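Since the goal is to feed the HTML into HTML Agility Pack, here is a minimal sketch of what that step might look like once you have the string from crawledPage.Content.Text. The class HtmlLinkExtractor and the method ExtractLinks are hypothetical names made up for this illustration, not part of ABOT, and collecting anchor links is just an assumed example task. It only requires the HtmlAgilityPack package.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (not part of ABOT): pulls link targets out of raw HTML.
public static class HtmlLinkExtractor
{
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        if (string.IsNullOrEmpty(html))
            return links;

        // Parse the raw HTML string with HTML Agility Pack.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));

        return links;
    }
}
```

Inside the PageCrawlCompleted handler you could then call HtmlLinkExtractor.ExtractLinks(crawledPage.Content.Text). If you would rather not parse the string yourself, the crawledPage.HtmlDocument property mentioned in the quickstart comments should already give you a parsed HtmlAgilityPack.HtmlDocument.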