How to get the HTML output of a page in the Abot C# web crawler?

Tags:

c#

web-crawler

I am trying to build a web crawler using Abot in C#. I have looked at many examples and added the Abot web crawler to my project, but so far I only get log output, not the HTML of the crawled pages. I need the HTML itself, because it is the input for the HTML Agility Pack. How can I get the HTML output from the Abot web crawler in C#? Thanks.

user2773170 asked Sep 12 '13 15:09


2 Answers

This is explained on the Abot quickstart page:

//Create an instance of the crawler and subscribe to the PageCrawlCompleted event
PoliteWebCrawler crawler = new PoliteWebCrawler();
crawler.PageCrawlCompleted += crawler_ProcessPageCrawlCompleted;

//The event handler method
void crawler_ProcessPageCrawlCompleted(object sender, PageCrawlCompletedArgs e)
{
    CrawledPage crawledPage = e.CrawledPage;

    if (crawledPage.WebException != null || crawledPage.HttpWebResponse.StatusCode != HttpStatusCode.OK)
        Console.WriteLine("Crawl of page failed {0}", crawledPage.Uri.AbsoluteUri);
    else
        Console.WriteLine("Crawl of page succeeded {0}", crawledPage.Uri.AbsoluteUri);


    //crawledPage.Content.Text //raw html
    //crawledPage.HtmlDocument //lazy loaded html agility pack object (HtmlAgilityPack.HtmlDocument)
    //crawledPage.CSDocument   //lazy loaded cs query object (CsQuery.Cq)
}
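
Since the question mentions feeding the HTML into the HTML Agility Pack, here is a minimal sketch of that hand-off. It assumes the HtmlAgilityPack NuGet package is referenced; `LinkExtractor` and `ExtractLinks` are hypothetical names made up for this illustration and are not part of Abot or HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of Abot or HtmlAgilityPack.
public static class LinkExtractor
{
    // Parses raw HTML (for example crawledPage.Content.Text) and returns
    // the href value of every anchor element, in document order.
    public static List<string> ExtractLinks(string rawHtml)
    {
        if (rawHtml == null)
            throw new ArgumentNullException(nameof(rawHtml));

        var document = new HtmlDocument();
        document.LoadHtml(rawHtml);

        var links = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = document.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));

        return links;
    }
}
```

Inside the event handler above this could be called as `LinkExtractor.ExtractLinks(crawledPage.Content.Text)` on the success branch. If you only need the parsed document, the `crawledPage.HtmlDocument` property noted in the comments should save you the explicit `LoadHtml` call.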
sjdirect answered Oct 29 '22 18:10


void crawler_ProcessPageCrawlCompleted(object sender, PageCrawlCompletedArgs e)
{
    CrawledPage crawledPage = e.CrawledPage;
    string html = crawledPage.Content.Text; // raw HTML of the crawled page
}
alansiqueira27 answered Oct 29 '22 16:10