Scraping ASP.NET site with Ruby

Question

I would like to scrape the search results of this ASP.NET site using Ruby and preferably just using Hpricot (I cannot open an instance of Firefox): http://www.ngosinfo.gov.pk/SearchResults.aspx?name=&foa=0

However, I am having trouble figuring out how to go through each page of results. Basically, I need to simulate clicking on links like these:

<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$Pager1$2','')" class="blue_11" id="ctl00_ContentPlaceHolder1_Pager1">2</a>
<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$Pager1$3','')" class="blue_11" id="ctl00_ContentPlaceHolder1_Pager1">3</a>

etc.

I tried using Net::HTTP to handle the POST, but while the response was the expected HTML page, it contained no search results (I'm probably not doing that correctly). In addition, the URL of the page does not contain any parameter indicating the page number, so it is not possible to force the results that way.
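
For context: in ASP.NET WebForms, `__doPostBack(target, argument)` copies its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form. A hand-built POST therefore has to send those two values along with the form's other hidden fields (such as `__VIEWSTATE`), and leaving them out may be why a request comes back without results. As a minimal sketch, the two values can be pulled out of a pager link's `href` with plain Ruby; `parse_postback` is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: extracts the two __doPostBack arguments from a link href.
# Returns nil when the href is not a __doPostBack call.
def parse_postback(href)
  match = /__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)/.match(href)
  return nil unless match

  # These are the names of the hidden fields that the script fills in
  { '__EVENTTARGET' => match[1], '__EVENTARGUMENT' => match[2] }
end

# One of the pager links from the question
href = "javascript:__doPostBack('ctl00$ContentPlaceHolder1$Pager1$2','')"
fields = parse_postback(href)
puts fields['__EVENTTARGET']           # the control that raised the event
puts fields['__EVENTARGUMENT'].inspect # empty for this pager
```

For the links shown above, the page number is carried in the event target itself and the event argument is empty.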

Any help would be greatly appreciated.

jwilkins · Accepted Answer

Using mechanize-1.0.0, the following works:

 require 'mechanize'

 agent = Mechanize.new
 page = agent.get('http://127.0.0.1/some.aspx')

 # Fill in the hidden fields that __doPostBack would set, then submit the form
 form = page.form("aspnetForm")
 form.add_field!('__EVENTARGUMENT', 'Page$2')
 form.add_field!('__EVENTTARGET', 'ctl00$ContentPlaceHolder1$gvwSomeList')
 page = agent.submit(form) # this gets page 2
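
To walk through more than one page, the same steps can be repeated from each newly returned page. The following is only a sketch: `postback` and `each_results_page` are hypothetical helpers, not Mechanize methods, and the code assumes that `Form#[]=` updates an existing hidden field or adds it when missing. It only relies on `page.form(name)` and `agent.submit(form)`, the two calls used above.

```ruby
# Hypothetical helper, not part of Mechanize: replays one ASP.NET postback.
# `page` is expected to respond to #form(name) and `agent` to #submit(form),
# as in the snippet above.
def postback(agent, page, target, argument = '', form_name = 'aspnetForm')
  form = page.form(form_name)
  raise ArgumentError, "no form named #{form_name.inspect}" if form.nil?

  # Assumption: Form#[]= overwrites an existing field and adds a missing one
  form['__EVENTTARGET'] = target
  form['__EVENTARGUMENT'] = argument
  agent.submit(form) # returns the page produced by the postback
end

# Yields the first page, then the pages for 'Page$2', 'Page$3', ... up to last_page.
def each_results_page(agent, first_page, target, last_page)
  page = first_page
  yield page
  (2..last_page).each do |number|
    # Post from the most recent page so its hidden state fields are sent back
    page = postback(agent, page, target, "Page$#{number}")
    yield page
  end
end
```

With the `agent` and `page` from the snippet above, a call such as `each_results_page(agent, page, 'ctl00$ContentPlaceHolder1$gvwSomeList', 5) { |p| puts p.title }` would visit five pages; the count of 5 is a placeholder. For the pager in the question, where the page number is part of the event target and the argument is empty, the analogous single step would be `postback(agent, page, 'ctl00$ContentPlaceHolder1$Pager1$2')`.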

Tags: ruby, asp.net, screen-scraping

JillianK
