Right now I'm running a scraping program on my computer. It's massive, and unfortunately, because of this, my IP address has been banned from the site I need to scrape. Is there a way, in Ruby or just in some simple manner, to switch my IP address so that I'm allowed back into this site for scraping? Or am I out of luck and will have to resort to other solutions? The error is a 403 Forbidden, and for what it's worth, I'm using Nokogiri and my user agent is ruby. Thanks.
You can connect through a proxy. If you have a list of proxy addresses, you can tell Ruby to switch proxies every x minutes, which changes the IP address the website sees your requests coming from. Here's code that scrapes Google search results through a single proxy; to use a proxy list, just extend it a bit.
```ruby
require 'mechanize'

agent = Mechanize.new
# Send every request through this HTTP proxy (host, port)
agent.set_proxy '78.186.178.153', 8080

page = agent.get('http://www.google.com/')
google_form = page.form('f') # the search form, named "f" on Google's page at the time
google_form.q = 'new york city council'
page = agent.submit(google_form, google_form.buttons.first)

# Result links look like "/url?q=<target>&sa=..."; print each <target>
page.links.each do |link|
  href = link.href.to_s
  next unless href =~ /url\?q=/
  url = href.split(/=|&/)[1]
  puts url
end
```
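To rotate through a proxy list, the code above only needs a way to decide which proxy is "current". The sketch below is one way to do that, assuming proxies are given as `[host, port]` pairs and you want to switch every `interval` seconds. `ProxyRotator` is a hypothetical helper name, and the `203.0.113.x` addresses are documentation placeholders; substitute proxies that actually work for you.

```ruby
# Hypothetical helper (not part of Mechanize): cycles through a list of
# [host, port] proxies, moving to the next one every `interval` seconds.
class ProxyRotator
  def initialize(proxies, interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    raise ArgumentError, 'need at least one proxy' if proxies.empty?

    @proxies = proxies
    @interval = interval
    @clock = clock # injectable so the timing can be tested without waiting
    @started_at = @clock.call
  end

  # Returns the [host, port] pair that should be used right now,
  # wrapping back to the first proxy after the last one.
  def current
    elapsed = @clock.call - @started_at
    index = (elapsed / @interval).floor % @proxies.size
    @proxies[index]
  end
end

if __FILE__ == $PROGRAM_NAME
  # Placeholder addresses: replace with real proxies
  proxies = [['203.0.113.10', 8080], ['203.0.113.11', 3128]]
  rotator = ProxyRotator.new(proxies, 5 * 60) # switch every 5 minutes
  host, port = rotator.current
  puts "Using proxy #{host}:#{port}"
  # With Mechanize, call this before each request:
  # agent.set_proxy(host, port)
end
```

Calling `agent.set_proxy(*rotator.current)` before each `agent.get` keeps the Mechanize code unchanged apart from that one line. Basing the choice on elapsed time rather than on a request counter matches the "every x minutes" approach; switching on request count or on a 403 response would be equally reasonable variations.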