 

Use proxy or Tor within Heroku rails app to hide IP

I'm using Mechanize inside a rake task that is run by a scheduler add-on for my Ruby app on Heroku. The script logs in to a web page, which worked until recently, when the login started failing. When I began debugging, I found that Mechanize shows different form fields when I run the script in the Heroku console than when I run it in my local console.

The local Ruby console shows these fields:

>> asf.fields.each do |f| puts f.name end
__VIEWSTATE
__PREVIOUSPAGE
__EVENTVALIDATION
login$field
password$field

The Heroku console shows one additional field, captcha$txtCaptcha, that does NOT appear in the HTML source:

>> asf.fields.each do |f| puts f.name end
__VIEWSTATE 
__PREVIOUSPAGE
__EVENTVALIDATION
login$field
password$field
captcha$txtCaptcha

When I issue:

>> asf.click_button

Update: I tried changing the user agent to several different browser aliases, with no luck. It appears that Heroku's IP address is what causes the captcha to be served. Would it be possible to make the request through a proxy server, or to use Tor, so that the Heroku IP is not exposed?

samfu_1 asked May 21 '12 19:05


1 Answer

The answer to your question is yes, you can proxy through Tor. I've done it in the past; these are the issues you will face:

  1. You'll have to run Tor somewhere else if you're running on Heroku.

  2. Tor is pretty slow for scraping.

  3. You'll need to set up a proxy that can speak to Tor (Privoxy).

  4. For any serious scraping you'll need to have multiple Tor instances running.

  5. Even your Tor IPs will get blocked after a while.

That makes you wonder whether it's worth the hassle. You can pay for IP-masking proxy services, which might be an easier way to go.

This link got me some of the way when I was looking into this: http://www.howtoforge.com/ultimate-security-proxy-with-tor
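
Whichever route you pick (Privoxy in front of Tor on another host, or a paid proxy service), the Mechanize side looks roughly the same: point the agent at an HTTP proxy with `set_proxy`. Below is a minimal sketch, not a tested setup. The `SCRAPER_PROXY_URL` variable name, the `ProxySettings` struct, the helper names, and the example login URL are all made up for illustration; substitute your own proxy and page.

```ruby
require "uri"

# Hypothetical config var holding the proxy location, e.g.
#   heroku config:set SCRAPER_PROXY_URL=http://user:pass@proxy.example.com:8118
PROXY_ENV_VAR = "SCRAPER_PROXY_URL"

# Plain container for the pieces Mechanize#set_proxy expects.
ProxySettings = Struct.new(:host, :port, :user, :password)

# Split a proxy URL into host, port and optional credentials.
def parse_proxy_url(url)
  uri = URI.parse(url)
  raise ArgumentError, "proxy URL needs a host: #{url.inspect}" if uri.host.nil?

  ProxySettings.new(uri.host, uri.port, uri.user, uri.password)
end

# Build a Mechanize agent that sends every request through the proxy.
# The gem is required here so the parsing helper above works without it.
def build_agent(proxy)
  require "mechanize"

  agent = Mechanize.new
  agent.set_proxy(proxy.host, proxy.port, proxy.user, proxy.password)
  agent
end

# Only runs when executed directly with the (assumed) config var set.
if __FILE__ == $PROGRAM_NAME && ENV[PROXY_ENV_VAR]
  agent = build_agent(parse_proxy_url(ENV[PROXY_ENV_VAR]))
  page  = agent.get("https://example.com/login") # placeholder URL
  # If the proxy's IP is not flagged, the captcha field should be absent.
  page.forms.first.fields.each { |f| puts f.name }
end
```

Listing the form fields again through the proxy is a quick way to check whether the captcha field has gone away before you rely on the login working.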

MatthewFord answered Sep 24 '22 02:09