I'm trying to use a proxy (ProxyMesh) alongside scrapy-splash. I have the following (relevant) code:
    PROXY = """splash:on_request(function(request)
        request:set_proxy{
            host = http://us-ny.proxymesh.com,
            port = 31280,
            username = username,
            password = secretpass,
        }
        return splash:html()
    end)"""
and in start_requests:
    def start_requests(self):
        for url in self.start_urls:
            print(url)
            yield SplashRequest(url, self.parse,
                                endpoint='execute',
                                args={'wait': 5,
                                      'lua_source': PROXY,
                                      'js_source': 'document.body'})
But it does not seem to work: self.parse is not called at all. If I change the endpoint to 'render.html', self.parse is called, but when I inspect the headers (response.headers) I can see that the request is not going through the proxy. I confirmed this by setting http://checkip.dyndns.org/ as the start URL; the parsed response showed my old IP address.
What am I doing wrong?
You should add a 'proxy' entry to the args of the SplashRequest object:
    def start_requests(self):
        for url in self.start_urls:
            print(url)
            yield SplashRequest(url, self.parse,
                                endpoint='execute',
                                args={'wait': 5,
                                      'lua_source': PROXY,
                                      'js_source': 'document.body',
                                      # replace with the real proxy host and port
                                      'proxy': 'http://proxy_ip:proxy_port'})
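If you would rather keep the proxy setup inside the Lua script, note that the 'execute' endpoint expects the script to define a main function, and the string values passed to request:set_proxy have to be Lua strings. The following is only a minimal sketch under those assumptions, not a tested configuration: the argument names proxy_host, proxy_port, proxy_user and proxy_pass and the helper build_splash_args are made up for illustration, and the host is given without the http:// scheme. Values placed in args show up in the script as splash.args.

```python
# Lua script for the 'execute' endpoint. The proxy_* argument names are
# hypothetical; they are simply keys we choose to send in the request args.
LUA_PROXY_SCRIPT = """
function main(splash)
    -- Route every outgoing request through the proxy given in the args.
    splash:on_request(function(request)
        request:set_proxy{
            host = splash.args.proxy_host,
            port = splash.args.proxy_port,
            username = splash.args.proxy_user,
            password = splash.args.proxy_pass,
        }
    end)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    return splash:html()
end
"""


def build_splash_args(host, port, username, password, wait=5):
    """Build the args dict for a SplashRequest (hypothetical helper)."""
    return {
        'lua_source': LUA_PROXY_SCRIPT,
        'wait': wait,
        'proxy_host': host,      # host name only, no scheme
        'proxy_port': port,      # integer, arrives in Lua as a number
        'proxy_user': username,
        'proxy_pass': password,
    }


if __name__ == '__main__':
    # Placeholder credentials taken from the question.
    args = build_splash_args('us-ny.proxymesh.com', 31280,
                             'username', 'secretpass')
    # Show which keys would be sent to Splash.
    print(sorted(args))
```

In the spider this would be used as SplashRequest(url, self.parse, endpoint='execute', args=build_splash_args(...)). Passing the credentials as arguments rather than pasting them into the Lua source avoids the quoting problems in the original script.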