I am using Scrapy + Selenium, since the website I am scraping needs JavaScript for authentication. I log in with Selenium and pass the cookies to the following request.
def login(self, response):
    driver = webdriver.Firefox()
    driver.get("http://www.site.com/login")
    driver.find_element_by_xpath("//input[@id='myname']").send_keys(settings['USERNAME'])
    driver.find_element_by_xpath("//input[@id='mypwd']").send_keys(settings['PASSWORD'])
    driver.find_element_by_xpath("//input[@name='Logon']").click()
    self.driver = driver
    return Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, dont_filter=True)
So far so good: since cookies are sticky, all the following requests work perfectly well. My scraping runs for quite a long time, so at some point the cookies expire and I need to log in again. At that point I issue a new Request whose callback is the login function. This is where it fails, since the new cookies are merged with the old ones. Is there a way to reset the cookies?
ANSWER
@Drewness suggested in his answer using the dont_merge_cookies key in the meta dictionary. It didn't work, for the following reason. According to the source code, the following Request:
Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, meta={'dont_merge_cookies' : True}, dont_filter=True)
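does nothing with the cookies you pass to it: when dont_merge_cookies is set, the cookies middleware skips the request altogether, so the cookies argument is ignored.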
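The reason can be illustrated with a toy model of the cookie handling. This is a simplified sketch, not Scrapy's actual source: the class name ToyCookiesMiddleware and the single dict-based jar are assumptions made purely for illustration. It shows that a request flagged with dont_merge_cookies is left untouched, so the cookies attached to it are never sent, while unflagged requests keep merging into the same jar.

```python
class ToyCookiesMiddleware:
    """Toy stand-in for a cookies middleware (hypothetical, for illustration)."""

    def __init__(self):
        # One jar for everything: cookie name -> value.
        self.jar = {}

    def process_request(self, meta, cookies):
        """Return the cookies that would be sent with the request."""
        if meta.get("dont_merge_cookies", False):
            # The request is skipped: its cookies are neither stored nor sent.
            return {}
        # Otherwise the request's cookies are merged into the jar,
        # and the whole jar is sent.
        self.jar.update(cookies)
        return dict(self.jar)


mw = ToyCookiesMiddleware()
first = mw.process_request({}, {"session": "old"})
flagged = mw.process_request({"dont_merge_cookies": True}, {"session": "new"})
later = mw.process_request({}, {"token": "x"})
```

In this model, `flagged` carries no cookies at all, and `later` still carries the old session cookie, which matches the behaviour described above.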
In my solution I decided to skip the dont_merge_cookies attribute and simply reset the response headers just before creating the request:
response.headers = {}
return Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, dont_filter=True)
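As a side note on the cookies argument: driver.get_cookies() returns a list of dicts that carry extra keys such as httpOnly and expiry, while the Scrapy documentation shows the list-of-dicts form with name, value, domain and path. Passing the Selenium list directly works in the code above, so the trimming below is optional; it is only a sketch, and the helper name to_scrapy_cookies and the sample cookie are hypothetical.

```python
# Keys shown in the Scrapy documentation for the list-of-dicts cookie form.
SCRAPY_COOKIE_KEYS = ("name", "value", "domain", "path")


def to_scrapy_cookies(selenium_cookies):
    """Hypothetical helper: keep only the keys listed above, if present."""
    return [
        {key: cookie[key] for key in SCRAPY_COOKIE_KEYS if key in cookie}
        for cookie in selenium_cookies
    ]


# Made-up cookie in the shape Selenium returns from driver.get_cookies().
sample = [
    {
        "name": "session",
        "value": "abc123",
        "domain": "www.site.com",
        "path": "/",
        "secure": False,
        "httpOnly": True,
        "expiry": 1700000000,
    }
]
trimmed = to_scrapy_cookies(sample)
```

The result could then be passed as cookies=to_scrapy_cookies(self.driver.get_cookies()) in the Request.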
From the docs:
When some site returns cookies (in a response) those are stored in the cookies for that domain and will be sent again in future requests. That’s the typical behaviour of any regular web browser. However, if, for some reason, you want to avoid merging with existing cookies you can instruct Scrapy to do so.
Like so:
request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'},
                               meta={'dont_merge_cookies': True})
dont_merge_cookies being the key here, of course.