Deleting cookies in a Scrapy Request

I am using Scrapy + Selenium, since the website I am scraping needs JavaScript for authentication. I log in with Selenium and pass the cookies to the following request.

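from scrapy import Request
from selenium import webdriver
from selenium.webdriver.common.by import By

def login(self, response):
    # Log in through a real browser, because the site needs JavaScript for authentication
    driver = webdriver.Firefox()
    driver.get("http://www.site.com/login")
    driver.find_element(By.XPATH, "//input[@id='myname']").send_keys(self.settings['USERNAME'])
    driver.find_element(By.XPATH, "//input[@id='mypwd']").send_keys(self.settings['PASSWORD'])
    driver.find_element(By.XPATH, "//input[@name='Logon']").click()
    self.driver = driver
    # Hand the browser's session cookies (a list of dicts) over to Scrapy
    return Request(url=driver.current_url, cookies=driver.get_cookies(),
                   callback=self.after_login, dont_filter=True)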

So far so good: since cookies are sticky, all the following requests work perfectly well. My scraping run is quite long, so at some point the cookies expire and I need to log in again. At that point I send a new Request whose callback is the login function. Here it fails, because the new cookies are merged with the old ones. Is there a way to reset the cookies?
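One plausible way the merge goes wrong can be reproduced without Scrapy at all. Scrapy's cookie middleware stores cookies in a wrapper around the standard library's http.cookiejar.CookieJar, so the sketch below uses that class directly. It assumes the site's session is spread over two cookies; the cookie names SESSIONID and AUTHTOKEN and the helper make_cookie are hypothetical, and the real site may behave differently.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="www.site.com", path="/"):
    """Hypothetical helper: build a plain session cookie with default attributes."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()

# First login: the site sets two (made-up) session cookies.
jar.set_cookie(make_cookie("SESSIONID", "old"))
jar.set_cookie(make_cookie("AUTHTOKEN", "old-token"))

# Re-login: only SESSIONID comes back. Cookies are keyed by (domain, path, name),
# so SESSIONID is overwritten but the stale AUTHTOKEN survives the merge.
jar.set_cookie(make_cookie("SESSIONID", "new"))
merged = {c.name: c.value for c in jar}
print(merged)  # {'AUTHTOKEN': 'old-token', 'SESSIONID': 'new'}

# Resetting means emptying the jar first (clear() also accepts a domain to
# clear only that site), then storing the fresh cookies.
jar.clear()
jar.set_cookie(make_cookie("SESSIONID", "new"))
fresh = {c.name: c.value for c in jar}
print(fresh)  # {'SESSIONID': 'new'}
```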

ANSWER

@Drewness suggested in his answer using the dont_merge_cookies key in the meta dictionary. It didn't work, for the following reason. According to the source code, the following Request:

Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, meta={'dont_merge_cookies' : True}, dont_filter=True)

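does nothing with the cookies you pass to it: when dont_merge_cookies is set, the cookies middleware returns early and never applies the cookies argument.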
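The reading of the source above can be illustrated with a deliberately simplified model. The function build_cookie_header is hypothetical (it is not Scrapy's API), a plain dict stands in for the cookie jar, domain and path matching are ignored, and the cookie names are made up. It only mirrors the control flow that matters here: with dont_merge_cookies set, the middleware bails out before looking at the cookies argument; without it, the fresh cookies are merged into the jar next to any stale ones.

```python
# Hypothetical, simplified model of how Scrapy's cookies middleware treats a
# request. It is NOT Scrapy's real code, only the control flow relevant here.

def build_cookie_header(meta, request_cookies, jar):
    """Return the Cookie header the model would set, or None if it skips."""
    if meta.get("dont_merge_cookies", False):
        # Early return: the jar is neither read nor updated, so the cookies
        # passed via cookies=... never make it into a Cookie header.
        return None
    # Otherwise request cookies are merged into the jar (same name overwrites)...
    jar.update(request_cookies)
    # ...and everything in the jar is sent, stale entries included.
    return "; ".join(f"{name}={value}" for name, value in sorted(jar.items()))


stored = {"SESSIONID": "old", "AUTHTOKEN": "old-token"}
fresh = {"SESSIONID": "new"}

skipped = build_cookie_header({"dont_merge_cookies": True}, fresh, dict(stored))
merged = build_cookie_header({}, fresh, dict(stored))
print(skipped)  # None
print(merged)   # AUTHTOKEN=old-token; SESSIONID=new
```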

In my solution I decided to skip the dont_merge_cookies key and simply reset the response headers just before creating the request:

response.headers = {}
return Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, dont_filter=True)
asked Nov 02 '22 04:11 by user2016508


1 Answer

From the docs:

When some site returns cookies (in a response) those are stored in the cookies for that domain and will be sent again in future requests. That’s the typical behaviour of any regular web browser. However, if, for some reason, you want to avoid merging with existing cookies you can instruct Scrapy to do so.

Like so:

from scrapy import Request

request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'},
                               meta={'dont_merge_cookies': True})

The key here, of course, is dont_merge_cookies.
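The browser-like behaviour the quoted docs describe (cookies returned in a response are stored for that domain and sent again on later requests) can be seen with the standard library's http.cookiejar, which Scrapy's cookie handling builds on. This is a sketch, not Scrapy code: it assumes a hypothetical site at www.example.com that sets a made-up sessionid cookie, and FakeResponse is a stand-in written only for this example, since CookieJar.extract_cookies just needs an object whose info() method returns the response headers.

```python
import urllib.request
from email.message import Message
from http.cookiejar import CookieJar


class FakeResponse:
    """Minimal stand-in for an HTTP response: CookieJar only calls .info()."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


jar = CookieJar()

# The site answers the first request with a Set-Cookie header...
first = urllib.request.Request("http://www.example.com/login")
headers = Message()
headers["Set-Cookie"] = "sessionid=abc123; Path=/"
jar.extract_cookies(FakeResponse(headers), first)

# ...and the stored cookie is attached to a later request to the same domain.
later = urllib.request.Request("http://www.example.com/account")
jar.add_cookie_header(later)
print(later.get_header("Cookie"))  # sessionid=abc123
```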

answered Nov 09 '22 05:11 by Drewness