I'm using a Scrapy spider that authenticates with a login form upon launching. It then scrapes with this authenticated session.
During development I usually run the spider many times to test it out. Authenticating at the beginning of each run spams the login form of the website. The website will often force a password reset in response and I suspect it will ban the account if this continues.
Because the cookies last a number of hours, there's no good reason to log in this often during development. To get around the password reset problem, what would be the best way to re-use an authenticated session/cookies between runs while developing? Ideally the spider would only attempt to authenticate if the persisted session has expired.
Edit:
My structure is like:
def start_requests(self):
    yield scrapy.Request(self.base, callback=self.log_in)

def log_in(self, response):
    # response.headers includes
    # 'Set-Cookie': 'JSESSIONID=xx; Path=/cas/; Secure; HttpOnly'
    yield scrapy.FormRequest.from_response(
        response,
        formdata={'username': 'xxx',
                  'password': ''},
        callback=self.logged_in)

def logged_in(self, response):
    # request.headers (and the headers of all subsequent requests) include
    # 'Cookie': 'JSESSIONID=xxx'
    # response.headers has no mention of cookies
    # request.cookies is empty
When I run the same page request in Chrome, under the 'Cookies' tab there are ~20 fields listed.
The documentation seems thin here. I've tried setting a 'Cookie': 'JSESSIONID=xxx' field on the headers dict of all outgoing requests, using the value returned by a successful login, but this just bounces back to the login screen.
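A side note that helped while debugging: Scrapy's built-in cookie middleware keeps its cookies in an internal jar rather than exposing them on request.cookies, and it has a COOKIES_DEBUG setting that logs the cookies it sends and receives, which makes it easy to confirm whether the session cookie is actually being tracked:

# settings.py (or in the spider's custom_settings)
# When enabled, the cookies middleware logs the cookies sent with each
# request and the cookies received with each response.
COOKIES_DEBUG = True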
It turns out that, for an ad-hoc development solution, this is easier to do than I thought. Grab the cookie string with

cookieString = request.headers['Cookie']

save it somewhere, then on subsequent runs load it and attach it to outgoing requests with:

request.headers.appendlist('Cookie', cookieString)
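A minimal sketch of one way to wire that up between runs, assuming the session stays valid for a few hours; the cache file name, the TTL, the helper functions, and the parse_landing callback are placeholders I made up, not anything from Scrapy:

import json
import os
import time

import scrapy

COOKIE_CACHE = 'dev_cookies.json'  # hypothetical cache file, dev use only
COOKIE_TTL = 6 * 3600              # rough guess at how long the session lasts

def load_cached_cookie():
    """Return the saved Cookie header value if it is still fresh, else None."""
    if not os.path.exists(COOKIE_CACHE):
        return None
    with open(COOKIE_CACHE) as f:
        data = json.load(f)
    if time.time() - data['saved_at'] > COOKIE_TTL:
        return None
    return data['cookie']

def save_cookie(cookie_string):
    """Persist the raw Cookie header value for the next run."""
    with open(COOKIE_CACHE, 'w') as f:
        json.dump({'cookie': cookie_string, 'saved_at': time.time()}, f)

class DevSpider(scrapy.Spider):
    name = 'dev'
    base = 'https://example.com/'  # placeholder for the real start URL

    def start_requests(self):
        cookie = load_cached_cookie()
        if cookie:
            # A fresh cookie is on disk: skip the login form entirely.
            request = scrapy.Request(self.base, callback=self.parse_landing)
            request.headers.appendlist('Cookie', cookie)
            yield request
        else:
            # No usable cookie: authenticate as before.
            yield scrapy.Request(self.base, callback=self.log_in)

    def log_in(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'xxx', 'password': 'xxx'},
            callback=self.logged_in)

    def logged_in(self, response):
        # The Cookie header Scrapy attached to the login request; it is
        # bytes, so decode it before writing it to disk.
        save_cookie(response.request.headers['Cookie'].decode('utf-8'))
        yield scrapy.Request(self.base, callback=self.parse_landing)

    def parse_landing(self, response):
        ...  # normal scraping continues from here

One caveat: depending on the Scrapy version, the built-in cookies middleware may or may not carry a hand-set Cookie header across redirects; if it gets in the way, setting COOKIES_ENABLED = False for these development runs sidesteps it.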