Access session cookie in scrapy spiders

Tags: cookies, session, session-cookies, scrapy

I am trying to access the session cookie within a spider. I first log in to a social network from the spider:

    def parse(self, response):

        return [FormRequest.from_response(response,
                formname='login_form',
                formdata={'email': '...', 'pass':'...'},
                callback=self.after_login)]

In after_login, I would like to access the session cookies in order to pass them to another module (Selenium here) and further process the page with an authenticated session.

I would like something like this:

    def after_login(self, response):

        # process response
        .....

        # access the cookies of that session to access another URL in the
        # same domain with the authenticated session.
        # Something like:
        session_cookies = XXX.get_session_cookies()
        data = another_function(url, session_cookies)

Unfortunately, response.cookies does not return the session cookies.

How can I get the session cookies? I looked at the cookies middleware (scrapy.contrib.downloadermiddleware.cookies) and at scrapy.http.cookies, but there doesn't seem to be any straightforward way to access the session cookies.

Some more details about my original question:

Unfortunately, I tried your idea but I didn't see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print them out! These are exactly the cookies that I want to grab.

So here is what I am doing:

The after_login(self, response) method receives the response after proper authentication, and then I access a URL with the session data:

    def after_login(self, response):

        # testing to see if I can get the session cookies
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        cookies_test = cookieJar._cookies
        print "cookies - test:", cookies_test

        # URL access with authenticated session
        url = "http://site.org/?id=XXXX"
        request = Request(url=url, callback=self.get_pict)
        return [request]

As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:

    cookies - test: {}
    2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
        Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........

So I would like to get a dictionary containing the keys xxx, yyy, etc. with the corresponding values.

Thanks :)

asked Jan 03 '12 05:01

mikolune

2 Answers

A classic example is having a login server, which provides a new session id after a successful login. This new session id should be used with another request.

Here is the code, picked up from the source, which seems to work for me. The key line reads the value of the first Set-Cookie header of the login response:

    print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]

Code:

    def check_logged(self, response):
        # First Set-Cookie header looks like "name=value; Path=/; ...": keep only the value.
        # split("=", 1) keeps values that themselves contain "=" intact.
        tmpCookie = response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=", 1)[1]
        print 'cookie from login', tmpCookie
        cookieHolder = dict(SESSION_ID=tmpCookie)

        #print response.body
        if "my name" in response.body:
            # The URL and the callback below are placeholders: fill in your own.
            yield Request(url="<<new url for another server>>",
                          cookies=cookieHolder,
                          callback=self.<<another function here>>)
        else:
            print "login failed"
            return
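
If the login response sets more than one cookie, indexing `[0]` only picks up the first one. Below is a minimal sketch of collecting all of them into a dictionary with the standard library's `SimpleCookie`. It is written for Python 3 and does not touch Scrapy at all: the function name `cookies_from_set_cookie` and the sample header values are made up for illustration, and the list stands in for what `response.headers.getlist('Set-Cookie')` would return.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(header_values):
    """Collect cookie names and values from raw Set-Cookie header strings."""
    cookies = {}
    for raw in header_values:
        # Header values may arrive as bytes; decode them before parsing.
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        parsed = SimpleCookie()
        parsed.load(raw)
        # Attributes such as Path or HttpOnly are stored on the morsel,
        # so only the real cookie names end up as keys here.
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


# Hypothetical header values standing in for a real login response.
sample_headers = [
    b"SESSION_ID=abc123; Path=/; HttpOnly",
    b"lang=en; Path=/",
]
session_cookies = cookies_from_set_cookie(sample_headers)
print(session_cookies)
```

The resulting dictionary could then be passed as the `cookies=` argument of the next Request, in place of `cookieHolder` above.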

answered Sep 27 '22 19:09

Ravi Ramadoss

Maybe this is overkill, but I don't know how you are going to use those cookies, so it might be useful (an excerpt from real code, adapt it to your case):

    from scrapy.http import Request
    from scrapy.http.cookies import CookieJar
    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):

        def parse(self, response):

            # reuse the jar carried in meta, or start a new one
            cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
            cookieJar.extract_cookies(response, response.request)
            # nextPageLink is defined elsewhere in the original code
            request = Request(nextPageLink, callback = self.parse2,
                          meta = {'dont_merge_cookies': True, 'cookie_jar': cookieJar})
            cookieJar.add_cookie_header(request) # apply Set-Cookie ourselves
            return request

CookieJar has some useful methods.
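
For example, a jar can be iterated to turn its cookies into plain dictionaries. The sketch below uses the standard library's `http.cookiejar.CookieJar` (Python 3) as a stand-in, on the assumption that Scrapy's `CookieJar` can be iterated in the same way. The helpers `make_cookie` and `jar_to_dicts` are hypothetical names, and the cookie names and values are the placeholders from the question. The `name`/`value`/`domain`/`path` keys were chosen because, as far as I know, that is the shape Selenium's `add_cookie` accepts.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/"):
    """Build a minimal cookie object for this demonstration only."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_dicts(jar):
    """Turn every cookie in the jar into a plain dict."""
    return [
        {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
        for c in jar
    ]


# Fill a jar by hand; in a spider the jar would be filled by extract_cookies().
jar = CookieJar()
jar.set_cookie(make_cookie("xxx", "3", "site.org"))
jar.set_cookie(make_cookie("yyy", "34", "site.org"))

cookie_dicts = jar_to_dicts(jar)
# A simple name -> value mapping, as asked for in the question.
by_name = {c["name"]: c["value"] for c in cookie_dicts}
print(by_name)
```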

If you still don't see the cookies - maybe they are not there?

UPDATE:

Looking at CookiesMiddleware code:

    class CookiesMiddleware(object):
        def _debug_cookie(self, request, spider):
            if self.debug:
                cl = request.headers.getlist('Cookie')
                if cl:
                    msg = "Sending cookies to: %s" % request + os.linesep
                    msg += os.linesep.join("Cookie: %s" % c for c in cl)
                    log.msg(msg, spider=spider, level=log.DEBUG)

So, try request.headers.getlist('Cookie')
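
The values in that list are raw `Cookie` header strings such as `xxx=3; yyy=34`, so they still need to be split into the dictionary the question asks for. A minimal sketch in plain Python 3, assuming the header values look like the one in the debug log above; `parse_cookie_header` is a hypothetical helper, not part of Scrapy, and the sample values are invented.

```python
def parse_cookie_header(header_values):
    """Split raw Cookie header strings into a name -> value dict."""
    cookies = {}
    for raw in header_values:
        # Header values may arrive as bytes; decode them before splitting.
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        for pair in raw.split(";"):
            # partition() splits on the first "=" only, so values that
            # contain "=" themselves are kept whole.
            name, sep, value = pair.strip().partition("=")
            if sep and name:
                cookies[name] = value
    return cookies


# Made-up header value in the same shape as the log line in the question.
sample = [b"xxx=3; yyy=34; zzz=a=b"]
cookies = parse_cookie_header(sample)
print(cookies)
```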

answered Sep 27 '22 18:09

warvariuc
