scrapy authentication login with cookies

I am new to Scrapy and decided to try it out because of the good online reviews. I am trying to log in to a website with Scrapy. I have successfully logged in with a combination of Selenium and Mechanize by collecting the needed cookies with Selenium and adding them to Mechanize. Now I am trying to do something similar with Scrapy and Selenium, but nothing seems to work. I can't really even tell if anything is working or not. Can anyone please help? Below is what I've started on. I may not even need to transfer the cookies with Scrapy, but I can't tell if it ever actually logs in. Thanks.

from scrapy.spider import BaseSpider
from scrapy.http import Response,FormRequest,Request
from scrapy.selector import HtmlXPathSelector
from selenium import webdriver

class MySpider(BaseSpider):
    name = 'MySpider'
    start_urls = ['http://my_domain.com/']

    def get_cookies(self):
        driver = webdriver.Firefox()
        driver.implicitly_wait(30)
        base_url = "http://www.my_domain.com/"
        driver.get(base_url)
        driver.find_element_by_name("USER").clear()
        driver.find_element_by_name("USER").send_keys("my_username")
        driver.find_element_by_name("PASSWORD").clear()
        driver.find_element_by_name("PASSWORD").send_keys("my_password")
        driver.find_element_by_name("submit").click()
        cookies = driver.get_cookies()
        driver.close()
        return cookies

    def parse(self, response,my_cookies=get_cookies):
        return Request(url="http://my_domain.com/",
            cookies=my_cookies,
            callback=self.login)

    def login(self,response):
        return [FormRequest.from_response(response,
            formname='login_form',
            formdata={'USER': 'my_username', 'PASSWORD': 'my_password'},
            callback=self.after_login)]

    def after_login(self, response):
        hxs = HtmlXPathSelector(response)
        print hxs.select('/html/head/title').extract()
Asked Jun 26 '12 by JonDog


People also ask

How do I log into Scrapy?

To use Scrapy with token-based authentication, open the browser's network tab before logging in and then perform the login; all requests will appear below. Selecting the login request on the left-hand side shows its request headers.

Is Scrapy better than selenium?

Selenium is an excellent automation tool, and Scrapy is by far the most robust web scraping framework. In terms of speed and efficiency, Scrapy is the better choice for web scraping. When dealing with JavaScript-based websites where we need to make AJAX/PJAX requests, Selenium can work better.

What is Cookies in Scrapy?

Scrapy has a downloader middleware, CookiesMiddleware, implemented to support cookies; it is enabled by default. It mimics how the cookie jar in a browser works.
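For reference, the relevant settings would go in the project's settings.py (COOKIES_ENABLED and COOKIES_DEBUG are standard Scrapy settings; COOKIES_DEBUG is shown switched on here only as a debugging aid):

```python
# settings.py
# CookiesMiddleware is enabled by default, so this line is usually
# unnecessary; set it to False only to disable cookie handling.
COOKIES_ENABLED = True

# Log every Cookie / Set-Cookie header sent and received, which is
# useful when debugging a login flow like the one in this question.
COOKIES_DEBUG = True
```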


1 Answer

Your question is more of a debugging issue, so my answer will just have some notes on your question, not an exact solution.

def parse(self, response,my_cookies=get_cookies):
    return Request(url="http://my_domain.com/",
        cookies=my_cookies,
        callback=self.login)

my_cookies=get_cookies - you are binding the function object itself as a default argument here, not the result it returns. You don't need to pass any function here as a parameter at all. It should be:

def parse(self, response):
    return Request(url="http://my_domain.com/",
        cookies=self.get_cookies(),
        callback=self.login)

The cookies argument for Request should be a dict of cookie names to values - note that Selenium's get_cookies() returns a list of dicts, each with extra keys like domain, path, and expiry, so verify what you are actually passing and convert it if needed.
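A minimal sketch of that conversion (assuming each Selenium cookie dict has the usual 'name' and 'value' keys; the helper name to_cookie_dict is made up for illustration):

```python
def to_cookie_dict(selenium_cookies):
    """Flatten Selenium's list of cookie dicts (each with 'name',
    'value', 'domain', ... keys) into the simple {name: value}
    dict expected by Scrapy's Request(cookies=...)."""
    return {c['name']: c['value'] for c in selenium_cookies}

# Example: two cookies in the shape driver.get_cookies() returns
selenium_cookies = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': '.my_domain.com'},
    {'name': 'csrftoken', 'value': 'xyz789', 'domain': '.my_domain.com'},
]
print(to_cookie_dict(selenium_cookies))
# → {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
```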

I can't really even tell if anything is working or not.

Put some prints in the callbacks to follow the execution.
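A sketch of that idea (the FakeResponse stub is purely illustrative so the snippet runs standalone; in a real spider the callback receives Scrapy's own Response object):

```python
# Stand-in for scrapy.http.Response, which exposes .url and .status.
class FakeResponse:
    def __init__(self, url, status):
        self.url = url
        self.status = status

def after_login(response):
    # A print at the top of each callback shows whether it ever ran
    # and which page it received.
    print('after_login reached: %s (status %s)' % (response.url, response.status))
    if 'login' in response.url:
        print('still on the login page - authentication probably failed')
        return 'failed'
    return 'ok'

print(after_login(FakeResponse('http://my_domain.com/home', 200)))
# → ok
```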

Answered Oct 02 '22 by warvariuc