How to submit a form in scrapy?

I am trying to use Scrapy to log in to GitHub and collect my project's commit count. Here is the code:

from scrapy.item import Item, Field
from scrapy.http import FormRequest
from scrapy.spiders import Spider  # scrapy.spider is the old, deprecated module path
from scrapy.utils.response import open_in_browser


class GitSpider(Spider):
    name = "github"
    allowed_domains = ["github.com"]
    start_urls = ["https://www.github.com/login"]

    def parse(self, response):
        formdata = {'login': 'username',
                'password': 'password' }
        yield FormRequest.from_response(response,
                                        formdata=formdata,
                                        clickdata={'name': 'commit'},
                                        callback=self.parse1)

    def parse1(self, response):
        open_in_browser(response)

After running the code with

scrapy runspider github.py

I expected to see the result page of the form, which should be a failed login on the same page, since the username and password are fake. Instead, it shows me the search page. The log file is on Pastebin.

How should the code be fixed? Thanks in advance.

557

asked Jan 20 '15 06:01

Winston

2 Answers

The problem is that FormRequest.from_response() uses the first form on the page by default, which here is the search form, not the login form you wanted. Pass a formnumber argument to select the login form:

yield FormRequest.from_response(response,
                                formnumber=1,
                                formdata=formdata,
                                clickdata={'name': 'commit'},
                                callback=self.parse1)

Here is what I see opened in the browser after applying the change (I used a "fake" user):

[screenshot not preserved]
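
If you are not sure which index the login form has on a given page, one option is to list the forms in the page source and look for the one with the expected inputs. Below is a minimal sketch using only the standard library. The `FormLister` class, the `find_form_index` helper and the sample markup are made up for illustration; the markup is not GitHub's actual HTML.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect, for each <form> in document order, its action and input names."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form>, in document order
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and "name" in attrs:
            self._current["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_form_index(html, required_inputs):
    """Return the zero-based index of the first form that has all required inputs."""
    lister = FormLister()
    lister.feed(html)
    for index, form in enumerate(lister.forms):
        if set(required_inputs) <= set(form["inputs"]):
            return index
    return None


# Hypothetical page: a search form first, then a login form.
SAMPLE_PAGE = """
<form action="/search"><input type="text" name="q"></form>
<form action="/session" method="post">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

login_index = find_form_index(SAMPLE_PAGE, ["login", "password"])
print(login_index)  # prints 1
```

The index found this way is the value you would pass as formnumber. Since a form's position can change when the page layout changes, it may be more robust to select the form by something other than position; recent Scrapy versions also accept formname, formid and formxpath in from_response().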

188

answered Dec 08 '22 00:12

alecxe

A solution using Selenium WebDriver:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import time
from scrapy.spiders import CrawlSpider

class GitSpider(CrawlSpider):

    name = "gitscrape"
    allowed_domains = ["github.com"]
    start_urls = ["https://www.github.com/login"]

    def __init__(self, *args, **kwargs):
        # let the base spider finish its own setup before starting the browser
        super(GitSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # find_element(By.NAME, ...) replaces the removed find_element_by_name()
        login_form = self.driver.find_element(By.NAME, 'login')
        password_form = self.driver.find_element(By.NAME, 'password')
        commit = self.driver.find_element(By.NAME, 'commit')
        login_form.send_keys("yourlogin")
        password_form.send_keys("yourpassword")
        actions = ActionChains(self.driver)
        actions.click(commit)
        actions.perform()
        # by this point you are logged in to GitHub and have access
        # to all data in the main menu
        time.sleep(3)
        self.driver.close()

answered Dec 08 '22 01:12

aberna

Tags: python, forms, web-scraping, scrapy
