 

How to send post data in start_urls of the scrapy spider

I want to crawl a website which only accepts POST data. I want to send the query params as POST data in all requests. How can I achieve this?

asked Jul 12 '13 by nizam.sp


People also ask

What is Start_urls in Scrapy?

start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.

How do you pass meta in Scrapy?

Essentially, I had to connect to the database, get the URL and product_id, then scrape the URL while passing its product ID along with the request. All of this had to be done in start_requests(), because that is the method Scrapy invokes to issue the initial requests. This method has to return an iterable of Request objects.


1 Answer

POST requests can be made using Scrapy's Request or FormRequest classes.

Also, consider using the start_requests() method instead of the start_urls attribute, so you can construct the POST requests yourself.

Example:

from scrapy import Spider
from scrapy.http import FormRequest

class MySpider(Spider):
    name = "myspider"
    allowed_domains = ["www.example.com"]

    def start_requests(self):
        # FormRequest sends a POST with a form-urlencoded body
        return [FormRequest("http://www.example.com/login",
                    formdata={'someparam': 'foo', 'otherparam': 'bar'},
                    callback=self.parse)]

Hope that helps.

answered Oct 19 '22 by alecxe