I want to crawl a website that only accepts POST requests. I want to send the query parameters as POST data in every request. How can I achieve this?
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.
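A minimal sketch of that recursive setup, assuming a hypothetical domain and link pattern (the "/category/" rule and the fields extracted in parse_item are placeholders, not taken from the question):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveExampleSpider(CrawlSpider):
    name = "recursive_example"
    allowed_domains = ["www.example.com"]
    start_urls = ["http://www.example.com/"]

    # Follow every link matching the pattern and hand each matching page to parse_item
    rules = (
        Rule(LinkExtractor(allow=r"/category/"), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        # Extract whatever fields you need from each crawled page
        yield {"url": response.url, "title": response.css("title::text").get()}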
Essentially, I had to connect to the database, get the URL and product_id, then scrape the URL while passing along its product ID. All of this had to be done in start_requests(), because that is the method Scrapy invokes to request URLs; it has to return (or yield) Request objects.
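A minimal sketch of that pattern, assuming a SQLite database with a hypothetical products table (the database file, table name, and columns are placeholders; the product_id is carried on the request via meta so the callback can use it):

import sqlite3
from scrapy import Spider, Request

class ProductSpider(Spider):
    name = "products"

    def start_requests(self):
        # Hypothetical database and schema; adjust to your own setup
        conn = sqlite3.connect("products.db")
        for url, product_id in conn.execute("SELECT url, product_id FROM products"):
            # Pass the product_id along with the request so parse() can read it back
            yield Request(url, callback=self.parse, meta={"product_id": product_id})
        conn.close()

    def parse(self, response):
        yield {"product_id": response.meta["product_id"], "url": response.url}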
POST requests can be made using Scrapy's FormRequest class, or the plain Request class with method='POST' (a sketch of the latter follows the example below).
Also, consider using the start_requests() method instead of the start_urls property.
Example:
from scrapy import Spider
from scrapy.http import FormRequest

class myspiderSpider(Spider):
    name = "myspider"
    allowed_domains = ["www.example.com"]

    def start_requests(self):
        # Send the parameters as form-encoded POST data instead of a GET query string
        return [FormRequest("http://www.example.com/login",
                            formdata={'someparam': 'foo', 'otherparam': 'bar'},
                            callback=self.parse)]

    def parse(self, response):
        # Handle the POST response here
        pass
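If the site expects a raw POST body (for example JSON) rather than form-encoded fields, the plain Request class can be used with method='POST'. A minimal sketch, where the endpoint and payload are made up purely for illustration:

import json
from scrapy import Spider, Request

class PostBodySpider(Spider):
    name = "post_body_example"

    def start_requests(self):
        # Hypothetical endpoint and payload, shown only to illustrate the pattern
        payload = {"someparam": "foo", "otherparam": "bar"}
        yield Request(
            "http://www.example.com/search",
            method="POST",
            body=json.dumps(payload),
            headers={"Content-Type": "application/json"},
            callback=self.parse,
        )

    def parse(self, response):
        self.logger.info("Got response from %s", response.url)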
Hope that helps.