I have a site that I want to extract data from. The data retrieval is very straightforward.
It takes the parameters using HTTP POST and returns a JSON object. So, I have a list of queries that I want to run and then repeat at certain intervals to update a database. Is Scrapy suitable for this or should I be using something else?
I don't actually need to follow links but I do need to send multiple requests at the same time.
What does the POST request look like? There are many variations, such as simple query parameters (?a=1&b=2), a form-like payload (the body contains a=1&b=2), or any other kind of payload (the body contains a string in some format, like JSON or XML).
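To make the three variations concrete, here is a minimal sketch using only the standard library. The endpoint `http://example.com/api` and the parameters `a` and `b` are placeholders, not taken from the real site, and the Content-Type values are the conventional ones; the server in question may expect something different.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint and parameters, for illustration only.
base_url = "http://example.com/api"
params = {"a": 1, "b": 2}

# 1. Query parameters: the values travel in the URL and the body stays empty.
query_url = base_url + "?" + urlencode(params)

# 2. Form-like payload: the same encoding, but sent as the request body.
form_body = urlencode(params)
form_headers = {"Content-Type": "application/x-www-form-urlencoded"}

# 3. Other payloads: here a JSON string sent as the request body.
json_body = json.dumps(params)
json_headers = {"Content-Type": "application/json"}

print(query_url)
print(form_body)
print(json_body)
```

Checking which of these the site uses (for example in the browser's network inspector) tells you what to put in the request body and headers.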
In Scrapy it is fairly straightforward to make POST requests; see: http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples
For example, you may need something like this:
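# Warning: url is still undefined here, and these are methods of your spider class.
# Python 3: urlencode lives in urllib.parse (urllib.urlencode was Python 2).
import json
from urllib.parse import urlencode
from scrapy import Request

def start_requests(self):
    payload = {"a": 1, "b": 2}
    yield Request(url, self.parse_data, method="POST", body=urlencode(payload))

def parse_data(self, response):
    # do stuff with data...
    data = json.loads(response.body)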
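Since the question mentions a whole list of queries, one possible way to handle them is to build one set of request arguments per query. The sketch below is standard library only; the helper name `post_request_kwargs`, the endpoint, and the sample queries are all hypothetical, and it assumes the site accepts a form-encoded body.

```python
from urllib.parse import urlencode

# Assumption: the site expects a form-encoded body.
FORM_HEADERS = {"Content-Type": "application/x-www-form-urlencoded"}

def post_request_kwargs(url, queries):
    """Yield one dict of request keyword arguments per query dict."""
    for query in queries:
        yield {
            "url": url,
            "method": "POST",
            "body": urlencode(query),
            "headers": dict(FORM_HEADERS),  # copy, so each request gets its own dict
        }

# Hypothetical endpoint and queries, for illustration only.
queries = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
specs = list(post_request_kwargs("http://example.com/api", queries))
for spec in specs:
    print(spec["method"], spec["url"], spec["body"])
```

Inside start_requests you could then yield Request(callback=self.parse_data, **kwargs) for each of these dicts. Scrapy sends the yielded requests concurrently, up to the limit set by the CONCURRENT_REQUESTS setting, so no link following is needed to get several requests in flight at once.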