HTTP POST and parsing JSON with Scrapy

I have a site that I want to extract data from. The data retrieval is very straightforward.

It takes the parameters using HTTP POST and returns a JSON object. So, I have a list of queries that I want to run, and then repeat at certain intervals to update a database. Is Scrapy suitable for this, or should I be using something else?

I don't actually need to follow links, but I do need to send multiple requests at the same time.

Asked Feb 15 '23 by Crypto
1 Answer

What does the POST request look like? There are many variations: simple query parameters (?a=1&b=2), a form-like payload (the body contains a=1&b=2), or some other kind of payload (the body contains a string in some format, such as JSON or XML).
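The form-style and JSON-style bodies mentioned above can both be produced with the standard library; a quick illustration (the payload dict is just an example):

```python
import json
from urllib.parse import urlencode

payload = {"a": 1, "b": 2}

# Form-like body, as sent by a regular HTML form submission:
form_body = urlencode(payload)

# JSON body, common for APIs that also return JSON:
json_body = json.dumps(payload)

print(form_body)  # a=1&b=2
print(json_body)  # {"a": 1, "b": 2}
```

Whichever encoding the site expects is the one you should put in the request body (with a matching Content-Type header).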

In Scrapy it is fairly straightforward to make POST requests; see: http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples

For example, you may need something like this:

    # Note: `url` is a placeholder; define it (and put these methods in a
    # Spider subclass) before running.
    import json
    from urllib.parse import urlencode

    from scrapy import Request

    def start_requests(self):
        payload = {"a": 1, "b": 2}
        yield Request(url, self.parse_data, method="POST",
                      body=urlencode(payload))

    def parse_data(self, response):
        # The endpoint returns JSON, so decode the response body.
        data = json.loads(response.body)
        # do stuff with data...
Answered Feb 20 '23 by R. Max