Send Post Request in Scrapy

I am trying to crawl the latest reviews from the Google Play Store, and to get them I need to make a POST request.

With Postman it works and I get the desired response, but a POST request from the terminal gives me a server error.

For example, for this page: https://play.google.com/store/apps/details?id=com.supercell.boombeach

curl -H "Content-Type: application/json" -X POST -d '{"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}' https://play.google.com/store/getreviews 

This gives a server error, and Scrapy just ignores these lines:

frmdata = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum":0}
url = "https://play.google.com/store/getreviews"
yield Request(url, callback=self.parse, method="POST", body=urllib.urlencode(frmdata))
Amit Tripathi asked May 20 '15 06:05



2 Answers

The other answer does not really solve the problem: it sends the data as form parameters rather than as JSON in the body of the request.

From http://bajiecc.cc/questions/1135255/scrapy-formrequest-sending-json:

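import json
import scrapy

# url is the endpoint you want to POST to, defined elsewhere
my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(
    url,
    method='POST',
    body=json.dumps(my_data),  # serialize the dict to a JSON string
    headers={'Content-Type': 'application/json'},
)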
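To make the difference between the two encodings concrete, here is a minimal sketch using only the Python 3 standard library (no Scrapy needed). It reuses the payload from the question; the variable names `payload`, `form_body` and `json_body` are just illustrative.

```python
import json
from urllib.parse import urlencode

# Payload taken from the question
payload = {
    "id": "com.supercell.boombeach",
    "reviewType": 0,
    "reviewSortOrder": 0,
    "pageNum": 0,
}

# Form encoding: what urlencode (and a form-style POST) puts in the body
form_body = urlencode(payload)

# JSON encoding: what json.dumps puts in the body
json_body = json.dumps(payload)

print(form_body)  # id=com.supercell.boombeach&reviewType=0&reviewSortOrder=0&pageNum=0
print(json_body)  # {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum": 0}
```

A server that expects one of these formats will usually not accept the other, so the body and the Content-Type header have to agree. Newer Scrapy releases also provide `scrapy.http.JsonRequest`, which takes a `data` argument and serializes it for you; check that your installed version has it.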
aitorhh answered Sep 20 '22 03:09


Make sure that each value in your formdata is of type string/unicode:

frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
url = "https://play.google.com/store/getreviews"
yield FormRequest(url, callback=self.parse, formdata=frmdata)

I think this will do. In the Scrapy shell:

In [1]: from scrapy.http import FormRequest
In [2]: frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
In [3]: url = "https://play.google.com/store/getreviews"
In [4]: r = FormRequest(url, formdata=frmdata)
In [5]: fetch(r)
2015-05-20 14:40:09+0530 [default] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f3ea4258890>
[s]   item       {}
[s]   r          <POST https://play.google.com/store/getreviews>
[s]   request    <POST https://play.google.com/store/getreviews>
[s]   response   <200 https://play.google.com/store/getreviews>
[s]   settings   <scrapy.settings.Settings object at 0x7f3eaa205450>
[s]   spider     <Spider 'default' at 0x7f3ea3449cd0>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
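If your values start out as integers, as in the question, one way to follow the advice above is to convert them before building the request. This is only a sketch: `stringify_formdata` is a hypothetical helper name, not part of Scrapy.

```python
def stringify_formdata(data):
    """Return a copy of data with every value converted to str."""
    return {key: str(value) for key, value in data.items()}


# Integer values as used in the question
raw = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum": 0}
frmdata = stringify_formdata(raw)

print(frmdata)
```

The resulting dict can then be passed as `formdata=frmdata` to `FormRequest`, as in the snippet above.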
Jithin answered Sep 20 '22 03:09