 

POST request using request payload in Scrapy

Tags:

scrapy

How can I scrape this website? How do I send a POST request with a payload and get the data back?

With the code below I can scrape the first page, but how do I scrape the second page? Do I need to use Selenium, or is Scrapy enough for this?

import scrapy
from scrapy import log
from scrapy.http import Request


class MyntraSpider(scrapy.Spider):
    name = "myntra"
    allowed_domains = []
    start_urls = ["http://www.myntra.com/men-footwear"]

    # Log errors to a file via the old-style Scrapy log observer.
    logfile = open('testlog.log', 'w')
    log_observer = log.ScrapyFileLogObserver(logfile, level=log.ERROR)
    log_observer.start()

    def parse(self, response):
        print "response url ", response.url
        links = response.xpath("//ul[@class='results small']/li/a/@href").extract()
        print links
        # How do I attach the POST payload to this request?
        yield Request('http://www.myntra.com/search-service/searchservice/search/filteredSearch',
                      callback=self.nextpages, body="")

    def nextpages(self, response):
        links = response.xpath("//ul[@class='results small']/li/a/@href").extract()
        for i in range(10):
            print "link ", links[i]
asked Mar 20 '15 by Println

People also ask

How do you get a response from a Scrapy request?

Use FormRequest. You can use the FormRequest.from_response() method for this job; the Scrapy documentation shows an example login spider that uses it, reconstructed below.
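A reconstruction of that documentation example (example.com, the credentials, and the authentication_failed stub are the docs' placeholders; self.logger assumes Scrapy 1.0+):

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass

class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Pre-fill the login form found in the page and submit it.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login,
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        # Continue scraping with the authenticated session here.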

How do you use Scrapy requests?

Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
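A minimal sketch of that round trip (the URLs and spider name are illustrative; assumes Scrapy 1.0+ for self.logger):

import scrapy

class FlowSpider(scrapy.Spider):
    name = "flow_demo"
    start_urls = ["http://www.example.com/"]

    def parse(self, response):
        # The Downloader executed our request and handed back this Response.
        self.logger.info("got %s (status %s)", response.url, response.status)
        # Yielding another Request sends it back through the same machinery.
        yield scrapy.Request("http://www.example.com/next",
                             callback=self.parse_next)

    def parse_next(self, response):
        self.logger.info("callback received %s", response.url)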

How do I make a Scrapy request?

Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data. You also need a callback function. The callback function is invoked when there is a response to the request.
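For example, the two ingredients can be as small as this (hypothetical URL and callback name):

import scrapy

class MinimalSpider(scrapy.Spider):
    name = "minimal"

    def start_requests(self):
        # A URL plus a callback is all a request needs.
        yield scrapy.Request("http://www.example.com/data",
                             callback=self.parse_data)

    def parse_data(self, response):
        # Invoked once the response to the request above arrives.
        self.logger.info("parsing %s", response.url)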


1 Answer

You do not need Selenium for this. In your browser's developer tools, check the payload that is sent along with the request, and attach the same payload to your Scrapy request.

I tried it with your site; the following snippet works -

# Assumes these imports at the top of the module:
import json

from scrapy.http import Request


def start_requests(self):
    url = "http://www.myntra.com/search-service/searchservice/search/filteredSearch"
    # Payload copied from the request the site itself sends in the browser.
    payload = [{
        "query": "(global_attr_age_group:(\"Adults-Unisex\" OR \"Adults-Women\") AND global_attr_master_category:(\"Footwear\"))",
        "start": 0,
        "rows": 96,
        "facetField": [],
        "pivotFacets": [],
        "fq": ["count_options_availbale:[1 TO *]"],
        "sort": [
            {"sort_field": "count_options_availbale", "order_by": "desc"},
            {"sort_field": "score", "order_by": "desc"},
            {"sort_field": "style_store1_female_sort_field", "order_by": "desc"},
            {"sort_field": "potential_revenue_female_sort_field", "order_by": "desc"},
            {"sort_field": "global_attr_catalog_add_date", "order_by": "desc"}
        ],
        "return_docs": True,
        "colour_grouping": True,
        "useCache": True,
        "flatshot": False,
        "outOfStock": False,
        "showInactiveStyles": False,
        "facet": True
    }]
    yield Request(url, self.parse, method="POST", body=json.dumps(payload))

def parse(self, response):
    data = json.loads(response.body)
    print data
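Two follow-ups worth noting as assumptions, not verified against the site: the second page (the original question) presumably comes from bumping the "start" field in the payload, and some JSON endpoints insist on an explicit Content-Type header. Continuing the snippet above:

# Assumption: page 2 is start=96 when rows=96; not verified against the API.
payload[0]["start"] = 96
yield Request(
    url,
    self.parse,
    method="POST",
    body=json.dumps(payload),
    # Assumption: the endpoint may require the JSON content type to be explicit.
    headers={"Content-Type": "application/json"},
)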
answered Sep 27 '22 by dizzy54