How can I scrape this website? How would I send a POST request with a payload and get data back from it?
With the code below I am able to scrape the first page, but how would I scrape the second page? Do I need to use Selenium, or is Scrapy enough for this?
import scrapy
from scrapy import log
from scrapy.http import Request

class myntra_spider(scrapy.Spider):
    name = "myntra"
    allowed_domains = []  # was misspelled "allowed_domain", which Scrapy ignores
    start_urls = ["http://www.myntra.com/men-footwear"]

    logfile = open('testlog.log', 'w')
    log_observer = log.ScrapyFileLogObserver(logfile, level=log.ERROR)
    log_observer.start()

    def parse(self, response):
        print "response url", response.url
        links = response.xpath("//ul[@class='results small']/li/a/@href").extract()
        print links
        yield Request('http://www.myntra.com/search-service/searchservice/search/filteredSearch',
                      callback=self.nextpages, body="")

    def nextpages(self, response):
        links = response.xpath("//ul[@class='results small']/li/a/@href").extract()
        for i in range(10):
            print "link", links[i]
Using FormRequest. You can use the FormRequest.from_response() method for this job. Here's the beginning of an example spider which uses it:

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and passed across the system until they reach the Downloader, which executes the request and returns a Response object that travels back to the spider that issued the request.
Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data. You also need a callback function. The callback function is invoked when there is a response to the request.
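To make the request/callback flow above concrete, here is a toy, self-contained sketch of the idea. This is not Scrapy's real machinery; the `Request`, `Response`, `downloader`, and `crawl` names below are purely illustrative stand-ins showing how a callback attached to a request ends up receiving the response.

```python
# Toy illustration of the request/callback flow (NOT Scrapy internals;
# all names here are simplified stand-ins).

class Request:
    def __init__(self, url, callback):
        self.url = url          # where to fetch
        self.callback = callback  # who handles the response

class Response:
    def __init__(self, url, body):
        self.url = url
        self.body = body

def downloader(request):
    # Stand-in for the real Downloader: "fetches" the URL and wraps
    # the result in a Response object.
    return Response(request.url, body="<html>page content</html>")

def crawl(request):
    # The engine hands the Request to the downloader, then routes the
    # Response back to the callback the spider attached to the Request.
    response = downloader(request)
    return request.callback(response)

def parse(response):
    # The callback is invoked only once a response is available.
    return "parsed " + response.url

result = crawl(Request("http://www.myntra.com/men-footwear", callback=parse))
print(result)
```

In real Scrapy the same pairing happens when you write `Request(url, callback=self.nextpages)`: the engine downloads `url` and calls `self.nextpages(response)` for you.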
You do not need Selenium for this. Check the payload required to be sent along with the request in your browser and attach it with the request.
I tried it with your site; the following snippet works:
import json  # at the top of the spider module
from scrapy.http import Request

def start_requests(self):
    url = "http://www.myntra.com/search-service/searchservice/search/filteredSearch"
    payload = [{
        "query": "(global_attr_age_group:(\"Adults-Unisex\" OR \"Adults-Women\") AND global_attr_master_category:(\"Footwear\"))",
        "start": 0,
        "rows": 96,
        "facetField": [],
        "pivotFacets": [],
        "fq": ["count_options_availbale:[1 TO *]"],
        "sort": [
            {"sort_field": "count_options_availbale", "order_by": "desc"},
            {"sort_field": "score", "order_by": "desc"},
            {"sort_field": "style_store1_female_sort_field", "order_by": "desc"},
            {"sort_field": "potential_revenue_female_sort_field", "order_by": "desc"},
            {"sort_field": "global_attr_catalog_add_date", "order_by": "desc"}
        ],
        "return_docs": True,
        "colour_grouping": True,
        "useCache": True,
        "flatshot": False,
        "outOfStock": False,
        "showInactiveStyles": False,
        "facet": True
    }]
    yield Request(url, self.parse, method="POST", body=json.dumps(payload))

def parse(self, response):
    data = json.loads(response.body)
    print data
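To answer the second-page question: since the payload carries a "start" offset and a "rows" page size, later pages are most likely fetched by advancing "start" in steps of "rows". That pagination behavior is an assumption worth verifying against the site; the `build_payload` helper below is illustrative (the query string is trimmed to one field for brevity), not part of Scrapy or the site's API.

```python
import json

def build_payload(page, rows=96):
    # Illustrative helper: same payload shape as above, with "start"
    # advanced by the page size. Query trimmed here for brevity.
    return [{
        "query": '(global_attr_master_category:("Footwear"))',
        "start": page * rows,  # page 0 -> 0, page 1 -> 96, page 2 -> 192, ...
        "rows": rows,
        "return_docs": True,
    }]

# In the spider you would then yield one POST Request per page, e.g.:
# for page in range(5):
#     yield Request(url, self.parse, method="POST",
#                   body=json.dumps(build_payload(page)))
print(json.dumps(build_payload(1)[0]["start"]))
```

If the response reports a total result count, you can also compute the number of pages from it instead of hard-coding a range.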