Scrapy response incomplete

Question

I tried to crawl the following URL using Scrapy: http://www.walgreens.com/search/results.jsp?Ntt=bounty+paper+towel

but the returned response is not complete. When I run

scrapy shell the_url_above

then

view(response)

the webpage doesn't load completely. So my questions are:

What is the cause of this problem? (Why did I get an incomplete response rather than a 404?)
What are some potential ways to handle it?

Granitosaurus · Accepted Answer

The data for that page seems to be loaded in with JavaScript. Scrapy downloads only the base HTML and does not execute scripts, which is why the request succeeds (no 404) but the products are missing. If you inspect the page (e.g. with the Firebug network tab), you'll see that once the base page has loaded, the products are fetched by JavaScript that sends a POST request to http://www.walgreens.com/svc/products/search with this body:

{"p":"1",  # seems to be page number
"s":"15",  # page size
"sort":"relevance",
"view":"allView",
"geoTargetEnabled":false,
"q":"bounty paper towel",  # search query
"requestType":"search",
"deviceType":"desktop"}
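The annotated body above is not valid JSON as written because of the # comments. A minimal sketch of building the same body in Python follows; the function name build_search_body and its parameters are made up for illustration, and the field meanings are the guesses from the comments above.

```python
import json


def build_search_body(query, page=1, page_size=15):
    """Return the JSON string for the search POST (hypothetical helper)."""
    payload = {
        "p": str(page),          # seems to be the page number
        "s": str(page_size),     # page size
        "sort": "relevance",
        "view": "allView",
        "geoTargetEnabled": False,  # serialized as JSON false
        "q": query,              # search query
        "requestType": "search",
        "deviceType": "desktop",
    }
    return json.dumps(payload)


body = build_search_body("bounty paper towel")
```

json.dumps turns Python's False into JSON's false, so the resulting string matches what the browser sends.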

You can send this request directly with Scrapy:

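import json
from scrapy import Request
# Inside a spider callback; payload is the JSON above as a Python dict (false becomes False).
yield Request('http://www.walgreens.com/svc/products/search',
              method='POST',
              body=json.dumps(payload))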

You should then receive a JSON object full of product data.

You can actually even view the response in the browser via this link: http://www.walgreens.com/svc/products/search?p=1&s=15&sort=relevance&view=allView&geoTargetEnabled=false&q=bounty%20paper%20towel&requestType=search&deviceType=desktop
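If the GET form of the URL works for you, the query string can be built instead of typed by hand. The sketch below uses only the standard library; the helper name build_search_url is made up for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.walgreens.com/svc/products/search"


def build_search_url(query, page=1, page_size=15):
    """Build the GET search URL (hypothetical helper)."""
    params = {
        "p": page,
        "s": page_size,
        "sort": "relevance",
        "view": "allView",
        "geoTargetEnabled": "false",  # literal string, as in the link above
        "q": query,
        "requestType": "search",
        "deviceType": "desktop",
    }
    # quote_via=quote encodes spaces as %20 rather than +
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url("bounty paper towel")
```

The resulting URL can be passed to scrapy shell or used as a spider's start URL.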

Tags:

python

web-scraping

scrapy

web-crawler

user2628641

1 Answers

Granitosaurus
