Scrapy response incomplete

I tried to crawl the following URL using Scrapy: http://www.walgreens.com/search/results.jsp?Ntt=bounty+paper+towel

but the returned response is not complete. When I run

scrapy shell the_url_above

and then

view(response)

the webpage doesn't load completely. So my questions are:

  1. What is the cause of this problem? (Why did I get an incomplete response rather than a 404?)
  2. What are some potential ways to handle it?
user2628641 asked Apr 18 '26 09:04



1 Answer

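The data for that page seems to be loaded by JavaScript. Scrapy does not execute JavaScript, so it only receives the base page, which is why you get a successful but incomplete response instead of an error. If you inspect the page (e.g. in the Firebug network tab), you'll see that once the base page has loaded, the products are fetched by JavaScript, which sends a POST request to http://www.walgreens.com/svc/products/search with the following contents: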

{"p":"1",  # seems to be page number
"s":"15",  # page size
"sort":"relevance",
"view":"allView",
"geoTargetEnabled":false,
"q":"bounty paper towel",  # search query
"requestType":"search",
"deviceType":"desktop"}

You can send this request directly from Scrapy:

yield Request('http://www.walgreens.com/svc/products/search',
              method='POST',
              body=json.dumps(payload))  # payload: the fields above as a dict (needs `import json`)

You should then receive a JSON object full of product data.
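For a fuller picture, here is a minimal sketch of how the POST request could be assembled. The helper names (`build_search_payload`, `make_search_request`, `parse_products`) are hypothetical, the `Content-Type` header is an assumption about what the endpoint expects, and the shape of the returned JSON is not known from the answer, so the callback simply yields the decoded data as-is.

```python
import json

SEARCH_URL = "http://www.walgreens.com/svc/products/search"


def build_search_payload(query, page=1, page_size=15):
    """Return the POST fields observed in the network tab as a dict."""
    return {
        "p": str(page),        # seems to be the page number
        "s": str(page_size),   # page size
        "sort": "relevance",
        "view": "allView",
        "geoTargetEnabled": False,
        "q": query,            # search query
        "requestType": "search",
        "deviceType": "desktop",
    }


def make_search_request(query, callback=None):
    """Build (but do not send) a Scrapy POST request for the search."""
    # Imported here so the payload helper works without Scrapy installed.
    from scrapy import Request

    return Request(
        SEARCH_URL,
        method="POST",
        body=json.dumps(build_search_payload(query)),
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        callback=callback,
    )


def parse_products(response):
    """Decode the JSON body and yield it unchanged."""
    # The response structure is not documented in the answer,
    # so no particular keys are assumed here.
    yield json.loads(response.text)
```

Inside a spider's `start_requests`, this could be used as `yield make_search_request("bounty paper towel", callback=parse_products)`.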

You can even view the response in the browser via this link: http://www.walgreens.com/svc/products/search?p=1&s=15&sort=relevance&view=allView&geoTargetEnabled=false&q=bounty%20paper%20towel&requestType=search&deviceType=desktop
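The GET form of the URL can also be built programmatically rather than by hand. This is a small sketch using only the standard library; `build_search_url` is a hypothetical helper name, and the parameters are simply the ones that appear in the link above.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://www.walgreens.com/svc/products/search"


def build_search_url(query, page=1, page_size=15):
    """Return the search URL with the query string from the link above."""
    params = {
        "p": str(page),
        "s": str(page_size),
        "sort": "relevance",
        "view": "allView",
        "geoTargetEnabled": "false",
        "q": query,
        "requestType": "search",
        "deviceType": "desktop",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)
```

In `scrapy shell`, the result can be passed to the built-in `fetch()` helper, e.g. `fetch(build_search_url("bounty paper towel"))`, after which `response.text` should hold the JSON.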

Granitosaurus answered Apr 19 '26 23:04



