I tried to crawl the following URL using Scrapy: http://www.walgreens.com/search/results.jsp?Ntt=bounty+paper+towel
but the returned page is incomplete. When I run
scrapy shell the_url_above
and then
view(response)
the page does not load completely. So my question is:
The data for that page appears to be loaded with JavaScript. If you inspect the page (e.g. with the Firebug network tab), you'll see that once the base page has loaded, the products are fetched by JavaScript, which sends a POST request to http://www.walgreens.com/svc/products/search with the following body:
{"p":"1", # seems to be page number
"s":"15", # page size
"sort":"relevance",
"view":"allView",
"geoTargetEnabled":false,
"q":"bounty paper towel", # search query
"requestType":"search",
"deviceType":"desktop"}
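You can send this request yourself from Scrapy, for example inside a spider callback:
yield Request('http://www.walgreens.com/svc/products/search',
              method='POST',
              body=json.dumps(payload))  # payload: the dict above (Python spells false as False); needs `import json`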
You should then receive a JSON object full of product data.
You can even view the response in the browser via this link: http://www.walgreens.com/svc/products/search?p=1&s=15&sort=relevance&view=allView&geoTargetEnabled=false&q=bounty%20paper%20towel&requestType=search&deviceType=desktop
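If the GET form of the link works for you, the same URL can be assembled from the parameters rather than typed by hand. This is a small sketch using only the standard library; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://www.walgreens.com/svc/products/search"


def build_search_url(query, page=1, page_size=15):
    """Return the search URL with the parameters encoded in the query string."""
    params = {
        "p": page,
        "s": page_size,
        "sort": "relevance",
        "view": "allView",
        "geoTargetEnabled": "false",
        "q": query,
        "requestType": "search",
        "deviceType": "desktop",
    }
    # quote_via=quote encodes spaces as %20 instead of the default '+'.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(build_search_url("bounty paper towel"))
```

The resulting string can be passed straight to `scrapy shell` or to a regular Scrapy request.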