How to extract the ASIN from an Amazon product page

Question

I have the following product page and I'm trying to get the ASIN from it (in this case, ASIN=B014MHZ90M), but I don't have a clue how to get it from the page.

I'm using Python 3.4, Scrapy and the following code:

hxs = Selector(response)
product_name = "".join(hxs.xpath('//span[contains(@class,"a-text-ellipsis")]/a/text()').extract())
product_model = hxs.xpath('//body//div[@id="buybox_feature_div"]//form[@method="post"]/input[@id="ASIN"]/text()').extract()

This way I don't get the required field (the ASIN).

1. What should I do in order to get the product model (ASIN)?

2. Is there a way to debug such code (I'm using PyCharm)? I could not use the debugger; I can only run the code, without seeing what's going on there in 'slow motion'.

Admin · Accepted Answer

You can extract B014MHZ90M from response.url by splitting it on "/dp/":

response.url.split("/dp/")[1]

The two parts of the split are:

response.url.split("/dp/")[1] == "B014MHZ90M"

response.url.split("/dp/")[0] == "http://www.amazon.com"
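The split on "/dp/" returns everything after that marker, so it only yields a clean ASIN when the URL ends right there. If the URL carries a trailing path or query string, a regular expression may be more forgiving. The following is a minimal sketch, not a definitive implementation: the helper name asin_from_url is hypothetical, and it assumes the ASIN is always 10 uppercase letters or digits directly after "/dp/".

```python
import re

# Assumption: an ASIN is 10 uppercase alphanumeric characters that follow
# "/dp/" and are terminated by "/", "?", "#" or the end of the URL.
ASIN_IN_URL = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?#]|$)")


def asin_from_url(url):
    """Return the ASIN found in a product URL, or None if there is none."""
    match = ASIN_IN_URL.search(url)
    return match.group(1) if match else None
```

Inside a Scrapy callback this could be called as asin_from_url(response.url). If the URL does not contain the ASIN at all, another place to look is the input element from the question: an input has no text node, so text() returns nothing, and the value would have to be read from its attribute instead, for example with '//input[@id="ASIN"]/@value' (assuming the page still contains that input).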

Tags:

python

python-3.x

scrapy

web-crawler

Lior Magen
