Get All Reviews From Amazon? Python 3

Question

I am trying to read all the reviews for a product with Python. I have a script, but it does not work.

parser = html.fromstring(page_response)
XPATH_AGGREGATE = '//span[@id="acrCustomerReviewText"]'
XPATH_REVIEW_SECTION_1 = '//div[@data-hook="reviews-content"]'
XPATH_REVIEW_SECTION_2 = '//div[@data-hook="review"]'

XPATH_AGGREGATE_RATING = '//table[@id="histogramTable"]//tr'
XPATH_PRODUCT_NAME = '//h1//span[@id="productTitle"]//text()'
XPATH_PRODUCT_PRICE  = '//span[@id="priceblock_ourprice"]/text()'

raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
product_price = ''.join(raw_product_price).replace(',','')

raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
product_name = ''.join(raw_product_name).strip()
total_ratings  = parser.xpath(XPATH_AGGREGATE_RATING)
reviews = parser.xpath(XPATH_REVIEW_SECTION_1)
if not reviews:
    reviews = parser.xpath(XPATH_REVIEW_SECTION_2)

The page is 'https://www.amazon.com/productreviews/' + asin + '/', where asin is a product ID (e.g., B0718Y23CQ). I get nothing in reviews. Thanks for any help!

Alex · Accepted Answer

To be honest, I can't find some of the paths you use anywhere on the page, so I don't know where they come from. I have rewritten your code to try to help:

from lxml import html
import requests
import json

asin = 'B0718Y23CQ'
page_response = requests.get('https://www.amazon.com/product-reviews/' + asin)
parser = html.fromstring(page_response.content)

# Each review sits in its own <div class="a-section review">.
reviews_html = parser.xpath('//div[@class="a-section review"]')
reviews_arr = []
for review in reviews_html:
    # Every XPath below is relative to the current review and returns a list.
    review_dic = {}
    review_dic['title'] = review.xpath('.//a[@data-hook="review-title"]/text()')
    review_dic['rating'] = review.xpath('.//a[@class="a-link-normal"]/@title')
    review_dic['author'] = review.xpath('.//a[@data-hook="review-author"]/text()')
    review_dic['date'] = review.xpath('.//span[@data-hook="review-date"]/text()')
    review_dic['purchase'] = review.xpath('.//span[@data-hook="avp-badge"]/text()')
    review_dic['review_text'] = review.xpath('.//span[@data-hook="review-body"]/text()')
    review_dic['helpful_votes'] = review.xpath('.//span[@data-hook="helpful-vote-statement"]/text()')
    reviews_arr.append(review_dic)
print(json.dumps(reviews_arr, indent=4))

Each review in the output looks like this:

{
        "title": [
            "I find it very useful, I use for anything I need"
        ],
        "rating": [
            "5.0 out of 5 stars"
        ],
        "author": [
            "Nicoletta Delon"
        ],
        "date": [
            "on January 2, 2018"
        ],
        "purchase": [
            "Verified Purchase"
        ],
        "review_text": [
            "I like this a lot. I use it a lot. It's a medium to small size but it holds a lot."
        ],
        "helpful_votes": [
            "\n      One person found this helpful.\n    "
        ]
    }

What remains is to clean the results: take each value out of its list, strip the surrounding whitespace, and handle fields that come back empty. After that I think you'll have what you need. To get all the reviews, iterate over the pages by adding ?pageNumber=1 to the link and incrementing the number. If you are going to make many requests, you can use proxies to keep your IP from being blocked.
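The cleanup and the page iteration described above could be sketched roughly as follows. This is only one possible approach, not a tested scraper: clean_review, review_page_url, collect_reviews, the fetch_reviews callable and the max_pages cap are all hypothetical names introduced here for illustration, and stopping at the first page with no reviews is an assumption about how the site behaves past the last page.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.amazon.com/product-reviews/'


def clean_review(raw_review):
    """Flatten one scraped review: join each list of text nodes into a
    single stripped string, or None when the XPath matched nothing."""
    cleaned = {}
    for key, values in raw_review.items():
        text = ' '.join(v.strip() for v in values if v.strip())
        cleaned[key] = text or None
    return cleaned


def review_page_url(asin, page_number):
    """Build the URL of one page of reviews for the given ASIN."""
    return BASE_URL + asin + '?' + urlencode({'pageNumber': page_number})


def collect_reviews(asin, fetch_reviews, max_pages=50):
    """Walk the pages in order until one comes back with no reviews.

    fetch_reviews is a hypothetical callable (an assumption of this sketch):
    it takes a URL and returns the list of raw review dicts for that page.
    """
    all_reviews = []
    for page_number in range(1, max_pages + 1):
        raw_reviews = fetch_reviews(review_page_url(asin, page_number))
        if not raw_reviews:
            break  # assumed: an empty page means there are no more reviews
        all_reviews.extend(clean_review(r) for r in raw_reviews)
    return all_reviews
```

In practice, fetch_reviews would wrap the requests.get call and the XPath loop from the code above, returning reviews_arr for the URL it is given. Passing the fetcher in as an argument also keeps the pagination logic easy to try out without hitting the site.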
