I am trying to scrape ratings from trustpilot.com.
Is it possible to extract a class name using Scrapy? The rating I want is displayed as five individual images, but those images sit inside an element whose class name encodes the rating. For example, if the rating is 2 stars, then:
<div class="star-rating count-2 size-medium clearfix">...
if it is 3 stars then:
<div class="star-rating count-3 size-medium clearfix">...
So is there a way I can scrape the class count-2 or count-3, assuming a selector like .css('.star-rating')?
You could combine a CSS selector with an XPath expression to get the class attribute, then use a regular expression to pick out the count:
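import re

# Collect the full class attribute of every .star-rating element
classes = response.css('.star-rating').xpath("@class").extract()
for cls in classes:
    # Find the count-N token inside the class string
    match = re.search(r'\bcount-\d+\b', cls)
    if match:
        print("Class = {}".format(match.group(0)))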
You can extract the rating directly using re_first() and re():
for rating in response.xpath('//div[contains(@class, "star-rating")]/@class').re(r'count-(\d+)'):
    print(rating)
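Either approach can be wrapped in a small helper that turns the class attribute into an integer. The following is only a minimal sketch, assuming the class always carries a token of the form count-N as in the markup from the question. The name rating_from_class is hypothetical and not part of Scrapy, and the spider usage is shown only as a comment because it needs a live response.

```python
import re

# Hypothetical helper (not part of Scrapy): pull the numeric rating out of a
# class attribute such as "star-rating count-2 size-medium clearfix".
_COUNT_RE = re.compile(r'\bcount-(\d+)\b')


def rating_from_class(class_attr):
    """Return the rating as an int, or None if no count-N token is present."""
    if not class_attr:
        return None
    match = _COUNT_RE.search(class_attr)
    return int(match.group(1)) if match else None


# Inside a spider callback this might be used as follows
# (assumes the markup from the question):
#   class_attr = response.css('.star-rating::attr(class)').get()
#   rating = rating_from_class(class_attr)
print(rating_from_class("star-rating count-2 size-medium clearfix"))
print(rating_from_class("star-rating size-medium clearfix"))
```

Returning None instead of raising keeps the spider running when a page has no rating element or the site changes its class naming.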