I am trying to scrape a website, and the HTML content looks something like this:
<div class="panel-heading" role="tab" id="heading727654">
  <h4 class="panel-title">
    <a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727654" aria-expanded="false" aria-controls="collapse727654">
      <div class="product-name">
        <span class="product-title">
          Aubrey<br><i>AGE DEFYING THERAPY CLEANSER 3.4 OZ</i>
        </span>
      </div>
      <div class="product-price">
        <span>
          $10.99 / 3.40 OZ
        </span>
      </div>
    </a>
  </h4>
</div>
<div class="panel-heading" role="tab" id="heading727655">
  <h4 class="panel-title">
    <a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727655" aria-expanded="false" aria-controls="collapse727654">
      <div class="product-name">
        <span class="product-title">
          Aubrey<br><i>AGE DEFYING THERAPY LIQUID</i>
        </span>
      </div>
      <div class="product-price">
        <span>
          $12.99 / 4.40 OZ
        </span>
      </div>
    </a>
  </h4>
</div>
My Python code snippet to extract this looks like:
def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            print node.xpath('//span[re:test(@class, "product-title")]//text()').extract()
            print node.xpath('//span[re:test(@class, "product-price")]//text()').extract()
When I run the above Scrapy code in Python, I am not getting the expected output: the same content is repeated 100 times. Can someone help me with this?
You need to prepend dots to your inner XPath expressions to make them work in the context of node. Otherwise the search starts from the root of the tree:
def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            print node.xpath('.//span[re:test(@class, "product-title")]//text()').extract()
            # note: the "product-price" class is on the div, not the span
            print node.xpath('.//div[re:test(@class, "product-price")]//text()').extract()
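You can see the same root-scoped vs. node-scoped behavior with nothing but the standard library. The sketch below uses `xml.etree.ElementTree` instead of Scrapy selectors (so the HTML is simplified to well-formed XML, and the class names are stand-ins); ElementTree rejects absolute `//` paths on an element, so the contrast is shown by searching from the whole tree versus from each node:

```python
import xml.etree.ElementTree as ET

doc = """
<root>
  <div class="panel-heading"><span class="product-title">Cleanser</span></div>
  <div class="panel-heading"><span class="product-title">Liquid</span></div>
</root>
"""
tree = ET.fromstring(doc)

# Searching from the document root finds every title, no matter
# which panel node the loop is currently "inside" -- this is what
# an un-dotted //span XPath does in Scrapy:
all_titles = [s.text for s in tree.findall('.//span[@class="product-title"]')]

# Searching relative to each node returns only that node's own title,
# which is what the dot-prefixed .//span expression does:
per_node = [
    [s.text for s in node.findall('.//span[@class="product-title"]')]
    for node in tree.findall('.//div[@class="panel-heading"]')
]

print(all_titles)  # every title, repeated per iteration if done in a loop
print(per_node)    # one title per panel node
```

The takeaway is the same as in the answer: scope the inner query to the node you are iterating over, not to the document root.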