How to iterate through nodes in scrapy using python

Question

I am trying to scrape a website whose HTML looks something like this:

<div class="panel-heading" role="tab" id="heading727654">
            <h4 class="panel-title">
                <a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727654" aria-expanded="false" aria-controls="collapse727654">
                    <div class="product-name">
                        <span class="product-title">
                            Aubrey<br><i>AGE DEFYING THERAPY CLEANSER 3.4 OZ</i>
                        </span>
                    </div>
                    <div class="product-price">
                        <span>
                            $10.99 / 3.40 OZ 
                        </span>
                    </div>
                </a>
            </h4>
</div>
<div class="panel-heading" role="tab" id="heading727655">
            <h4 class="panel-title">
                <a class="collapsed" data-toggle="collapse" data-parent="#accordion" href="#collapse727655" aria-expanded="false" aria-controls="collapse727654">
                    <div class="product-name">
                        <span class="product-title">
                            Aubrey<br><i>AGE DEFYING THERAPY LIQUID</i>
                        </span>
                    </div>
                    <div class="product-price">
                        <span>
                            $12.99 / 4.40 OZ 
                        </span>
                    </div>
                </a>
            </h4>
</div>

My Python code to extract this looks something like this:

def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
                print(node.xpath('//span[re:test(@class, "product-title")]//text()').extract())
                print(node.xpath('//span[re:test(@class, "product-price")]//text()').extract())

When I run this spider, I don't get the expected output: the same content is repeated 100 times. Can someone help me with this?

alecxe · Accepted Answer

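You need to prepend a dot to each inner XPath expression so that it is evaluated relative to node. An expression that starts with // is absolute: even when it is called on node, the search starts from the root of the tree, which is why every iteration returns the same results: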

def parse(self, response):
    filename = response.url.split("/")[-2] + '.html'
    with open(filename, 'wb') as f:
        for node in response.xpath('//div[re:test(@class, "panel-heading")]'):
            # The leading dot makes each query relative to the current node.
            print(node.xpath('.//span[re:test(@class, "product-title")]//text()').extract())
            print(node.xpath('.//span[re:test(@class, "product-price")]//text()').extract())

Tags: python, web-scraping, scrapy

goutam
