
Scrapy/Python/XPath - How to extract data from within data?

I'm new to Scrapy, and I've just started looking into XPath.

I'm trying to extract titles and links from HTML list items within a div. This is how I thought I'd go about it: select the ul inside the div with a given id, then loop through its list items:

def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item

But I received the same results as this attempt:

def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item

In both cases, the exported CSV file contains the data of every li in the page source, from top to bottom, not just the ones inside the categories div.

I'm not an expert, and I've already made a number of attempts. If anyone could shed some light on this, it would be appreciated.

Alex Legg asked Sep 13 '14 19:09



1 Answer

You need to start the XPath expression used in the inner loop with a dot:

for t in response.xpath('//*[@id="categories"]/ul'):
    for x in t.xpath('.//li'):

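This makes the expression search within the scope of the current element instead of the whole page. An XPath that starts with // is absolute, so it matches from the document root even when it is called on a nested selector.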

See the "Working with relative XPaths" section of the Scrapy selectors documentation for a fuller explanation.

alecxe answered Sep 19 '22 02:09
