Scrapy: Extract links and text

I am new to Scrapy and I am trying to scrape the IKEA website, starting from the basic page with the list of locations.

My items.py file is given below:

import scrapy


class IkeaItem(scrapy.Item):

    name = scrapy.Field()
    link = scrapy.Field()

And the spider is given below:

import scrapy
from ikea.items import IkeaItem


class IkeaSpider(scrapy.Spider):
    name = 'ikea'

    allowed_domains = ['http://www.ikea.com/']

    start_urls = ['http://www.ikea.com/']

    def parse(self, response):
        for sel in response.xpath('//tr/td/a'):
            item = IkeaItem()
            item['name'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()

            yield item

When I run the spider I do not get any useful output. The JSON output file looks something like this:

[[{"link": [], "name": []}

The output I am looking for is the name of each location and its link, but I am getting nothing. Where am I going wrong?

asked Jan 03 '15 09:01

praxmon

2 Answers

There is a simple mistake in the XPath expressions for the item fields. The loop already iterates over the a tags, so you don't need to specify a in the inner XPath expressions. In other words, you are currently searching for a tags inside the a tags inside the td inside the tr, which matches nothing.

Replace a/text() with text() and a/@href with @href.

(tested - works for me)

answered Nov 03 '22 09:11

alecxe

Use this:

    # Keep the paths relative to the current <a>; a leading // would search the whole page.
    item['name'] = sel.xpath('./text()').extract()
    item['link'] = sel.xpath('./@href').extract()

answered Nov 03 '22 10:11

Ganesh
