 

from scrapy.selector import Selector error

I am unable to do the following:

    from scrapy.selector import Selector

The error is:

      File "/Desktop/KSL/KSL/spiders/spider.py", line 1, in <module>
        from scrapy.selector import Selector
    ImportError: cannot import name Selector

It seems as if lxml is not installed on my machine, but it is. I also thought Selector was built into Scrapy by default. Maybe not?

Thoughts?
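A quick way to test the lxml suspicion is to ask Python directly. The sketch below uses hypothetical helpers (try_import and selector_report are not part of Scrapy) to report whether lxml imports, which Scrapy installation the interpreter picks up and its version, and whether scrapy.selector actually exports Selector. Run it with the same interpreter that runs the spider.

```python
import importlib


def try_import(name):
    """Return (module, None) on success or (None, error text) on failure."""
    try:
        return importlib.import_module(name), None
    except Exception as exc:  # usually ImportError; a broken install can raise others
        return None, "{}: {}".format(type(exc).__name__, exc)


def selector_report():
    """Summarise what this interpreter sees for lxml, Scrapy and scrapy.selector.Selector."""
    lxml_module, lxml_error = try_import("lxml")
    scrapy_module, scrapy_error = try_import("scrapy")
    selector_module = None
    if scrapy_module is not None:
        selector_module, _ = try_import("scrapy.selector")
    return {
        "lxml_ok": lxml_module is not None,
        "lxml_error": lxml_error,
        # An unexpected path here means a different Scrapy (or a local file) is being imported.
        "scrapy_path": getattr(scrapy_module, "__file__", None),
        "scrapy_version": getattr(scrapy_module, "__version__", None),
        "scrapy_error": scrapy_error,
        # hasattr(None, ...) is False, so a missing scrapy.selector reports False here.
        "has_selector": hasattr(selector_module, "Selector"),
    }


if __name__ == "__main__":
    for key, value in selector_report().items():
        print("{}: {}".format(key, value))
```

If lxml_ok is True but has_selector is False, lxml is not the problem and the installed Scrapy simply does not provide Selector.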

asked Oct 16 '13 by SMPLGRP

2 Answers

Try importing HtmlXPathSelector instead.

    from scrapy.selector import HtmlXPathSelector

Then use its .select() method to extract data from the HTML. For example:

    sel = HtmlXPathSelector(response)
    site_names = sel.select('//ul/li')

If you are following the tutorial on the Scrapy site (http://doc.scrapy.org/en/latest/intro/tutorial.html), the tutorial's example rewritten to use HtmlXPathSelector looks like this:

    # Older Scrapy API (BaseSpider, HtmlXPathSelector); Python 2 print statement
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector

    class DmozSpider(BaseSpider):
        name = "dmoz"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
        ]

        def parse(self, response):
            sel = HtmlXPathSelector(response)
            # Every <li> inside a <ul> is treated as one site entry
            sites = sel.select('//ul/li')

            for site in sites:
                title = site.select('a/text()').extract()  # link text
                link = site.select('a/@href').extract()    # link target
                desc = site.select('text()').extract()     # text directly inside the <li>
                print title, link, desc
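For comparison, here is a sketch of the same loop for Scrapy 0.20 or later (as far as I know, the release that added Selector), where from scrapy.selector import Selector does work and .xpath() takes the place of .select(). It runs against a made-up snippet (SAMPLE_HTML and extract_sites are hypothetical, not dmoz's real markup) through Selector(text=...) instead of a live response; inside a spider you would pass the response instead, e.g. Selector(response).

```python
# Assumes Scrapy 0.20 or later, where scrapy.selector exports Selector.
try:
    from scrapy.selector import Selector
except ImportError:
    Selector = None  # Scrapy missing or too old: use HtmlXPathSelector as in the spider above

# Hypothetical markup shaped like one directory entry.
SAMPLE_HTML = '<ul><li><a href="/books/">Python Books</a> - a list of books</li></ul>'


def extract_sites(html):
    """Return a (title, link, desc) tuple of lists for every <li> inside a <ul>."""
    sel = Selector(text=html)
    results = []
    for site in sel.xpath('//ul/li'):
        title = site.xpath('a/text()').extract()
        link = site.xpath('a/@href').extract()
        desc = site.xpath('text()').extract()
        results.append((title, link, desc))
    return results


if __name__ == "__main__" and Selector is not None:
    for title, link, desc in extract_sites(SAMPLE_HTML):
        print(title, link, desc)
```

.extract() still works on current releases; newer versions also offer .getall() for the same purpose.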

Hope this helps!

answered Sep 20 '22 by user256604


I encountered the same problem. I think the cause is your Scrapy version.

Run scrapy version -v in a terminal to check which version you have. As far as I know, the newest version is 0.24.4 (released 2014-10-23); see http://scrapy.org/ for the latest release.
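To interpret that version number, a small sketch that assumes Selector first shipped in Scrapy 0.20 (older releases such as 0.18 only offer HtmlXPathSelector). SELECTOR_MIN_VERSION, version_tuple and has_selector_api are hypothetical names, and the input is the bare version string (e.g. 0.24.4), not the full scrapy version -v output.

```python
import re

# Assumption: Selector first shipped in Scrapy 0.20; older releases only offer HtmlXPathSelector.
SELECTOR_MIN_VERSION = (0, 20)


def version_tuple(version):
    """Convert a version string such as '0.24.4' or '1.0.0rc1' into a tuple of ints."""
    parts = []
    for piece in version.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def has_selector_api(version):
    """True if this Scrapy version should provide scrapy.selector.Selector."""
    return version_tuple(version) >= SELECTOR_MIN_VERSION


if __name__ == "__main__":
    for v in ("0.18.4", "0.24.4"):
        print(v, has_selector_api(v))
```

This prints False for 0.18.4 and True for 0.24.4, because tuples compare element by element.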

answered Sep 17 '22 by yongkai