Scrapy HtmlXPathSelector

Tags: scrapy

I'm just trying out Scrapy and trying to get a basic spider working. I know I'm probably just missing something, but I've tried everything I can think of.

The error I get is:

line 11, in JustASpider
    sites = hxs.select('//title/text()')
NameError: name 'hxs' is not defined

My code is very basic at the moment, but I still can't seem to find where I'm going wrong. Thanks for any help!

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class JustASpider(BaseSpider):
    name = "google.com"
    start_urls = ["http://www.google.com/search?hl=en&q=search"]


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//title/text()')
        for site in sites:
            print site.extract()


SPIDER = JustASpider()

asked Sep 03 '12 22:09

Keanan Koppenhaver

1 Answer

The code looks like it was written for quite an old version of Scrapy. I recommend using this code instead:

from scrapy.spider import Spider
from scrapy.selector import Selector

class JustASpider(Spider):
    name = "googlespider"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        sel = Selector(response)
        # extract() already returns a list of strings, so there is no need
        # to loop over the selectors and call extract() on each one.
        sites = sel.xpath('//title/text()').extract()
        print(sites)

Hope it helps.
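One more thing worth checking, although this is only a guess from the traceback: the error is reported "in JustASpider" rather than "in parse", which suggests the failing line was being run as part of the class body. That can happen when the file mixes tabs and spaces, so a line that looks like it is inside parse() is actually indented at class level. Here is a minimal sketch that reproduces the same NameError without Scrapy. The class is a plain stand-in for the spider, and the names MISINDENTED_SOURCE, CORRECT_SOURCE and run_class_body are made up for this illustration.

```python
# Hypothetical stand-in for the spider: the last line sits at class level
# (4 spaces) instead of inside parse() (8 spaces).
MISINDENTED_SOURCE = '''
class JustASpider(object):
    def parse(self, response):
        hxs = response
    sites = hxs.select('//title/text()')
'''

# Same code with the last line indented so that it belongs to parse().
CORRECT_SOURCE = '''
class JustASpider(object):
    def parse(self, response):
        hxs = response
        sites = hxs.select('//title/text()')
'''


def run_class_body(source):
    """Execute the source; return the NameError message, or None if it ran."""
    try:
        exec(source, {})
    except NameError as err:
        return str(err)
    return None


error_message = run_class_body(MISINDENTED_SOURCE)
ok_message = run_class_body(CORRECT_SOURCE)
print(error_message)
print(ok_message)
```

In the first version the line runs as soon as the class is defined, before parse() has ever been called, so hxs does not exist yet. If your editor can show whitespace, check that every line inside parse() uses the same kind of indentation.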

answered Sep 27 '22 23:09

pink bunny
