I'm just trying out Scrapy and trying to get a basic spider working. I know this is probably just something I'm missing, but I've tried everything I can think of.
The error I get is:
  line 11, in JustASpider
    sites = hxs.select('//title/text()')
NameError: name 'hxs' is not defined
My code is very basic at the moment, but I still can't seem to find where I'm going wrong. Thanks for any help!
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class JustASpider(BaseSpider):
    name = "google.com"
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//title/text()')
        for site in sites:
            print site.extract()


SPIDER = JustASpider()
When you need the text content of a node inside an XPath string function, pass . (dot) rather than .//text(). The expression .//text() yields a collection of text nodes, called a node-set, and when a node-set is converted to a string only its first node is used.
To collect link URLs, call response.css() with a selector such as a.title to select every a element with the class title. Appending ::attr(href) picks out the href attribute of each selected element, and getall() returns all of those values as a list.
The code looks like it was written for quite an old version of Scrapy. I recommend using this instead:
# In current Scrapy releases Spider lives in scrapy.spiders (older releases used scrapy.spider)
from scrapy.spiders import Spider
from scrapy.selector import Selector


class JustASpider(Spider):
    name = "googlespider"
    allowed_domains = ["google.com"]
    start_urls = ["http://www.google.com/search?hl=en&q=search"]

    def parse(self, response):
        sel = Selector(response)
        # extract() already returns a list of strings, so no loop is needed
        sites = sel.xpath('//title/text()').extract()
        print(sites)
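As for the NameError itself: the traceback says the failing line ran "in JustASpider", not "in parse". That suggests Python treated the sites = ... line as part of the class body rather than the method, which usually comes down to inconsistent indentation (for example mixed tabs and spaces). This is a guess, since I can't see the whitespace in the original file. Below is a rough, Scrapy-free sketch of the effect; Demo and build_class are hypothetical names used only for the demonstration:

```python
def build_class(source):
    """Run class-definition source; return the NameError message, or None."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return str(exc)
    return None


# The last line is dedented out of parse(), so it runs while the class
# body is being built, where no variable called hxs exists yet.
BROKEN = '''
class Demo:
    def parse(self, response):
        hxs = response.upper()
    sites = hxs.lower()
'''

# Same code with the line indented back into parse(): defining the class
# no longer touches hxs at all.
FIXED = '''
class Demo:
    def parse(self, response):
        hxs = response.upper()
        sites = hxs.lower()
        return sites
'''

broken_error = build_class(BROKEN)
fixed_error = build_class(FIXED)
print(broken_error)
print(fixed_error)
```

If that is what happened, re-indenting the body of parse() with spaces only should make the error go away even before switching to the newer API.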