I've found the answer; see below. In short, wrong indentation in the ItemPipeline caused it to return None.
I've been trying to write a CrawlSpider in Scrapy, having never worked with Python before. The spider crawls, calls the callback function, extracts the data and fills the item, but the result is always None. I tested it by printing article inside the callback, and everything was in order. I have tried this with both yield and return (though I still don't understand the difference). Frankly, I'm out of ideas. The callback function is below. Edit: I've added the spider code as well.
    class ZeitSpider(CrawlSpider):
        name= xxxx
        allowed_domains = ['example.com']
        start_urls = ['http://www.example.com/%d/%d' % (JAHR, 39)]
        rules = (
            Rule(
                SgmlLinkExtractor(restrict_xpaths=('//ul[@class="teaserlist"]/li[@class="archiveteaser"]/h4[@class="title"]')),
                callback='parse_url',
                follow=True,
            ),
        )

        def parse_url(self, response):
            hxs = HtmlXPathSelector(response)
            article = Article()
            article['url'] = response.url.encode('UTF-8', errors='strict')
            article['author'] = hxs.select('//div[@id="informatives"]/ul[@class="tools"]/li[@class="author first"]/text()').extract().pop().encode('UTF-8', errors='strict')
            article['title'] = hxs.select('//div[@class="articleheader"]/h1/span[@class="title"]/text()').extract().pop().encode('UTF-8', errors='strict')
            article['text'] = hxs.select('//div[@id="main"]/p/text()').extract().pop().encode('UTF-8', errors='strict')
            article['excerpt'] = hxs.select('//p[@class="excerpt"]/text()').extract().pop().encode('UTF-8', errors='strict')
            yield article
And the item definition:
    class Article(Item):
        url = Field()
        author = Field()
        text = Field()
        title = Field()
        excerpt = Field()
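As an aside on the yield versus return question raised above, here is a minimal plain-Python sketch (no Scrapy involved; the function names with_return and with_yield are made up for illustration). A function that uses return hands back one finished object, while a function that uses yield becomes a generator that produces its values one at a time as the caller iterates over it. Either style should be usable for a callback that produces items, so this difference alone would not be expected to turn an item into None.

```python
def with_return(values):
    """Build the whole result first, then hand it back in one go."""
    results = []
    for v in values:
        results.append(v * 2)
    return results


def with_yield(values):
    """Generator: produces one value at a time as the caller iterates."""
    for v in values:
        yield v * 2


returned = with_return([1, 2, 3])
generated = with_yield([1, 2, 3])

print(returned)          # a complete list
print(list(generated))   # the same values, pulled lazily from the generator
```

Calling with_yield does not run the loop immediately; the body only executes while something iterates over the generator.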
OK, after stepping through the program with pdb, I found the error:
Because I have multiple spiders, I wanted to write multiple ItemPipelines. To make each pipeline act only on its own spider, I added a check:
    if spider.name == 'SpiderName':
        return item
Notice the indentation: return item sat inside the if block, so for any other spider the pipeline returned nothing and the output became None.
After fixing the indentation so that return item runs for every spider, the spider worked flawlessly. Another example of PEBCAC.
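The pipeline mistake described above can be sketched roughly as follows. This is a simplified illustration, not the original pipeline code: FakeSpider, BrokenPipeline, FixedPipeline and the 'seen_by' key are hypothetical names, and a plain dict stands in for the item so the snippet runs without Scrapy. The only real Scrapy convention assumed is that a pipeline's process_item(self, item, spider) method should return the item (or raise DropItem) so that it can continue through the remaining pipelines.

```python
class FakeSpider(object):
    """Hypothetical stand-in for a spider; only the name matters here."""

    def __init__(self, name):
        self.name = name


class BrokenPipeline(object):
    def process_item(self, item, spider):
        if spider.name == 'SpiderName':
            item['seen_by'] = 'BrokenPipeline'
            return item
        # For every other spider the function falls off the end,
        # which in Python means an implicit "return None".


class FixedPipeline(object):
    def process_item(self, item, spider):
        if spider.name == 'SpiderName':
            item['seen_by'] = 'FixedPipeline'
        # Dedented: the item is passed on no matter which spider produced it.
        return item


other = FakeSpider('OtherSpider')
print(BrokenPipeline().process_item({'url': 'x'}, other))
print(FixedPipeline().process_item({'url': 'x'}, other))
```

With the broken version, an item from any spider other than 'SpiderName' comes out as None; with the fixed version it passes through unchanged.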