I was running a Scrapy project with Python 3.5 when this happened:
Traceback (most recent call last):
File "F:/PyCharm/xiaozhou/main.py", line 6, in <module>
cmdline.execute("scrapy crawl nvospider".split())
File "F:\Python3.5\lib\site-packages\scrapy\cmdline.py", line 108, in execute
settings = get_project_settings()
File "F:\Python3.5\lib\site-packages\scrapy\utils\project.py", line 60, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "F:\Python3.5\lib\site-packages\scrapy\settings\__init__.py", line 285, in setmodule
self.set(key, getattr(module, key), priority)
File "F:\Python3.5\lib\site-packages\scrapy\settings\__init__.py", line 260, in set
self.attributes[name].set(value, priority)
File "F:\Python3.5\lib\site-packages\scrapy\settings\__init__.py", line 55, in set
value = BaseSettings(value, priority=priority)
File "F:\Python3.5\lib\site-packages\scrapy\settings\__init__.py", line 91, in __init__
self.update(values, priority)
File "F:\Python3.5\lib\site-packages\scrapy\settings\__init__.py", line 317, in update
for name, value in six.iteritems(values):
File "F:\Python3.5\lib\site-packages\six.py", line 581, in iteritems
return iter(d.items(**kw))
AttributeError: 'list' object has no attribute 'items'
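The last two frames point at the cause: Scrapy iterates the value of a settings entry with `six.iteritems()`, which only works on dict-like objects. A minimal sketch of the failure, with no Scrapy required (the list mirrors the `ITEM_PIPELINES` value in the settings below):

```python
# ITEM_PIPELINES given as a list instead of a dict
pipelines = ['xiaozhou.pipelines.XiaozhouPipeline']

# Scrapy's BaseSettings.update() effectively does iter(value.items()),
# which a list does not support
try:
    for name, value in pipelines.items():
        pass
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'items'
```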
Here is my code. The spider:
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from xiaozhou.items import NovelspiderItem

class novSpider(CrawlSpider):
    name = "nvospider"
    redis_key = 'nvospider:start_urls'
    start_urls = ['http://www.daomubiji.com/']

    def parse(self, response):
        selector = Selector(response)
        table = selector.xpath('//table')
        for each in table:
            bookName = each.xpath('tr/td[@colspan="3"]/center/h2/text()').extract()[0]  # colspan, not colspam
            content = each.xpath('tr/td/a/text()').extract()
            url = each.xpath('tr/td/a/@href').extract()  # @href, not @herf
            for i in range(len(url)):
                item = NovelspiderItem()
                item['bookName'] = bookName  # the field is bookName in items.py, not bookname
                item['chapterURL'] = url[i]
                try:
                    item['bookTitle'] = content[i].split(' ')[0]
                    item['chapterNum'] = content[i].split(' ')[1]
                except IndexError:  # "except Exception.e:" raises its own error; catch IndexError
                    continue
                try:
                    item['chapterName'] = content[i].split(' ')[2]
                except IndexError:
                    item['chapterName'] = content[i].split(' ')[1][-3:]
                yield item
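The try/except blocks in the spider guard against link texts that split into fewer than three space-separated parts. That parsing logic can be sketched standalone (the sample strings are made up, not actual scraped data; the sketch assumes at least two parts):

```python
def parse_chapter(text):
    # Split a "book chapter name" string on single spaces, as the spider does
    parts = text.split(' ')
    book_title = parts[0]
    chapter_num = parts[1]
    # Fall back to the last three characters of parts[1] when no third part exists
    chapter_name = parts[2] if len(parts) > 2 else parts[1][-3:]
    return book_title, chapter_num, chapter_name

print(parse_chapter('Book Chapter1 Name'))  # ('Book', 'Chapter1', 'Name')
print(parse_chapter('Book Chapter1'))       # ('Book', 'Chapter1', 'er1')
```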
Pipelines:
import pymongo
from scrapy.utils.project import get_project_settings

settings = get_project_settings()  # settings was never imported in the original

class XiaozhouPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient(
            settings['MONGODB_SERVER'],  # key must match settings.py (was MONGODB_HOST)
            settings['MONGODB_PORT']
        )
        db = connection[settings['MONGODB_DB']]  # was MONGO_DBNAME, which settings.py does not define
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))  # insert() is deprecated in PyMongo 3
        return item
Items:
from scrapy import Field, Item

class NovelspiderItem(Item):
    bookName = Field()
    bookTitle = Field()
    chapterNum = Field()
    chapterName = Field()
    chapterURL = Field()
Settings:
# -*- coding: utf-8 -*-
BOT_NAME = 'xiaozhou'
SPIDER_MODULES = ['xiaozhou.spiders']
NEWSPIDER_MODULE = 'xiaozhou.spiders'
ITEM_PIPELINES = ['xiaozhou.pipelines.XiaozhouPipeline']
USER_AGENT = ''
COOKIES_ENABLED = True
MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "dbxiaozhou"
MONGODB_COLLECTION = "xiaozhou"
According to the Scrapy docs, the ITEM_PIPELINES setting must be a dict, but yours is a list: ITEM_PIPELINES = ['xiaozhou.pipelines.XiaozhouPipeline']. That list is exactly what six.iteritems() fails on in your traceback.
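The fix is to make ITEM_PIPELINES a dict mapping each pipeline's import path to an order number in the 0-1000 range (lower runs first; 300 here is an arbitrary conventional choice):

```python
ITEM_PIPELINES = {
    'xiaozhou.pipelines.XiaozhouPipeline': 300,  # value sets pipeline order, 0-1000
}
```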