
Scrapy integration with DjangoItem yields error

Tags: django, scrapy

I am trying to run Scrapy with DjangoItem. When I run my spider with scrapy crawl, I get the error 'ExampleDotComItem does not support field: title'. I have created multiple projects and tried to get it to work, but I always get the same error. I found this tutorial and downloaded its source code; after running it, I get the same error:

Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\A\Desktop\django1.7-scrapy1.0.3-master\example_bot\example_bot\spiders\example.py", line 12, in parse
    return ExampleDotComItem(title=title, description=description)
  File "c:\programdata\anaconda3\lib\site-packages\scrapy_djangoitem\__init__.py", line 29, in __init__
    super(DjangoItem, self).__init__(*args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\scrapy\item.py", line 56, in __init__
    self[k] = v
  File "c:\programdata\anaconda3\lib\site-packages\scrapy\item.py", line 66, in __setitem__
    (self.__class__.__name__, key))
KeyError: 'ExampleDotComItem does not support field: title'

Project structure:

├───django1.7-scrapy1.0.3-master
   ├───example_bot
   │   └───example_bot
   │       ├───spiders
   │       │   └───__pycache__
   │       └───__pycache__
   └───example_project
       ├───app
       │   ├───migrations
       │   │   └───__pycache__
       │   └───__pycache__
       └───example_project
           └───__pycache__

My Django Model:

from django.db import models

class ExampleDotCom(models.Model):
    title = models.CharField(max_length=255)
    description = models.CharField(max_length=255)

    def __str__(self):
        return self.title

My "example" Spider:

from scrapy.spiders import BaseSpider
from example_bot.items import ExampleDotComItem

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        title = response.xpath('//title/text()').extract()[0]
        description = response.xpath('//body/div/p/text()').extract()[0]
        return ExampleDotComItem(title=title, description=description)

items.py:

from scrapy_djangoitem import DjangoItem
from app.models import ExampleDotCom

class ExampleDotComItem(DjangoItem):
    django_model = ExampleDotCom

pipelines.py:

class ExPipeline(object):
    def process_item(self, item, spider):
        print(item)
        item.save()
        return item

settings.py:

import os
import sys

DJANGO_PROJECT_PATH = '/Users/A/DESKTOP/django1.7-scrapy1.0.3-master/example_project'
DJANGO_SETTINGS_MODULE = 'example_project.settings'  # Assuming your Django application's name is example_project

sys.path.insert(0, DJANGO_PROJECT_PATH)
os.environ['DJANGO_SETTINGS_MODULE'] = DJANGO_SETTINGS_MODULE

import django
django.setup()

BOT_NAME = 'example_bot'
SPIDER_MODULES = ['example_bot.spiders']

ITEM_PIPELINES = {
    'example_bot.pipelines.ExPipeline': 1000,
}

Asked by Influenza10, Jan 27 '26 21:01


1 Answer

Can you show your Django model? This error is most likely occurring because title isn't defined on your ExampleDotCom model.

If the field is defined there, you may still need to run your Django migrations.
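
To show why the error message looks the way it does, here is a minimal sketch of the mechanism visible in the traceback: the item's __init__ assigns each keyword through __setitem__, which rejects any key that is not a declared field. This is a simplified stand-in, not Scrapy's or scrapy_djangoitem's actual code, and the class names FieldCheckedItem, EmptyItem and PopulatedItem and the helper try_build are hypothetical.

```python
class FieldCheckedItem(dict):
    """Simplified stand-in for an item that only accepts declared fields."""

    # Declared field names mapped to metadata. In the real project this
    # mapping would be derived from the Django model (assumption).
    fields = {}

    def __init__(self, **kwargs):
        super().__init__()
        for key, value in kwargs.items():
            # Goes through __setitem__ below, like `self[k] = v` in the traceback.
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(
                "%s does not support field: %s" % (type(self).__name__, key)
            )
        super().__setitem__(key, value)


class EmptyItem(FieldCheckedItem):
    # No fields were collected, so every keyword is rejected.
    fields = {}


class PopulatedItem(FieldCheckedItem):
    # Both fields are declared, so construction succeeds.
    fields = {"title": {}, "description": {}}


def try_build(item_cls, **values):
    """Return the built item, or the KeyError message if a field is rejected."""
    try:
        return item_cls(**values)
    except KeyError as exc:
        return exc.args[0]


print(try_build(EmptyItem, title="Example Domain"))
print(try_build(PopulatedItem, title="Example Domain", description="text"))
```

In the real project, a comparable check would be to print ExampleDotComItem.fields next to the names in ExampleDotCom._meta.fields from a Python shell started inside the Scrapy project. If the item's mapping comes back empty while the model lists title and description, the problem lies in how the item picks up the model's fields rather than in the model definition itself.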

Answered by Ryan Buckley, Jan 30 '26 09:01


