
Saving Django model from Scrapy project

I have a Scrapy project and I am trying to save the scraped items as instances of my Django models (I am not using DjangoItem).

I am importing Django settings as specified here.

def setup_django_env(path):
    import imp, os
    from django.core.management import setup_environ

    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)       

    setup_environ(project)

setup_django_env(PATH_TO_DJANGO_PROJECT)
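
For reference on newer versions: setup_environ was deprecated in Django 1.4 and removed in later releases, and the imp module is no longer available from Python 3.12 onward. Below is a minimal sketch of an equivalent setup for Django 1.7 or newer. The function names and the settings module name my_django_project.settings are assumptions made for illustration, not something taken from the original project.

```python
import os
import sys


def configure_django_env(project_path, settings_module):
    """Put the Django project on sys.path and name its settings module.

    Both arguments are placeholders for illustration; adjust to your layout.
    """
    if project_path not in sys.path:
        sys.path.insert(0, project_path)
    # setdefault keeps any value that was already exported in the shell.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)


def setup_django(project_path, settings_module):
    """Configure the environment, then initialise Django (1.7 or newer)."""
    configure_django_env(project_path, settings_module)
    import django  # imported lazily so the helper above works without Django

    django.setup()  # loads settings and populates the app registry
```

With this approach, something like setup_django(PATH_TO_DJANGO_PROJECT, "my_django_project.settings") would need to run before any model is imported, and project_path would have to be the directory that contains the my_django_project package.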

In my Scrapy project I have a pipeline class that processes all the items at the end and saves them to the DB:

from my_django_project.apps.my_books.models import Book, Category, Image

class DjangoPipeline(object):

    def process_item(self, item, spider):
        category = Category.objects.get(name='Horror')
        book = Book(name='something', category=category)
        book.save()
        image = Image(name='something', book=book)
        image.save()
        return item

However, the first item always fails with the error below, while the rest of the items are saved without problems. For example, with 7 items to save, the first one raises the error and the other 6 get saved.

Traceback (most recent call last):
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/middleware.py", line 54, in _process_chain
    return process_chain(self.methods[methodname], obj, *args)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/utils/defer.py", line 65, in process_chain
    d.callback(input)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 243, in callback
    self._startRunCallbacks(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 312, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/users/ale/djcode/books/lib/scraper/scraper/djangopipeline.py", line 34, in process_item
    selected_category = Category.objects.get(name='Horror')
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/manager.py", line 132, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 333, in get
    clone = self.filter(*args, **kwargs)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 550, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 568, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1131, in add_q
    can_reuse=used_aliases)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1026, in add_filter
    negate=negate, process_extras=process_extras)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1182, in setup_joins
    field, model, direct, m2m = opts.get_field_by_name(name)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 291, in get_field_by_name
    cache = self.init_name_map()
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 321, in init_name_map
    for f, model in self.get_all_related_m2m_objects_with_model():
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 396, in get_all_related_m2m_objects_with_model
    cache = self._fill_related_many_to_many_cache()
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 410, in _fill_related_many_to_many_cache
    for klass in get_models():
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 167, in get_models
    self._populate()
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 61, in _populate
    self.load_app(app_name, True)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 76, in load_app
    app_module = import_module(app_name)
  File "/users/ale/virtualenvs/books/lib/python2.6/site-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
exceptions.ImportError: No module named my_books

If I do something like this, all 7 items get saved:

from my_django_project.apps.my_books.models import Book, Category, Image

class DjangoPipeline(object):

    def process_item(self, item, spider):
        try:
            category = Category.objects.get(name='something')
        except:
            category = Category.objects.get(name='something')
        book = Book(name='something', category=category)
        try:
            book.save()
        except:
            book.save()
        image = Image(name='something', book=book)
        try:
            image.save()
        except:
            image.save()
        return item

I don't know what I am doing wrong. Could someone help me, please?

Thanks!

Alex asked Oct 24 '11 23:10



1 Answer

I had the same problem and I found a solution. At least, it worked for me.

In my case the problem was in the Django project's settings.py file: I had added the short name of my app to the INSTALLED_APPS tuple instead of its FQN (fully qualified name).

In your example, you may have added my_books to INSTALLED_APPS rather than my_django_project.apps.my_books.
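
To make the difference concrete, here is a sketch of the relevant setting, assuming the app package lives at my_django_project/apps/my_books as the import in the question suggests. The helper find_unimportable_apps is a hypothetical name, not part of Django. It simply tries to import each entry, so a short or misspelled name shows up before the pipeline runs. It also assumes every entry is a plain module path, as in the Django version from the question.

```python
import importlib

# Hypothetical excerpt from settings.py: list each app by the same dotted
# path that the rest of the code uses to import it.
INSTALLED_APPS = (
    "django.contrib.contenttypes",
    "my_django_project.apps.my_books",  # not just "my_books"
)


def find_unimportable_apps(app_names):
    """Return the entries of app_names that cannot be imported."""
    missing = []
    for name in app_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling find_unimportable_apps(INSTALLED_APPS) from the same environment Scrapy runs in should return an empty list once the names are right.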

Ashald answered Oct 06 '22 02:10
