scrapy djangoitem with Foreign Key

This question was already asked in "Foreign Keys on Scrapy" without an accepted answer, so I am re-raising it here with a more clearly defined minimal setup:

The Django model:

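from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    # on_delete is required from Django 2.0; CASCADE was the earlier implicit default.
    category = models.ForeignKey('categories.Category', null=True, blank=True,
                                 on_delete=models.CASCADE)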

How Category itself is defined is irrelevant here; what matters is that category is a ForeignKey. In the Django shell, for example, this works:

c = Article(title="foo", content="bar", category_id=2)
c.save()

The Scrapy item:

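from scrapy_djangoitem import DjangoItem  # older Scrapy: scrapy.contrib.djangoitem

class BotsItem(DjangoItem):
    django_model = Article  # the Django model defined above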

The Scrapy pipeline:

class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category_id'] = 2
        item.save()
        return item

With the above code, Scrapy complains:

exceptions.KeyError: 'BotsItem does not support field: category_id'
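To make the KeyError concrete, here is a simplified, pure-Python stand-in for what appears to be going on. It is not Scrapy's code, and `SketchField`, `SketchDjangoItem` and `ARTICLE_FIELDS` are hypothetical names. As far as I can tell from the scrapy-djangoitem source, DjangoItem declares one item field per Django model field *name*, and a Scrapy Item raises KeyError for any key that is not a declared field. The ForeignKey's name is `category`; `category_id` is only its attname, so it never becomes an item field.

```python
class SketchField:
    """Hypothetical stand-in for a Django model field."""

    def __init__(self, name, is_foreign_key=False):
        self.name = name
        # Django keeps a ForeignKey's raw value under "<name>_id" (its attname).
        self.attname = name + "_id" if is_foreign_key else name


# Hypothetical mirror of the Article model's non-auto fields.
ARTICLE_FIELDS = [
    SketchField("title"),
    SketchField("content"),
    SketchField("category", is_foreign_key=True),
]


class SketchDjangoItem(dict):
    """Dict that, like a Scrapy Item, only accepts declared field keys."""

    # Item fields are built from field *names*, so "category_id" is absent.
    fields = {f.name for f in ARTICLE_FIELDS}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


item = SketchDjangoItem()
# Accepted as a key; a real DjangoItem would still need a Category instance here.
item["category"] = "placeholder"
try:
    item["category_id"] = 2  # rejected: an attname, not a field name
except KeyError as exc:
    print(exc)
```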

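Fair enough: category_id does not appear as a field in the Django model the Scrapy item is generated from. The field is named category; category_id is only the attribute Django uses for the raw foreign-key value. For the record, if we instead use this pipeline (assuming a category foo exists):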

class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category'] = 'foo'
        item.save()
        return item

Now Scrapy complains:

exceptions.TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types

So exactly what should we do?

asked Jun 21 '14 at 06:06 by eN_Joy


1 Answer

Okay, I managed to solve this, and I am posting the solution here for the record. As the last exceptions.TypeError hints, item['category'] expects an instance of the Category class. In my case I am using django-categories, so in the pipeline I just replaced the assignment with this (assuming the Category already exists in the database):

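from categories.models import Category  # django-categories


class BotsPipeline(object):
    def process_item(self, item, spider):
        # A ForeignKey item field takes a model instance, not an id or a name.
        item['category'] = Category.objects.get(id=2)
        item.save()
        return item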
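If you would rather not query the Category table for every item, a possible alternative is to set the raw foreign-key attribute on the model instance instead of on the item. This sketch assumes your DjangoItem comes from scrapy-djangoitem, whose save() accepts commit=False and returns the unsaved model instance (check this against your version). The id 2 is only the placeholder from the question. Note that, unlike Category.objects.get(id=2), this skips any check that the category exists before the database write.

```python
class BotsPipeline(object):
    """Sketch: set Article.category via its raw id, without a Category query.

    Assumes `item` is a DjangoItem whose save(commit=False) returns the
    unsaved Article instance (scrapy-djangoitem behaves this way).
    """

    # Placeholder primary key of an existing Category, as in the question.
    CATEGORY_ID = 2

    def process_item(self, item, spider):
        # Build the Article from the item's fields without writing it yet.
        article = item.save(commit=False)
        # category_id is the ForeignKey's attname; Django accepts a raw pk here.
        article.category_id = self.CATEGORY_ID
        # Now persist the row, foreign key included.
        article.save()
        return item
```

The design choice is simply to do on the model instance what the item refuses to do: the item only knows the field name category, while the Article instance also exposes category_id.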

answered Sep 20 '22 at 11:09 by eN_Joy