This question was asked before in "Foreign Keys on Scrapy" but never received an accepted answer, so I am raising it again with a more clearly defined minimal setup.
The django model:
class Article(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    category = models.ForeignKey('categories.Category', null=True, blank=True)
How Category is defined is irrelevant here; what matters is that the category field is a ForeignKey. So, in the Django shell, this works:
c = Article(title="foo", content="bar", category_id=2)
c.save()
The scrapy item:
class BotsItem(DjangoItem):
    django_model = Article
The scrapy pipeline:
class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category_id'] = 2
        item.save()
        return item
With the above code, scrapy complains:
exceptions.KeyError: 'BotsItem does not support field: category_id'
Fair enough, since category_id is not declared as a field on the Django model from which the Scrapy item is derived. For the record, if we change the pipeline to this (assuming a category named foo exists):
class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category'] = 'foo'
        item.save()
        return item
Now scrapy complains:
exceptions.TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
So exactly what should we do?
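The first error can be reproduced without Scrapy or Django. The sketch below uses a hypothetical stand-in class, StrictItem, which is not Scrapy's real Item; it only mimics the one behaviour that matters here, namely rejecting keys that were not declared as fields. It assumes the item's fields are derived from the model's field names (title, content, category), so the column-style name category_id is not among them.

```python
class StrictItem(dict):
    """Hypothetical stand-in for a Scrapy item: only declared keys are allowed."""
    fields = ()

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("%s does not support field: %s"
                           % (type(self).__name__, key))
        super().__setitem__(key, value)


class BotsItem(StrictItem):
    # Assumed to mirror the field names declared on Article.
    # The database column name category_id is not a declared field.
    fields = ("title", "content", "category")


item = BotsItem()
item["category"] = "placeholder for a Category instance"  # declared, accepted

try:
    item["category_id"] = 2  # undeclared, rejected
except KeyError as exc:
    error = exc.args[0]
```

So the key has to be category, and the remaining question is what kind of value that key accepts.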
Okay, I managed to solve this problem and am posting the solution here for the record. As the last exceptions.TypeError hints, item['category'] expects an instance of the Category class (in my case, the one provided by django-categories). So in the pipeline, assign an instance instead (assuming the Category table is already populated):
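# Assumes django-categories, whose model lives in categories.models
from categories.models import Category


class BotsPipeline(object):
    def process_item(self, item, spider):
        # Assign a Category instance, not an id or a name
        item['category'] = Category.objects.get(id=2)
        item.save()
        return item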
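If you would rather set the raw id and skip the extra Category query, one possible variation is to build the unsaved model instance first and set category_id on that, as in the Django shell example from the question. This is only a sketch: it assumes DjangoItem.save accepts commit=False and returns the unsaved model instance, which is how I recall it behaving, so check your version. FakeArticle and FakeItem below are hypothetical stand-ins so the flow can run without a Django project; in a real pipeline you would use Article and BotsItem.

```python
class FakeArticle(object):
    """Hypothetical stand-in for the Django Article model."""

    def __init__(self, **kwargs):
        self.category_id = None
        self.saved = False
        for name, value in kwargs.items():
            setattr(self, name, value)

    def save(self):
        # A real model would write to the database here.
        self.saved = True


class FakeItem(dict):
    """Hypothetical stand-in for BotsItem; mimics save(commit=...)."""
    django_model = FakeArticle
    _instance = None

    def save(self, commit=True):
        # Build the model instance once, then reuse it.
        if self._instance is None:
            self._instance = self.django_model(**self)
        if commit:
            self._instance.save()
        return self._instance


class BotsPipeline(object):
    def process_item(self, item, spider):
        article = item.save(commit=False)  # unsaved model instance
        article.category_id = 2            # raw foreign key id, no lookup
        article.save()
        return item


item = FakeItem(title="foo", content="bar")
BotsPipeline().process_item(item, spider=None)
article = item.save(commit=False)  # returns the cached instance
```

The trade-off is that nothing checks whether a category with that id exists until the database enforces the constraint.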