
Scrapy: Default values for items & fields. What is the best implementation?

Tags:

scrapy

As far as I could find out from the documentation and various discussions on the net, the ability to add default values to fields in a Scrapy item has been removed.

This no longer works:

category = Field(default='null')

So my question is: what is a good way to initialize fields with a default value?

I already tried implementing it as an item pipeline, as suggested here, without any success: https://groups.google.com/forum/?fromgroups=#!topic/scrapy-users/-v1p5W41VDQ
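
For context, here is a minimal sketch of why the `default` argument has no effect. It assumes, as the Scrapy documentation describes, that `Field` is nothing more than a `dict` subclass; the `Field` class and the plain-dict `item` below are stand-ins defined only so the snippet runs without Scrapy installed.

```python
class Field(dict):
    """Stand-in for scrapy.Field, assumed to be a plain dict subclass."""


category = Field(default='null')

# The keyword argument is only stored as field metadata.
print(category)  # {'default': 'null'}

# A plain dict standing in for a freshly created item: no key exists
# until the spider assigns one, whatever the metadata says.
item = {}
print('category' in item)  # False

# The metadata has to be applied explicitly by your own code.
print(item.get('category', category['default']))  # null
```

In other words, the value is still reachable as metadata, but something (a pipeline, a loader, or the spider itself) has to read it and apply it.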

Jabb asked Mar 29 '13 00:03



1 Answer

I figured out what the problem was. The pipeline is working (code follows for other people's reference). My problem was that I am appending values to a field, and I wanted the default to apply to one of the values inside that list. I chose a different way and it works: I am now implementing it with a custom setDefault processor method.

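class DefaultItemPipeline(object):
    """Fill in a default for every field the spider left unset."""

    def process_item(self, item, spider):
        # setdefault only writes the value when the key is missing
        item.setdefault('amz_VendorsShippingDurationFrom', 'default')
        item.setdefault('amz_VendorsShippingDurationTo', 'default')
        # ...
        return item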
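
To illustrate the list problem described above, here is a rough sketch in plain Python. A `dict` stands in for the Scrapy item, and `SetDefault` is a hypothetical name for the kind of processor mentioned, not the actual code used. It assumes a processor is simply a callable that receives the list of collected values; how it gets wired into an Item Loader is left out.

```python
# Plain dict standing in for an item whose field collects a list of values.
item = {'amz_VendorsShippingDurationFrom': ['2 days', '', None]}

# setdefault only acts when the key is absent, so the gaps inside the
# existing list are left untouched; only the missing field gets a value.
item.setdefault('amz_VendorsShippingDurationFrom', 'default')
item.setdefault('amz_VendorsShippingDurationTo', 'default')


class SetDefault:
    """Hypothetical processor: swap empty entries for a default value."""

    def __init__(self, default):
        self.default = default

    def __call__(self, values):
        # Treat None and blank strings as missing; keep everything else.
        cleaned = [
            self.default
            if v is None or (isinstance(v, str) and not v.strip())
            else v
            for v in values
        ]
        # An entirely empty input still yields one default entry.
        return cleaned or [self.default]


fill_default = SetDefault('default')
filled = fill_default(item['amz_VendorsShippingDurationFrom'])

print(item['amz_VendorsShippingDurationTo'])  # default
print(filled)  # ['2 days', 'default', 'default']
```

The pipeline handles fields that were never set, while a processor of this kind handles empty entries inside a field that already holds a list.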

Jabb answered Oct 15 '22 01:10