 

suppress Scrapy Item printed in logs after pipeline

Tags: python, scrapy

I have a Scrapy project where the item that ultimately enters my pipeline is relatively large and stores a lot of metadata and content. Everything is working properly in my spider and pipelines. The logs, however, are printing out the entire Scrapy Item as it leaves the pipeline (I believe):

2013-01-17 18:42:17-0600 [tutorial] DEBUG: processing Pipeline pipeline module
2013-01-17 18:42:17-0600 [tutorial] DEBUG: Scraped from <200 http://www.example.com>
    {'attr1': 'value1',
     'attr2': 'value2',
     'attr3': 'value3',
     ...
     snip
     ...
     'attrN': 'valueN'}
2013-01-17 18:42:18-0600 [tutorial] INFO: Closing spider (finished)

I would rather not have all this data puked into log files if I can avoid it. Any suggestions about how to suppress this output?

asked Jan 18 '13 by dino
1 Answer

Another approach is to override the __repr__ method of your Item subclass to selectively choose which attributes (if any) to print when the item leaves the pipeline:

from scrapy.item import Item, Field


class MyItem(Item):
    attr1 = Field()
    attr2 = Field()
    # ...
    attrN = Field()

    def __repr__(self):
        """Only print out attr1 after exiting the pipeline."""
        # Item fields must be read with dict-style access (self["attr1"] or
        # self.get(...)), not attribute access like self.attr1.
        return repr({"attr1": self.get("attr1")})

This way, you can keep the log level at DEBUG and show only the attributes that you want to see coming out of the pipeline (to check attr1, for example).
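As a quick sanity check, here is a minimal sketch (not part of the original answer) of how the overridden __repr__ behaves; the field values are made up for illustration:

>>> item = MyItem(attr1='value1', attr2='value2')
>>> item
{'attr1': 'value1'}

Because Scrapy's "Scraped from ..." DEBUG message renders the item via its repr, that log line shrinks to the single attribute shown above, while the full item still flows through the pipeline untouched.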

answered by dino