
How to access scrapy settings from item Pipeline

How do I access the Scrapy settings defined in settings.py from an item pipeline? The documentation mentions that extensions can access them through the crawler, but I don't see how to get at the crawler from a pipeline.

avaleske asked Dec 28 '12 21:12


1 Answer

UPDATE (2021-05-04)
Please note that this answer is now ~7 years old, so its validity can no longer be ensured. In addition, it uses Python 2.

The way to access your Scrapy settings (as defined in settings.py) from within your_spider.py is simple. All other answers are way too complicated. The reason for this is the very poor maintenance of the Scrapy documentation, combined with many recent updates and changes. Neither the "How to access settings" section of the "Settings" documentation nor the "Settings API" gives a workable example. Here's an example of how to get your current USER_AGENT string.

Just add the following lines to your_spider.py:

    # To get your settings from (settings.py):
    from scrapy.utils.project import get_project_settings
    ...
    class YourSpider(BaseSpider):
        ...
        def parse(self, response):
            ...
            settings = get_project_settings()
            print "Your USER_AGENT is:\n%s" % (settings.get('USER_AGENT'))
            ...

As you can see, there's no need to use @classmethod or to redefine the from_crawler() or __init__() methods. Hope this helps.
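For the item pipeline case raised in the question, the from_crawler() hook mentioned above is the other commonly used route. What follows is only a minimal sketch in Python 3 syntax, not a definitive implementation: the class name UserAgentPipeline and the choice of USER_AGENT are illustrative assumptions, and the code relies on Scrapy calling from_crawler(cls, crawler) when a pipeline defines it and on crawler.settings exposing a get() method.

```python
class UserAgentPipeline:
    """Hypothetical pipeline that reads one setting when it is created."""

    def __init__(self, user_agent):
        self.user_agent = user_agent

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: Scrapy calls this hook (when defined) to build the
        # pipeline, and crawler.settings holds the project settings.
        return cls(user_agent=crawler.settings.get("USER_AGENT"))

    def process_item(self, item, spider):
        # self.user_agent is available here; the item passes through unchanged.
        return item
```

To try it, the class would need to be listed under ITEM_PIPELINES in settings.py using your own project's module path. The upside of this approach is that the settings are read once, when the pipeline is built, rather than on every call.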

PS. I'm still not sure why using from scrapy.settings import Settings doesn't work the same way, since it would seem to be the more obvious choice of import.

not2qubit answered Sep 20 '22 07:09
