ENVIRONMENT: Windows 7, Python 3.6.5, Scrapy 1.5.1
PROBLEM DESCRIPTION:
I have a Scrapy project called project_github which contains three spiders: spider1, spider2, and spider3. Each spider scrapes data from a different website specific to that spider.
I want to automatically export a JSON file named NameOfSpider_TodaysDate.json whenever a particular spider is executed, so that from the command line I can run:
scrapy crawl spider1
and get spider1_181115.json.
Currently I am using item exporters in settings.py with the following code:
import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'
Obviously this code always writes spider1_TodaysDate.json regardless of which spider runs... Any suggestions?
The CrawlerProcess class runs multiple Scrapy spiders in a single process simultaneously. Create a CrawlerProcess instance with the project settings; if a spider needs its own custom settings, create a Crawler instance for that spider.
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. This method, like any other Request callback, must return an iterable of Request and/or item objects.
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.
One of the most frequently required features when implementing scrapers is being able to store the scraped data properly and, quite often, that means generating an “export file” with the scraped data (commonly called “export feed”) to be consumed by other systems.
The way to do this is by defining custom_settings as a class attribute on the specific spider we are writing the item exporter for. Spider settings override project settings.
So, for spider1:
import datetime
import scrapy

class spider1(scrapy.Spider):
    name = "spider1"
    allowed_domains = []
    custom_settings = {
        'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }
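If repeating custom_settings in every spider feels redundant, Scrapy's feed URI placeholders can keep the filename per-spider from a single project-wide setting: FEED_URI supports %(name)s, which Scrapy substitutes with the running spider's name when the feed is opened. A sketch of what settings.py could contain (the date is evaluated when settings load, which matches the original code's behavior):

```python
# settings.py -- sketch using Scrapy's %(name)s feed URI placeholder
import datetime

today = datetime.datetime.today().strftime('%y%m%d')

# %(name)s is filled in by Scrapy at crawl time, so each spider gets its
# own file: spider1_181115.json, spider2_181115.json, ...
FEED_URI = '%(name)s_' + today + '.json'
FEED_FORMAT = 'json'
FEED_EXPORT_ENCODING = 'utf-8'
```

With this in place, scrapy crawl spider2 would write spider2_TodaysDate.json without any per-spider custom_settings.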