Scrapy log issue

I have multiple spiders in one project. The problem is that right now I am defining LOG_FILE in settings like this:

LOG_FILE = "scrapy_%s.log" % datetime.now()

What I want is a name of the form scrapy_SPIDERNAME_DATETIME.

But I am unable to get the spider name into the LOG_FILE name.
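Because settings.py is a module that is evaluated once for the whole project, a LOG_FILE string defined there cannot vary per spider; the name has to be built wherever the spider's name is known. A minimal sketch of building the desired name, assuming a hypothetical helper `spider_log_filename` (not part of Scrapy). It uses `strftime` rather than `str(datetime.now())` so the name contains no spaces or colons, which Windows does not allow in file names:

```python
from datetime import datetime


def spider_log_filename(spider_name, when=None):
    """Return a name like scrapy_<SPIDERNAME>_<DATETIME>.log.

    Hypothetical helper for illustration; `when` defaults to the current time.
    """
    if when is None:
        when = datetime.now()
    # Filesystem-friendly timestamp: no spaces or colons.
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return "scrapy_%s_%s.log" % (spider_name, stamp)


print(spider_log_filename("example", datetime(2012, 8, 25, 12, 34, 48)))
# scrapy_example_2012-08-25_12-34-48.log
```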

I found:

scrapy.log.start(logfile=None, loglevel=None, logstdout=None)

and called it in each spider's __init__ method, but it's not working.

Any help would be appreciated.

asked May 26 '26 05:05 by akhter wahab



1 Answer

The spider's __init__() runs too late for a plain log.start() call to have any effect: by that point Scrapy has already started its log observer, so log.start() returns without doing anything. You therefore need to reset the logging state to trick Scrapy into starting it again.
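To see why the reset is needed, here is a toy model of a start-once guard. It is hypothetical code, not Scrapy's, and only assumes a module-level `started` flag like the `log.started` attribute the code below resets: once the flag is set, later start() calls are silently ignored until the flag is cleared.

```python
# Toy model (hypothetical, not Scrapy's code) of a start-once logging guard.
started = False
active_logfile = None


def start(logfile=None):
    """Start logging once; any later call is a no-op."""
    global started, active_logfile
    if started:
        return  # already started: this call has no effect
    started = True
    active_logfile = logfile


start()                        # the framework starts logging first
start("scrapy_example.log")    # the spider's call is ignored
print(active_logfile)          # None

started = False                # reset the guard, as the answer does
start("scrapy_example.log")
print(active_logfile)          # scrapy_example.log
```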

In your spider class file:

from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, name=None, **kwargs):
        LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
        # remove the current log
        # log.log.removeObserver(log.log.theLogPublisher.observers[0])
        # re-create the default Twisted observer which Scrapy checks
        log.log.defaultObserver = log.log.DefaultObserver()
        # start the default observer so it can be stopped
        log.log.defaultObserver.start()
        # trick Scrapy into thinking logging has not started
        log.started = False
        # start the new log file observer
        log.start(LOG_FILE)
        # continue with the normal spider init
        super(ExampleSpider, self).__init__(name, **kwargs)

    def parse(self, response):
        ...

The resulting log file name would look something like:

scrapy_example_2012-08-25 12:34:48.823896.log
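Note that the `scrapy.log` module and `BaseSpider` used above belong to older Scrapy releases; Scrapy 1.0 deprecated `scrapy.log` in favor of Python's standard `logging` module, so the observer trick likely does not apply to newer versions. Under that assumption, a comparable per-spider file can be sketched with a plain `logging.FileHandler` attached to the logger named after the spider (newer Scrapy spiders log through `self.logger`, which wraps `logging.getLogger(self.name)`). The helper `attach_spider_log_file` and its `directory` parameter are hypothetical, and only records sent to that spider's logger end up in the file, not Scrapy's core messages:

```python
import logging
import os
from datetime import datetime


def attach_spider_log_file(spider_name, directory=".", when=None):
    """Send records from logging.getLogger(spider_name) to their own file.

    Hypothetical helper; returns (path, handler) so the caller can close it.
    """
    if when is None:
        when = datetime.now()
    filename = "scrapy_%s_%s.log" % (spider_name, when.strftime("%Y-%m-%d_%H-%M-%S"))
    path = os.path.join(directory, filename)

    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    logger = logging.getLogger(spider_name)
    # Without an explicit level the logger inherits the root level (WARNING),
    # which would drop INFO and DEBUG records.
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return path, handler
```

Calling `attach_spider_log_file("example")` and then `logging.getLogger("example").info("spider opened")` should append that line to a file such as scrapy_example_2012-08-25_12-34-48.log; call `handler.close()` when the spider finishes.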

answered May 27 '26 18:05 by Steven Almeroth
