I have been working on a Scrapy project and so far everything works quite well. However, I'm not satisfied with Scrapy's logging configuration options. At the moment, I have set LOG_FILE = 'my_spider.log' in the settings.py of my project. When I execute scrapy crawl my_spider on the command line, it creates one big log file for the entire crawling process. This is not feasible for my purposes.
How can I use Python's custom log handlers in combination with the scrapy.log module? In particular, I want to use Python's logging.handlers.RotatingFileHandler so that I can split the log data into several small files instead of having to deal with one huge file. Unfortunately, the documentation of Scrapy's logging facility is not very extensive. Many thanks in advance!
You can send all Scrapy logs to a file by first disabling the root handler via scrapy.utils.log.configure_logging and then adding your own log handler.
In the settings.py file of your Scrapy project, add the following code:
import logging
from logging.handlers import RotatingFileHandler

from scrapy.utils.log import configure_logging

# Disable Scrapy's default log settings and root handler.
LOG_ENABLED = False
configure_logging(install_root_handler=False)

# Define your own logging settings.
log_file = '/tmp/logs/CRAWLER_logs.log'

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
rotating_file_log = RotatingFileHandler(log_file, maxBytes=10485760, backupCount=1)
rotating_file_log.setLevel(logging.DEBUG)
rotating_file_log.setFormatter(formatter)
root_logger.addHandler(rotating_file_log)
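To sanity-check the rotation behaviour outside of Scrapy, here is a minimal standalone sketch; the path, sizes, and logger name are just illustrative, with maxBytes shrunk so a rollover actually happens:

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Write enough records to a small RotatingFileHandler to force rollovers.
log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, 'crawler.log')

logger = logging.getLogger('rotation_demo')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(log_file, maxBytes=200, backupCount=2)
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

for i in range(20):
    logger.info('log record %d', i)

# After rollover, backups such as crawler.log.1 exist alongside crawler.log.
print(sorted(os.listdir(log_dir)))
```

With maxBytes=10485760 (10 MiB) as in the settings above, the same mechanism simply rolls over much less often.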
You can also customise the log level (e.g. DEBUG to INFO) and the formatter as required. To add custom log messages inside your spiders and pipelines, you can use normal Python logging, as follows:
Inside pipelines.py:

import logging

logger = logging.getLogger(__name__)
logger.info('processing item')
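The level customisation mentioned above works the same way outside Scrapy; here is a quick sketch showing that a handler set to INFO drops DEBUG records (the logger name is just illustrative):

```python
import io
import logging

# A stream handler set to INFO filters out DEBUG records,
# even though the logger itself allows DEBUG through.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setLevel(logging.INFO)

logger = logging.getLogger('level_demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.debug('dropped')   # below the handler's level
logger.info('kept')       # at the handler's level

print(buf.getvalue().strip())  # prints "kept"
```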
Hope this helps!
Scrapy uses the standard Python loggers, which means you can grab and modify them as you create your spider.
import logging
from logging.handlers import RotatingFileHandler

import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['https://en.wikipedia.org/wiki/Spider']

    # Attach a rotating handler to the root logger, so all Scrapy
    # output is split across spider.log and up to 3 backups.
    handler = RotatingFileHandler('spider.log', maxBytes=1024, backupCount=3)
    logging.getLogger().addHandler(handler)

    def parse(self, response):
        ...
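Because these are ordinary logging loggers, a handler attached to the root logger receives records from every named logger, including Scrapy's, via propagation; a quick standalone check (the logger name is just illustrative, and Scrapy itself is not needed):

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Attach a rotating handler to the root logger.
log_path = os.path.join(tempfile.mkdtemp(), 'spider.log')
handler = RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.DEBUG)

# Records from any named logger propagate up to the root handler.
logging.getLogger('scrapy.core.engine').info('Spider opened')
handler.flush()

with open(log_path) as f:
    print(f.read().strip())  # prints "scrapy.core.engine INFO Spider opened"
```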