
Scrapy framework - Colorize logging

I am trying to make Scrapy output colorized logs. I am not very familiar with Python logging, but my understanding is that I need to write my own Formatter and make Scrapy use it. I succeeded in writing a Formatter that colorizes the output using Clint.

My problem is that I can't make it work correctly within Scrapy. I expected the logger object in my spider to have a handler, so that I could simply swap that handler's formatter. But when I look at spider.logger.logger, its handlers list is empty. I then tried adding my formatter to a new StreamHandler by doing:

crawler.spider.logger.logger.addHandler(sh) where sh is a handler using my color formatter.

This makes Scrapy output every message twice. The first copy is colorized but lacks Scrapy's formatting; the second has Scrapy's formatting but no colors.
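The duplicate lines come from standard Python logging propagation: a record is first handled by the handlers attached to the logger it was logged on, then passed up to the root logger, where Scrapy installs its own handler. A minimal stdlib-only sketch that reproduces this (the logger name `myspider` and the in-memory streams are illustrative assumptions standing in for the spider's logger and the console):

```python
import io
import logging

# Stand-in for Scrapy's setup: one handler on the root logger with Scrapy-style formatting.
root_stream = io.StringIO()
root_handler = logging.StreamHandler(root_stream)
root_handler.setFormatter(logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s"))
logging.root.addHandler(root_handler)
logging.root.setLevel(logging.DEBUG)

# Stand-in for the question's extra handler on the spider's own logger.
spider_stream = io.StringIO()
spider_logger = logging.getLogger("myspider")  # hypothetical spider name
spider_handler = logging.StreamHandler(spider_stream)
spider_handler.setFormatter(logging.Formatter("%(message)s"))
spider_logger.addHandler(spider_handler)

# Handled by spider_handler first, then propagated to root_handler: two output lines.
spider_logger.info("hello")
print(spider_stream.getvalue().count("hello"))  # 1
print(root_stream.getvalue().count("hello"))    # 1

# Disabling propagation removes the second copy, but this logger then
# loses the root handler's Scrapy-style formatting entirely.
spider_logger.propagate = False
spider_logger.info("again")
print("again" in root_stream.getvalue())  # False
```

So neither adding a handler nor turning off propagation gives colors and Scrapy's format at once; the formatter on Scrapy's own root handler has to change instead.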

How can I make Scrapy output colorized logs while keeping the format that can be set in settings.py?

Thanks

asked Dec 19 '22 09:12 by Pier-Yves Lessard


1 Answer

If you only want to colorize the fields of each LogRecord (the same colors for every level), you can customize LOG_FORMAT in settings.py with ANSI escape codes.

Example:

LOG_FORMAT = '\x1b[0;0;34m%(asctime)s\x1b[0;0m \x1b[0;0;36m[%(name)s]\x1b[0;0m \x1b[0;0;31m%(levelname)s\x1b[0;0m: %(message)s'
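Scrapy passes LOG_FORMAT (and LOG_DATEFORMAT) to a standard logging.Formatter, so the escape codes are just literal text around the %-style fields. A small stdlib sketch that formats one hand-built record with the same string shows what reaches the terminal (the logger name and message are made up for illustration):

```python
import logging

# The LOG_FORMAT string from above; the escape codes are plain text to the Formatter.
LOG_FORMAT = '\x1b[0;0;34m%(asctime)s\x1b[0;0m \x1b[0;0;36m[%(name)s]\x1b[0;0m \x1b[0;0;31m%(levelname)s\x1b[0;0m: %(message)s'

formatter = logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%d %H:%M:%S")

# A hand-built record, roughly what a spider's logger.info(...) call would produce.
record = logging.LogRecord(
    name="myspider", level=logging.INFO, pathname="example.py", lineno=1,
    msg="Crawled %d pages", args=(3,), exc_info=None,
)
line = formatter.format(record)
print(repr(line))
```

Note that the level name is wrapped in the same red code whatever the level is, which is why per-level colors need a different approach.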

If you also want different colors for different log levels, you can override scrapy.utils.log._get_handler (see its source code).
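For per-level colors without any third-party package, one option is a logging.Formatter subclass that formats the record normally and then wraps the whole line in a color chosen by level. A minimal sketch; `LevelColorFormatter`, `level_color_formatter` and the color choices are hypothetical, and the format string is Scrapy's default LOG_FORMAT:

```python
import logging

# Hypothetical level-to-color table (ANSI SGR codes); adjust to taste.
LEVEL_COLORS = {
    logging.DEBUG: "\x1b[34m",      # blue
    logging.INFO: "\x1b[36m",       # cyan
    logging.WARNING: "\x1b[33m",    # yellow
    logging.ERROR: "\x1b[31m",      # red
    logging.CRITICAL: "\x1b[1;31m", # bold red
}
RESET = "\x1b[0m"


class LevelColorFormatter(logging.Formatter):
    """Formats the record as usual, then colors the whole line by level."""

    def format(self, record):
        line = super().format(record)
        color = LEVEL_COLORS.get(record.levelno, "")
        # Levels without an entry (e.g. custom levels) are left uncolored.
        return f"{color}{line}{RESET}" if color else line


level_color_formatter = LevelColorFormatter(
    "%(asctime)s [%(name)s] %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

record = logging.LogRecord("myspider", logging.WARNING, "example.py", 1, "low disk", None, None)
print(repr(level_color_formatter.format(record)))
```

An instance like `level_color_formatter` is the kind of object you would pass to handler.setFormatter(...) when replacing the formatter on Scrapy's handler.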

Put this near the top of your settings.py:

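import copy
import logging

import scrapy.utils.log

# Replace with your own (colorizing) formatter instance.
your_custom_formatter = logging.Formatter('%(asctime)s [%(name)s] %(levelname)s: %(message)s')

# Keep a reference to Scrapy's original (private) handler factory.
_get_handler = copy.copy(scrapy.utils.log._get_handler)


def _get_handler_custom(*args, **kwargs):
    # Let Scrapy build its handler as usual, then swap in our formatter.
    handler = _get_handler(*args, **kwargs)
    handler.setFormatter(your_custom_formatter)
    return handler

# Install the wrapper so Scrapy's logging setup uses it.
scrapy.utils.log._get_handler = _get_handler_custom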

This wraps the original _get_handler: it calls it, replaces the formatter on the handler it returns, and then installs the wrapper in scrapy.utils.log in place of the original. It is a hacky solution that relies on a private Scrapy function and might not be best practice, but it works.

A cleaner way to achieve this is to subclass logging.StreamHandler. There are plenty of discussions on Stack Overflow that can point you in the right direction.
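A minimal sketch of that idea, assuming the handler only adds colors around whatever its formatter produces, so the layout (for example one built from LOG_FORMAT) stays in the formatter. `ColorStreamHandler` is a hypothetical name. Wiring it into Scrapy is not shown: Scrapy's logging docs describe calling scrapy.utils.log.configure_logging(install_root_handler=False) when running from a script and attaching your own handlers to the root logger, which is one possible route.

```python
import io
import logging


class ColorStreamHandler(logging.StreamHandler):
    """Hypothetical handler: colors the already-formatted line by level."""

    COLORS = {
        logging.DEBUG: "\x1b[34m",
        logging.INFO: "\x1b[36m",
        logging.WARNING: "\x1b[33m",
        logging.ERROR: "\x1b[31m",
        logging.CRITICAL: "\x1b[1;31m",
    }
    RESET = "\x1b[0m"

    def format(self, record):
        # StreamHandler.emit() calls self.format(), so this runs for every record.
        line = super().format(record)
        color = self.COLORS.get(record.levelno)
        return f"{color}{line}{self.RESET}" if color else line


# Usage with an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = ColorStreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
demo = logging.getLogger("color_demo")
demo.addHandler(handler)
demo.propagate = False
demo.setLevel(logging.DEBUG)
demo.error("boom")
print(repr(stream.getvalue()))  # '\x1b[31mERROR: boom\x1b[0m\n'
```

Keeping colors in the handler and layout in the formatter means the two concerns can be changed independently.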

Here is the full working code I use in my projects (it relies on the third-party package colorlog):

settings.py

import copy

from colorlog import ColoredFormatter
import scrapy.utils.log

color_formatter = ColoredFormatter(
    (
        '%(log_color)s%(levelname)-5s%(reset)s '
        '%(yellow)s[%(asctime)s]%(reset)s'
        '%(white)s %(name)s %(funcName)s %(bold_purple)s:%(lineno)d%(reset)s '
        '%(log_color)s%(message)s%(reset)s'
    ),
    datefmt='%y-%m-%d %H:%M:%S',
    log_colors={
        'DEBUG': 'blue',
        'INFO': 'bold_cyan',
        'WARNING': 'red',
        'ERROR': 'bg_bold_red',
        'CRITICAL': 'red,bg_white',
    }
)

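# Keep a reference to Scrapy's original (private) handler factory.
_get_handler = copy.copy(scrapy.utils.log._get_handler)

def _get_handler_custom(*args, **kwargs):
    # Build Scrapy's handler as usual, then swap in the colorlog formatter.
    handler = _get_handler(*args, **kwargs)
    handler.setFormatter(color_formatter)
    return handler

# Install the wrapper so Scrapy's logging setup uses it.
scrapy.utils.log._get_handler = _get_handler_custom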
answered Jan 03 '23 18:01 by amigcamel