How can I prevent the inheritance of python loggers and handlers during multiprocessing based on fork?

Suppose I configure logging handlers in the main process. The main process spawns some children and, due to os.fork() (on Linux), all loggers and handlers are inherited from the main process. In the example below, 'Hello world' would be printed 100 times to the console:

import multiprocessing as mp
import logging


def do_log(no):
    # root logger logs Hello World to stderr (StreamHandler)
    # BUT I DON'T WANT THAT!
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'

    # This creates a StreamHandler
    logging.basicConfig(format=format, level=logging.INFO)

    n_cores = 4
    pool = mp.Pool(n_cores)
    # Log to stderr 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()

This will print something like:

ForkPoolWorker-1 root INFO     Hello world 0
ForkPoolWorker-3 root INFO     Hello world 14
ForkPoolWorker-3 root INFO     Hello world 15
ForkPoolWorker-3 root INFO     Hello world 16
...

However, I don't want the child processes to inherit the logging configuration from the parent. In the example above, do_log should not print anything to stderr, because the workers should have no StreamHandler.

How do I prevent inheriting the loggers and handlers without removing or deleting them in the original parent process?


EDIT: Would it be a good idea to simply remove all handlers when initializing the pool?

def init_logging():
    # Clear handlers on all named loggers; loggerDict may contain
    # PlaceHolder objects without a 'handlers' attribute, hence the check
    for logger in logging.Logger.manager.loggerDict.values():
        if hasattr(logger, 'handlers'):
            logger.handlers = []
    # The root logger is not in loggerDict, so clear it explicitly
    logging.getLogger().handlers = []

and

pool = mp.Pool(n_cores, initializer=init_logging, initargs=())

Moreover, can I also safely close() all (file) handlers during the initialization function?
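For reference, a minimal sketch of what that could look like (my elaboration of the init_logging idea above, not guaranteed safe for every handler type): each handler is detached before it is closed, and after fork close() only affects the child's copy of a file descriptor:

import logging

def init_logging():
    # Sketch: detach and close every handler inherited from the parent
    loggers = [logging.getLogger()]  # root logger is not in loggerDict
    loggers += [l for l in logging.Logger.manager.loggerDict.values()
                if hasattr(l, 'handlers')]  # skip PlaceHolder entries
    for logger in loggers:
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
            handler.close()  # closes only this process's copy of the fd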

asked Mar 12 '15 by SmCaterpillar


People also ask

Does logging work with multiprocessing?

The multiprocessing module has its own logger, named "multiprocessing". It is used by the objects and functions inside the multiprocessing module to log messages, such as debug messages reporting that processes are running or have shut down. We can get this logger and use it for our own logging.
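For illustration (this snippet is not from the original page), the standard library exposes this logger via multiprocessing.get_logger(), and multiprocessing.log_to_stderr() attaches a StreamHandler to it:

import logging
import multiprocessing

# log_to_stderr() attaches a StreamHandler to the "multiprocessing" logger
# and returns it; the same logger is reachable via multiprocessing.get_logger()
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)
logger.info('multiprocessing internals are now visible on stderr')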

What is a logger handler in Python?

A log handler is the component that actually writes or displays a log record: in the console (via StreamHandler), in a file (via FileHandler), or even by sending you an email (via SMTPHandler), etc. Each log handler has two important fields: a formatter, which adds context information to a record, and a log level, which filters out records below a given severity.
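To make that concrete, here is a small standalone sketch (the names 'app' and 'app.log' are illustrative) that attaches a FileHandler with a formatter and a level to a logger:

import logging

logger = logging.getLogger('app')
logger.setLevel(logging.INFO)

# FileHandler writes records to a file; StreamHandler would print instead
handler = logging.FileHandler('app.log')
handler.setLevel(logging.INFO)
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(name)s %(levelname)-8s %(message)s'))

logger.addHandler(handler)
logger.info('written to app.log with timestamp and level')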


2 Answers

You don't need to prevent it; you just need to reconfigure the logging hierarchy.

I think you're on the right track with the pool initializer. But instead of hacking around the problem, let the logging package do what it's designed for: reconfigure the logging hierarchy in the worker processes.

Here's an example:

import logging
import logging.config
import multiprocessing as mp


def main():

    def configure_logging():
        logging_config = {
            'formatters': {
                'f': {
                    'format': '%(processName)-10s %(name)s'
                              ' %(levelname)-8s %(message)s',
                },
            },
            'handlers': {
                'h': {
                    'level': 'INFO',
                    'class': 'logging.StreamHandler',
                    'formatter': 'f',
                },
            },
            'loggers': {
                '': {
                    'handlers': ['h'],
                    'level': 'INFO',
                    'propagate': True,
                },
            },
            'version': 1,
        }

        pname = mp.current_process().name
        if pname != 'MainProcess':
            # Workers get a FileHandler instead of the parent's StreamHandler
            logging_config['handlers'] = {
                'h': {
                    'level': 'INFO',
                    'formatter': 'f',
                    'class': 'logging.FileHandler',
                    'filename': pname + '.log',
                },
            }

        logging.config.dictConfig(logging_config)

    configure_logging()  # MainProcess

    def pool_initializer():
        configure_logging()

    n_cores = 4
    pool = mp.Pool(n_cores, initializer=pool_initializer)
    pool.map(do_log, range(100))  # do_log as defined in the question
    pool.close()
    pool.join()

Now the worker processes will each log to their own file and will no longer use the main process's stderr StreamHandler.

answered Nov 01 '22 by snapshoe


The most straightforward answer is that you should probably avoid modifying globals with multiprocessing. Note that the root logger, which you get using logging.getLogger(), is global.

The easiest way around this is to create a new Logger instance for each process. You can name them after the processes, or just randomly:

import uuid  # needed for uuid4

log = logging.getLogger(str(uuid.uuid4()))
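Note that a logger obtained this way still propagates records up to the inherited root handlers, so under fork you would typically also disable propagation and attach the process's own handler. A minimal sketch of that idea (the helper name and per-process log file are illustrative, not from this answer):

import logging
import multiprocessing as mp
import uuid

def make_process_logger():
    # Illustrative helper: one independent logger per process
    log = logging.getLogger(str(uuid.uuid4()))
    log.propagate = False  # don't bubble up to the inherited root handlers
    handler = logging.FileHandler(mp.current_process().name + '.log')
    handler.setFormatter(logging.Formatter(
        '%(processName)-10s %(name)s %(levelname)-8s %(message)s'))
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    return log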

You may also want to check "How should I log while using multiprocessing in Python?"

answered Nov 01 '22 by loopbackbee