
Does python logging support multiprocessing?

I have been told that logging cannot be used in multiprocessing. You have to do the concurrency control yourself, in case multiprocessing garbles the log.

But I ran some tests, and it seems there is no problem using logging in multiprocessing:

```python
import time
import logging
from multiprocessing import Process, current_process, pool


# setup log
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='/tmp/test.log',
                    filemode='w')


def func(the_time, logger):
    proc = current_process()
    while True:
        if time.time() >= the_time:
            logger.info('proc name %s id %s' % (proc.name, proc.pid))
            return


if __name__ == '__main__':
    the_time = time.time() + 5

    for x in xrange(1, 10):
        proc = Process(target=func, name=x, args=(the_time, logger))
        proc.start()
```

As you can see from the code, I deliberately make the subprocesses write to the log at the same moment (5 seconds after start) to increase the chance of a conflict. But there is no conflict at all.

So my question is: can we use logging in multiprocessing? Why do so many posts say we cannot?

Kramer Li asked Dec 25 '17 12:12




2 Answers

As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing of one another) write into the same file, potentially interfering with each other.

What happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by, e.g., another process writing to the same file and intermingling its output). This problem applies to every programming language, as in the end they'll issue a syscall to the kernel. This answer explains under which circumstances a shared log file is OK.

It comes down to checking your pipe buffer size: on Linux it is defined in /usr/include/linux/limits.h as 4096 bytes. For other OSes you find a good list here.
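If you want to check the limit on your own machine, the standard library exposes it. A small sketch (not part of the original answer; note that PIPE_BUF strictly governs pipes/FIFOs, so for append writes to regular files treat it as a guideline rather than a guarantee):

```python
import os
import select

# PIPE_BUF: the largest write to a pipe/FIFO that POSIX guarantees is atomic.
print(select.PIPE_BUF)  # 4096 on Linux

# The same limit queried for a specific filesystem path:
print(os.pathconf('/tmp', 'PC_PIPE_BUF'))
```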

That means: if your log line is less than 4,096 bytes (on Linux), then the append is safe, provided the disk is directly attached (i.e. no network in between). For more details, please check the first link in my answer. To test this you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.

In this question there are already quite a few solutions to this, so I won't add my own solution here.

Update: Flask and multiprocessing

Web frameworks like Flask are run in multiple workers when hosted by uwsgi or nginx. In that case, multiple processes may write into one log file. Will that cause problems?

Error handling in Flask is done via stdout/stderr, which is then caught by the web server (uwsgi, nginx, etc.), which needs to take care that logs are written in the correct fashion (see e.g. [this flask+nginx example](http://flaviusim.com/blog/Deploying-Flask-with-nginx-uWSGI-and-Supervisor/)), probably also adding process information so you can associate error lines with processes. From Flask's docs:

By default as of Flask 0.11, errors are logged to your webserver’s log automatically. Warnings however are not.

So you'd still have this issue of intermingled log lines if you use warn and the message exceeds the pipe buffer size.

hansaplast answered Oct 08 '22 19:10


It is not safe to write to a single file from multiple processes.

According to https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes

Although logging is thread-safe, and logging to a single file from multiple threads in a single process is supported, logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python.

One possible solution is to have each process write to its own file. You can achieve this by writing your own handler that appends the process PID to the file name:

```python
import logging.handlers
import os


class PIDFileHandler(logging.handlers.WatchedFileHandler):

    def __init__(self, filename, mode='a', encoding=None, delay=0):
        filename = self._append_pid_to_filename(filename)
        super(PIDFileHandler, self).__init__(filename, mode, encoding, delay)

    def _append_pid_to_filename(self, filename):
        pid = os.getpid()
        path, extension = os.path.splitext(filename)
        return '{0}-{1}{2}'.format(path, pid, extension)
```

Then you just need to call addHandler:

```python
logger = logging.getLogger('foo')
fh = PIDFileHandler('bar.log')
logger.addHandler(fh)
```
matino answered Oct 08 '22 19:10