Python multiprocessing+logging.FileHandler

I'm trying to implement logging in a multiprocessing server. According to the documentation, "logging to a single file from multiple processes is not supported". I've created a small program to check this statement:

import logging
import multiprocessing
import os

log = logging.getLogger()


def setup_logger():
    formatter = logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')

    fileHandler = logging.FileHandler('test.log')
    fileHandler.setFormatter(formatter)

    log.setLevel(logging.DEBUG)
    log.addHandler(fileHandler)


def write_log_entries(identifier, start_event):
    # Block until the main process sets the event, so that all workers
    # start hammering the log file at the same time.
    start_event.wait()
    for i in range(100):
        # The payload is the identifier repeated 30 times, which makes
        # spliced lines easy to spot.
        s = ''.join(str(identifier) for k in range(30))
        log.info('[{}, {}] --- {}'.format(os.getpid(), identifier, s))


if __name__ == '__main__':
    setup_logger()
    procs = []
    start_event = multiprocessing.Event()
    for i in range(100, 300):
        p = multiprocessing.Process(target=write_log_entries, args=(i, start_event))
        procs.append(p)

    for p in procs:
        p.start()

    start_event.set()

    for p in procs:
        p.join()

After executing the above code, I expected to see a complete mess in "test.log", but everything seems to be fine (except the timestamps, of course, which are not in sequence).

Can anybody explain why the log entries don't overlap when the log file is being written to by multiple processes simultaneously? Can log.info() be considered atomic in this case?
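
A rough verifier like the following can confirm this mechanically rather than by eye (it assumes the exact format string used above):

import re

# Every intact line ends with the identifier repeated 30 times.
pattern = re.compile(r'\[(\d+), (\d+)\] --- (\d+)$')

with open('test.log') as f:
    for lineno, line in enumerate(f, 1):
        m = pattern.search(line.rstrip('\n'))
        if m is None:
            print('line {}: malformed (possibly a spliced write)'.format(lineno))
        elif m.group(3) != m.group(2) * 30:
            print('line {}: payload mixes identifiers'.format(lineno))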

asked Sep 02 '12 by user1642265
1 Answer

Short answer: the kernel locks individual calls to write, so you're OK as long as the messages are small enough to be flushed in a single write, and that write succeeds in writing everything at once. There is no general guarantee that this is the case, which is why the docs don't promise that this will work at all.

Long answer: each call to log.info flushes the log output. This is necessary; otherwise you wouldn't be able to see the latest log entries in the file. On the Python/libc level, a flush is implemented as a call to the write(2) syscall, which writes out the contents of the file's buffer, if any. In your case, the buffer contents are your log message. So Python or libc, whichever is managing the file's buffer, ends up issuing an OS call such as:

write(fd, buf, buflen);

...where fd is the system-level file descriptor of the log file, buf is the memory where the write was buffered, and buflen is the length of the message. (You can see these calls if you trace your Python process with a tool such as strace on Linux.) write returns the number of bytes successfully written, and the kernel will not interleave those bytes with other writes to the same region of the file. If the file is opened in O_APPEND mode, the writes are even guaranteed to land at the end of the file, at least on Unix. So if buflen is small, as is the case with normal log messages, everything is fine.
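
To see that O_APPEND behaviour in isolation, here is a minimal sketch of my own (not part of the question's program) that bypasses logging and issues one small os.write per record from several processes; every line in the resulting file should come out intact:

import multiprocessing
import os

def append_records(identifier):
    # O_APPEND asks the kernel to position every write at end-of-file.
    fd = os.open('append_test.log', os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        for i in range(100):
            record = '[{}] record {}\n'.format(identifier, i).encode()
            os.write(fd, record)  # one small record per write(2) call
    finally:
        os.close(fd)

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=append_records, args=(n,))
             for n in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

But at least two things can go wrong.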

First, there is no guarantee that all of buflen will be written out in a single write. write can be interrupted by a signal, fd can point to a device that accepts writes of a fixed size, or your log message may be large enough for the kernel not to accept it in a single chunk. Normally this wouldn't be a problem, because anything that uses write correctly calls it in a loop. But in your case it would be a disaster, because the pieces written by the separate calls to write would interleave with writes from other processes.
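
For reference, that loop looks something like this in Python (write_all is an illustrative name, not a standard-library function):

import os

def write_all(fd, data):
    """Keep calling write(2) until every byte of data has been written."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]

Each iteration picks up where the previous partial write stopped, which is exactly how another process's output can land in between the pieces.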

Second, if your log message is large enough that it doesn't fit in the stdio buffer (8K or so), it will be split into chunks before it ever reaches the kernel. This can easily happen when logging tracebacks, or when formatting logs into a verbose format such as XML.
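
If you want to provoke that failure mode, a variation on the question's program like the sketch below should do it. The exact point at which a record gets split depends on the platform and the Python I/O stack, so treat it as an experiment rather than a guarantee; like the original code, it relies on the fork start method so that the children inherit the configured logger.

import logging
import multiprocessing
import os

log = logging.getLogger()

def write_big_entries(identifier, start_event):
    start_event.wait()
    # A single record around 90 KB, far past a typical 8K buffer.
    payload = str(identifier) * 30000
    for i in range(10):
        log.info('[%s, %s] --- %s', os.getpid(), identifier, payload)

if __name__ == '__main__':
    handler = logging.FileHandler('big_test.log')
    handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s'))
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)

    start_event = multiprocessing.Event()
    procs = [multiprocessing.Process(target=write_big_entries,
                                     args=(n, start_event))
             for n in range(100, 120)]
    for p in procs:
        p.start()
    start_event.set()
    for p in procs:
        p.join()
    # Inspect big_test.log: with records this large you may find lines
    # whose digit runs mix identifiers, i.e. interleaved writes.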

answered Oct 23 '22 by user4815162342