
What happens if I log into the same file from multiple different processes in python?

I spent hours digging into this behavior, starting with these questions:

  • Atomicity of `write(2)` to a local filesystem
  • How can I synchronize -- make atomic -- writes on one file from two processes?
  • How does one programmatically determine if "write" system call is atomic on a particular file?
  • What happens if a write system call is called on same file by 2 different processes simultaneously
  • http://article.gmane.org/gmane.linux.kernel/43445

It seems that if we use the 'O_APPEND' flag when opening the file, it is always OK to log to the same file from multiple processes on Linux. And I believe Python does use the 'O_APPEND' flag in its logging module, since FileHandler opens the log file in append mode by default.
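
One quick way to check that assumption is to look at the open-file flags directly (a sketch, assuming Linux/CPython; it relies on fcntl.F_GETFL, which is POSIX-only):

#!/usr/bin/env python
import fcntl
import logging
import os

# FileHandler's default mode is 'a', which should translate to O_APPEND on POSIX
fh = logging.FileHandler('spam.log')
flags = fcntl.fcntl(fh.stream.fileno(), fcntl.F_GETFL)
print(bool(flags & os.O_APPEND))  # expected: True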

And from a small test:

#!/usr/bin/env python
import os
import logging

logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
logger.addHandler(fh)
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)

for i in xrange(10000):  # xrange: this script targets Python 2 (use range on Python 3)
    p = os.getpid()
    logger.debug('Log line number %s in %s', i, p)

And I ran it with:

./test.py & ./test.py & ./test.py & ./test.py &

I found nothing wrong in spam.log. This behavior may support the conclusion above.
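
For a more systematic check than eyeballing the file, a small verification script can confirm that every line is intact and count the lines per PID (a sketch; the pattern matches the log format used in the test above, and each process should contribute 10000 lines):

#!/usr/bin/env python
import re
from collections import Counter

pattern = re.compile(r'Log line number (\d+) in (\d+)$')
counts = Counter()
with open('spam.log') as f:
    for line in f:
        m = pattern.search(line.rstrip('\n'))
        if not m:
            raise SystemExit('interleaved or truncated line: %r' % line)
        counts[m.group(2)] += 1

print(counts)  # expect 10000 lines per PID if nothing interleaved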

But more questions come up:

  • What does this mean here?
  • And what are the scenarios for using this, just for file rotation?

Finally, if two processes are writing to the same file, I mean both are invoking write(2) on the same file, who makes sure the data from the two processes does not interleave (the kernel or the filesystem?), and how? [NOTE: I just want to understand the write syscall in depth; any hint about this is welcome.]

EDIT1 :

Do this and this exist just for compatibility between different OS environments, like Windows, Linux, or macOS?

EDIT2 :

One more test: feed 8 KB strings to logging.debug each time. This time I can see the "interleaving" behavior in spam.log. This behavior is just what is specified about PIPE_BUF on one of the pages above. So it seems the behavior is clear on Linux: using O_APPEND is OK if the size passed to write(2) is less than PIPE_BUF.
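
For reference, the variant used for this test might look roughly like this (a sketch; only the ~8 KB payload differs from the script above):

#!/usr/bin/env python
import os
import logging

logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler('spam.log')
fh.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(fh)

payload = 'x' * 8192  # ~8 KB per message, larger than PIPE_BUF (4096 bytes on Linux)
for i in xrange(1000):  # Python 2, like the original script
    logger.debug('Log line number %s in %s %s', i, os.getpid(), payload)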

asked Jul 06 '16 by apporc



2 Answers

I dug deeper and deeper. Now I think these facts are clear:

  1. With O_APPEND, parallel write(2) calls from multiple processes are OK. The order of lines is undetermined, but lines don't interleave or overwrite each other. And this holds for data of any size, according to Niall Douglas's answer to Understanding concurrent file writes from multiple processes. I have tested this "any size" claim on Linux and have not found an upper limit, so I guess it's right (see the sketch after this list).

  2. Without O_APPEND, it will be a mess. Here is what POSIX says: "This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control."

  3. Now we come to Python. The test I did in the edit above, the 8K one: I found its origin. Python's write() actually uses fwrite(3), and my Python sets a buffer size here of 8192, according to an answer from abarnert in Default buffer size for a file on Linux. This 8192 has a long history.
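
To see point 1 at the write(2) level, without the logging module in between, a minimal sketch looks like this (assuming a POSIX system; raw.log is just an example filename, and each os.write call maps to one write(2)):

#!/usr/bin/env python
import os

# open append-only: with O_APPEND the kernel moves the offset to end-of-file
# atomically for each write(2)
fd = os.open('raw.log', os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
for i in range(1000):
    line = 'pid %d line %d\n' % (os.getpid(), i)
    os.write(fd, line.encode())  # one write(2) per line, no userspace buffering
os.close(fd)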

However, more information is welcome.

answered Oct 10 '22 by apporc


I would not rely on tests here. Weird things only happen in race conditions, and trying to exhibit a race condition by testing makes little sense because the race condition is unlikely to occur. So it can work nicely for 1000 test runs and then randomly break later in production... The page you cite says:

logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python

That does not mean that it will break... it could even be safe in a particular implementation on a particular file system. It just means that it can break, without any hope of a fix, on any other version of Python or on any other filesystem.

If you really want to make sure of it, you will have to dive into the Python source code (for your version) to check how logging is actually implemented, and verify whether it is safe on your file system. And you will always be threatened by the possibility that a later optimization in the logging module breaks your assumptions.

IMHO that is the reason for the warning in the Logging Cookbook, and for the existence of a special module to allow concurrent logging to the same file. The latter does not rely on anything unspecified; it just uses explicit locking.
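
To illustrate what explicit locking can look like (a sketch only, not that module's actual implementation; LockedFileHandler is a hypothetical name, and fcntl.lockf is POSIX-only):

import fcntl
import logging

class LockedFileHandler(logging.FileHandler):
    """Take an exclusive advisory lock on the log file around every record."""
    def emit(self, record):
        fcntl.lockf(self.stream.fileno(), fcntl.LOCK_EX)  # block until no other process holds the lock
        try:
            logging.FileHandler.emit(self, record)
            self.flush()  # make sure the record is written out before the lock is released
        finally:
            fcntl.lockf(self.stream.fileno(), fcntl.LOCK_UN)

logger = logging.getLogger('spam_application')
logger.addHandler(LockedFileHandler('spam.log'))
logger.warning('safe across processes, at the cost of locking on every record')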

answered Oct 10 '22 by Serge Ballesta