Python logging from multiple processes

I have a possibly long-running program that currently has 4 processes, but could be configured to have more. I have researched logging from multiple processes using Python's logging module and am using the SocketHandler approach discussed here. I have never had any problems with a single logger (no sockets), but from what I've read, that approach will eventually fail unexpectedly. As far as I understand, it's undefined what happens when you try to write to the same file at the same time. My code essentially does the following:

import logging
import os
import sys

log = logging.getLogger(__name__)

def monitor(...):
    # Spawn child processes with os.fork()
    # os.wait() and act accordingly

def main():
    log_server_pid = os.fork()
    if log_server_pid == 0:
        # Create a LogRecordSocketServer (daemon)
        ...
        sys.exit(0)
    # Add SocketHandler to root logger
    ...
    monitor(<configuration stuff>)

if __name__ == "__main__":
    main()

So my questions are: Do I need to create a new log object after each os.fork()? What happens to the existing global log object?

By doing things this way, am I even getting around the problem that I'm trying to avoid (multiple open files/sockets)? Will this fail, and why will it fail (I'd like to be able to tell whether future similar implementations will fail)?

Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it just not completely write data to the file?

If someone could provide an answer or links to help me out, that would be great. Thanks.

FYI: I am testing on Mac OS X Lion and the code will probably end up running on a CentOS 6 VM on a Windows machine (if that matters). Whatever solution I use does not need to work on Windows, but should work on a Unix-based system.

UPDATE: This question has started to move away from logging-specific behavior and is more in the realm of what Linux does with file descriptors during forks. I pulled out one of my college textbooks, and it seems that if you open a file in append mode from two processes (not before a fork), they will both be able to write to the file properly as long as a single write doesn't exceed the kernel buffer (although line buffering might need to be used; I'm still not sure on that one). This creates two file table entries and one v-node table entry. Opening a file and then forking isn't supposed to work, but it seems to, again as long as you don't exceed the kernel buffer (I've done it in a previous program).
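
For what it's worth, here is roughly the open-then-fork experiment I mean (the filename and line count are made up); both processes write through the file object inherited across the fork:

import os

# Open in append mode (roughly line buffered), then fork; parent and child
# share the same open file description, so each small write should land as
# a complete line at the end of the file.
f = open("shared.log", "a", 1)

pid = os.fork()
name = "child" if pid == 0 else "parent"
for i in range(5):
    f.write("%s: line %d\n" % (name, i))
f.flush()

if pid != 0:
    os.wait()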

So I guess, if you want platform-independent multi-process logging, you use sockets and create a new SocketHandler after each fork to be safe, as Vinay suggested below (that should work everywhere). For me, since I have strong control over which OS my software runs on, I think I'm going to go with one global log object with a FileHandler (which opens in append mode by default and is line buffered on most OSs; there's a small sketch after the example below). The documentation for open says "A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used." Or I could just create my own logging stream to be sure of line buffering. And just to be clear, I'm ok with:

# Process A
a_file.write("A\n")
a_file.write("A\n")
# Process B
a_file.write("B\n")

producing...

A\n
B\n
A\n

as long as it doesn't produce...

AB\n
\n
A\n
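
Concretely, the FileHandler setup I have in mind is something like this (just a sketch; the file name and format string are placeholders):

import logging
import os

# One global logger writing to a single file; FileHandler opens the file
# in append mode ('a') by default.
log = logging.getLogger(__name__)
handler = logging.FileHandler("monitor.log")
handler.setFormatter(logging.Formatter("%(process)d %(levelname)s %(message)s"))
log.addHandler(handler)
log.setLevel(logging.INFO)

pid = os.fork()
log.info("hello from %s", "child" if pid == 0 else "parent")
if pid != 0:
    os.wait()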

Vinay (or anyone else), how wrong am I? Let me know. Thanks for any more clarity/certainty you can provide.

asked Dec 06 '11 by djhoese


1 Answer

Do I need to create a new log object after each os.fork()? What happens to the existing global log object?

AFAIK the global log object remains pointing to the same logger in parent and child processes. So you shouldn't need to create a new one. However, I think you should create and add the SocketHandler after the fork() in monitor(), so that the socket server has four distinct connections, one to each child process. If you don't do this, then the child processes forked off in monitor() will inherit the SocketHandler and its socket handle from their parent, but I don't know for sure that it will misbehave. The behaviour is likely to be OS-dependent and you may be lucky on OSX.
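
Something along these lines is what I mean; this is only a rough sketch (the host/port are just the defaults from the logging cookbook's socket server example, and the child body is made up):

import logging
import logging.handlers
import os

def monitor(num_children):
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: create and add the SocketHandler *after* the fork, so
            # each child gets its own connection to the log server.
            root = logging.getLogger()
            root.addHandler(logging.handlers.SocketHandler(
                "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT))
            root.setLevel(logging.DEBUG)
            root.info("child %d started", os.getpid())
            # ... do the child's real work here, logging as usual ...
            os._exit(0)
    # Parent: wait for the children and act accordingly.
    for _ in range(num_children):
        os.wait()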

With doing things the way I am, am I even getting around the problem that I'm trying to avoid (multiple open files/sockets)? Will this fail and why will it fail (I'd like to be able to tell if future similar implementations will fail)?

I wouldn't expect failure if you create the socket connection to the socket server after the last fork() as I suggested above, but I am not sure the behaviour is well-defined in any other case. You refer to multiple open files but I see no reference to opening files in your pseudocode snippet, just opening sockets.

Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it just not completely write data to the file?

I think the behaviour is not well-defined, but one would expect failure modes to present as interspersed log messages from different processes in the file, e.g.

Process A writes first part of its message
Process B writes its message
Process A writes second part of its message

Update: If you use a FileHandler in the way you described in your comment, things will not be so good, due to the scenario described above: processes A and B both begin pointing at the end of the file (because of the append mode), but thereafter things can get out of sync, because one process can preempt another (e.g. on a multiprocessor, but potentially even on a uniprocessor) and write to the shared file handle before the other process has finished doing so.

answered by Vinay Sajip