Python logging from multiple processes

I have a possibly long-running program that currently has 4 processes, but could be configured to have more. I have researched logging from multiple processes using Python's logging module and am using the SocketHandler approach discussed here. I have never had any problems with a single logger (no sockets), but from what I've read, that approach will eventually fail unexpectedly. As far as I understand, it's undefined what happens when you try to write to the same file at the same time. My code essentially does the following:

import logging
import os
import sys

log = logging.getLogger(__name__)

def monitor(...):
    # Spawn child processes with os.fork()
    # os.wait() and act accordingly

def main():
    log_server_pid = os.fork()
    if log_server_pid == 0:
        # Create a LogRecordSocketServer (daemon)
        ...
        sys.exit(0)
    # Add SocketHandler to root logger
    ...
    monitor(<configuration stuff>)

if __name__ == "__main__":
    main()

So my questions are: Do I need to create a new log object after each os.fork()? What happens to the existing global log object?

By doing things this way, am I even getting around the problem that I'm trying to avoid (multiple open files/sockets)? Will this fail, and why will it fail (I'd like to be able to tell whether future similar implementations will fail)?

Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it just not completely write data to the file?

If someone could provide an answer or links to help me out, that would be great. Thanks.

FYI: I am testing on Mac OS X Lion and the code will probably end up running on a CentOS 6 VM on a Windows machine (if that matters). Whatever solution I use does not need to work on Windows, but should work on a Unix-based system.

UPDATE: This question has started to move away from logging-specific behavior and is more in the realm of what Linux does with file descriptors during forks. I pulled out one of my college textbooks, and it seems that if you open a file in append mode from two processes (not before a fork), they will both be able to write to the file properly as long as a single write doesn't exceed the kernel buffer (although line buffering might need to be used; I'm still not sure on that one). This creates two file table entries and one v-node table entry. Opening a file and then forking isn't supposed to work, but it seems to, again as long as you don't exceed the kernel buffer (I've done it in a previous program).
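
For what it's worth, here is roughly the open-then-fork experiment I mean (the filename and line count are made up); both processes write through the file object inherited across the fork:

import os

# Open in append mode (roughly line buffered), then fork; parent and child
# share the same open file description, so each small write should land as
# a complete line at the end of the file.
f = open("shared.log", "a", 1)

pid = os.fork()
name = "child" if pid == 0 else "parent"
for i in range(5):
    f.write("%s: line %d\n" % (name, i))
f.flush()

if pid != 0:
    os.wait()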

So I guess, if you want platform-independent multi-process logging, you use sockets and create a new SocketHandler after each fork to be safe, as Vinay suggested below (that should work everywhere). For me, since I have strong control over which OS my software runs on, I think I'm going to go with one global log object with a FileHandler (which opens in append mode by default and is line buffered on most OSs; there's a small sketch after the example below). The documentation for open says "A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used." Or I could just create my own logging stream to be sure of line buffering. And just to be clear, I'm ok with:

# Process A
a_file.write("A\n")
a_file.write("A\n")
# Process B
a_file.write("B\n")

producing...

A\n
B\n
A\n

as long as it doesn't produce...

AB\n
\n
A\n
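
Concretely, the FileHandler setup I have in mind is something like this (just a sketch; the file name and format string are placeholders):

import logging
import os

# One global logger writing to a single file; FileHandler opens the file
# in append mode ('a') by default.
log = logging.getLogger(__name__)
handler = logging.FileHandler("monitor.log")
handler.setFormatter(logging.Formatter("%(process)d %(levelname)s %(message)s"))
log.addHandler(handler)
log.setLevel(logging.INFO)

pid = os.fork()
log.info("hello from %s", "child" if pid == 0 else "parent")
if pid != 0:
    os.wait()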

Vinay (or anyone else), how wrong am I? Let me know. Thanks for any more clarity/certainty you can provide.

asked Dec 06 '11 by djhoese


1 Answer

Do I need to create a new log object after each os.fork()? What happens to the existing global log object?

AFAIK the global log object remains pointing to the same logger in parent and child processes. So you shouldn't need to create a new one. However, I think you should create and add the SocketHandler after the fork() in monitor(), so that the socket server has four distinct connections, one to each child process. If you don't do this, then the child processes forked off in monitor() will inherit the SocketHandler and its socket handle from their parent, but I don't know for sure that it will misbehave. The behaviour is likely to be OS-dependent and you may be lucky on OSX.
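
Something along these lines is what I mean; this is only a rough sketch (the host/port are just the defaults from the logging cookbook's socket server example, and the child body is made up):

import logging
import logging.handlers
import os

def monitor(num_children):
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: create and add the SocketHandler *after* the fork, so
            # each child gets its own connection to the log server.
            root = logging.getLogger()
            root.addHandler(logging.handlers.SocketHandler(
                "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT))
            root.setLevel(logging.DEBUG)
            root.info("child %d started", os.getpid())
            # ... do the child's real work here, logging as usual ...
            os._exit(0)
    # Parent: wait for the children and act accordingly.
    for _ in range(num_children):
        os.wait()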

With doing things the way I am, am I even getting around the problem that I'm trying to avoid (multiple open files/sockets)? Will this fail and why will it fail (I'd like to be able to tell if future similar implementations will fail)?

I wouldn't expect failure if you create the socket connection to the socket server after the last fork() as I suggested above, but I am not sure the behaviour is well-defined in any other case. You refer to multiple open files but I see no reference to opening files in your pseudocode snippet, just opening sockets.

Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it just not completely write data to the file?

I think the behaviour is not well-defined, but one would expect failure modes to present as interspersed log messages from different processes in the file, e.g.

Process A writes first part of its message
Process B writes its message
Process A writes second part of its message

Update: If you use a FileHandler in the way you described in your comment, things will not be so good, due to the scenario described above: processes A and B both begin pointing at the end of the file (because of the append mode), but thereafter things can get out of sync, because one process can preempt another (e.g. on a multiprocessor, but potentially even on a uniprocessor) and write to the shared file handle before the other process has finished doing so.

answered by Vinay Sajip