Python multiprocessing logging - why multiprocessing.get_logger

I've been struggling with multiprocessing logging for some time, for many reasons.

One of my reasons: why is there another get_logger?

Of course I've seen this question, and it suggests the logger that multiprocessing.get_logger returns does some "process-shared locks" magic to make log handling smooth.

So today I looked into the multiprocessing code of Python 2.7 (multiprocessing/util.py), and found that this logger is just a plain logging.Logger, with barely any magic around it.

Here's the description in Python documentation, right before the get_logger function:

Some support for logging is available. Note, however, that the logging package does not use process shared locks so it is possible (depending on the handler type) for messages from different processes to get mixed up.

So with the wrong logging handler, even the get_logger logger may go wrong? I've used a program that logs via get_logger for some time. It prints logs to a StreamHandler and (seemingly) never gets mixed up.

Now my theory is:

  1. multiprocessing.get_logger doesn't do process-shared locks at all
  2. StreamHandler works for multiprocessing, but FileHandler doesn't
  3. the major purpose of this get_logger logger is to track processes' life-cycles, and to provide an easy-to-get, ready-to-use logger that already logs the process's name/id and the like
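(The first point is easy to check interactively; this snippet is my addition, not from the original question. It confirms the returned object is an ordinary logging.Logger with no handlers attached by default:)

```python
import logging
import multiprocessing

logger = multiprocessing.get_logger()
print(isinstance(logger, logging.Logger))  # True - a plain Logger, no subclass magic
print(logger.name)                         # multiprocessing
print(logger.handlers)                     # [] - no handler until you add one
```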

Here's the question:

Is my theory right?

How/Why/When do you use this get_logger?

tdihp asked Nov 23 '12



1 Answer

Yes, I believe you're right that multiprocessing.get_logger() doesn't do process-shared locks - as you say, the docs even state this. Despite all the upvotes, the question you link to appears flawed in stating that it does (to give it the benefit of the doubt, it was written over a decade ago, so perhaps that was the case at one point).

Why does multiprocessing.get_logger() exist then? The docs say that it:

Returns the logger used by multiprocessing. If necessary, a new one will be created.

When first created the logger has level logging.NOTSET and no default handler. Messages sent to this logger will not by default propagate to the root logger.

i.e. by default the multiprocessing module produces no log output, since its logger has no handler and doesn't propagate to the root logger.

If you were to have a problem with your code that you suspected to be an issue with multiprocessing, that lack of log output wouldn't be helpful for debugging, and that's what multiprocessing.get_logger() exists for - it returns the logger used by the multiprocessing module itself so that you can override the default logging configuration to get some logs from it and see what it's doing.

Since you asked for how to use multiprocessing.get_logger(), you'd call it like so and configure the logger in the usual fashion, for example:

import logging
import multiprocessing

logger = multiprocessing.get_logger()
formatter = logging.Formatter('[%(levelname)s/%(processName)s] %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# now run your multiprocessing code

That said, you may actually want to use multiprocessing.log_to_stderr() instead for convenience - as per the docs:

This function performs a call to get_logger() but in addition to returning the logger created by get_logger, it adds a handler which sends output to sys.stderr using format '[%(levelname)s/%(processName)s] %(message)s'

i.e. it saves you needing to set up quite so much logging config yourself, and you can instead start debugging your multiprocessing issue with just:

import logging
import multiprocessing

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)

# now run your multiprocessing code

To reiterate though, that's just a normal module logger that's being configured and used, i.e. there's nothing special or process-safe about it. It just lets you see what's happening inside the multiprocessing module itself.
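For contrast (this part is my addition, beyond the original answer): if you do need process-safe logging to a shared destination, the usual standard-library pattern on Python 3.2+ is logging.handlers.QueueHandler plus a QueueListener, so that only the listener in the parent process ever touches the real handler. A minimal sketch:

```python
import logging
import logging.handlers
import multiprocessing

def worker(queue):
    # Workers put log records on the shared queue instead of
    # writing to the handler directly.
    logger = logging.getLogger('worker')
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.setLevel(logging.INFO)
    logger.info('hello from %s', multiprocessing.current_process().name)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    # Only the listener thread in the parent process touches the
    # real handler, so lines are never interleaved mid-write.
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('[%(processName)s] %(message)s'))
    listener = logging.handlers.QueueListener(queue, handler)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
```

Whether that's worth the extra machinery depends on your handlers; for a single StreamHandler to a terminal you may well never see interleaving in practice, as the asker observed.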

emmagordon answered Oct 06 '22