When I run the following script on a Windows computer, I do not see any of the log messages from the log_pid
function, however I do when I run on Unix / Mac. I've read before that multiprocessing is different on Windows compared to Mac, but it's not clear to me what changes should I make to get this script to work on Windows. I'm running Python 3.6.
import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os
def log_pid(x):
logger.info('Executing on process: %s' % os.getpid())
def do_stuff():
logger.info('this is the do stuff function.')
with ProcessPoolExecutor(max_workers=4) as executor:
executor.map(log_pid, range(0, 10))
def main():
logger.info('this is the main function.')
do_stuff()
if __name__ == '__main__':
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.info('Start of script ...')
main()
logger.info('End of script ...')
Unix processes are created via the fork
strategy where the child gets cloned from the parent and continues its execution right at the moment the parent forked.
On Windows is quite different: a blank process is created and a new Python interpreter gets launched. The interpreter will then load the module where the log_pid
function is located and execute it.
This means the __main__
section is not executed by the newly spawned child process. Hence, the logger
object is not created and the log_pid
function crashes accordingly. You don't see the error because you ignore the result of your computation. Try to modify the logic as follows.
def do_stuff():
logger.info('this is the do stuff function.')
with ProcessPoolExecutor(max_workers=4) as executor:
iterator = executor.map(log_pid, range(0, 10))
list(iterator) # collect the results in a list
And the issue will become evident.
Traceback (most recent call last):
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in <listcomp>
return [fn(*args) for args in chunk]
File "C:\Users\cafama\Desktop\pool.py", line 8, in log_pid
logger.info('Executing on process: %s' % os.getpid())
NameError: name 'logger' is not defined
When dealing with process pools (whether concurrent.futures
or multiprocessing
ones) always collect the result of the computation to avoid silent bugs to cause confusion.
To fix the problem, just move the logger
creation at the top level of the module and everything will work on all platforms.
import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)
def log_pid(x):
logger.info('Executing on process: %s' % os.getpid())
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With