
Why does changing the start method from 'fork' to 'spawn' in Python multiprocessing no longer allow me to run my job?

I am able to run a background function using multiprocessing.Process with the start method fork. For some reason, I need this child process to start in a fresh environment when running. So I set the start method to spawn via multiprocessing.set_start_method('spawn'), but when I run the job via job.start() I get the following error:

Can't pickle <class 'module'>: attribute lookup module on builtins failed

However, I do not use pickle for anything within the function that I am calling. What could I be doing wrong? Is there a general rule of thumb that I should have followed when running processes in spawn mode?

FYI: I am on a machine with Ubuntu 16.04

Amir asked Apr 03 '18 00:04




1 Answer

Is there a general rule of thumb...

Yes. You ran into this documented restriction:

https://docs.python.org/3/library/multiprocessing.html

There are a few extra restrictions which don’t apply to the fork start method.

More picklability

Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.
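This is why pickle shows up even though the function never calls it: under spawn, the parent serializes the Process object, including its target and arguments, and sends it to the new interpreter. The question does not include the code, so the following is only a sketch under the assumption that a module object ends up among the Process arguments or attributes. The names unpicklable_items and worker, and the use of json as the stand-in module, are hypothetical. It shows one way to check arguments before calling start(), and one possible workaround: pass the module's name and import it inside the child.

```python
import importlib
import json
import pickle


def unpicklable_items(named_args):
    """Return the names of arguments that pickle refuses to serialize."""
    bad = []
    for name, value in named_args.items():
        try:
            pickle.dumps(value)
        except Exception:  # the exact exception type varies across Python versions
            bad.append(name)
    return bad


def worker(codec_name, payload):
    # Hypothetical target: import the module by name inside the child
    # instead of receiving the module object from the parent.
    codec = importlib.import_module(codec_name)
    return codec.dumps(payload)


# A module object among the arguments is flagged as unpicklable ...
print(unpicklable_items({"codec": json, "payload": {"a": 1}}))
# ... while its name, a plain string, pickles without trouble.
print(unpicklable_items({"codec_name": "json", "payload": {"a": 1}}))
```

With arguments that pass this check, and a target defined at module top level, Process(target=worker, args=("json", {"a": 1})) should no longer hit the pickling step that fails under spawn.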

You are running on Ubuntu, so fork is probably the right answer. If you have a requirement that fork cannot satisfy, clearly document its details as the first step toward choosing a better solution.

J_H answered Sep 29 '22 05:09