In macOS High Sierra (Version 10.13.6), I run a Python program that does the following:
- It feeds data into a multiprocessing.Queue before starting the worker process that reads from the queue.
- It uses the requests package, i.e., it makes requests.get() calls in the worker.
A program satisfying the above conditions leads to the worker process crashing with this error:
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
I have read related threads about this error. Those threads focus on a workaround for the user, which is to define this environment variable:
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
In this question, I would like to understand why only certain conditions reproduce the error whereas others do not, and how to resolve this issue without putting the burden of defining the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES on the user.
import multiprocessing as mp
import requests

def worker(q):
    print('worker: starting ...')
    while True:
        url = q.get()
        if url is None:
            print('worker: exiting ...')
            break
        print('worker: fetching', url)
        response = requests.get(url)
        print('worker: response:', response.status_code)

def master():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))

    q.put('https://www.example.com/')  # note: data is fed into the queue *before* the worker starts

    p.start()
    print('master: started worker')

    q.put('https://www.example.org/')
    q.put('https://www.example.net/')
    q.put(None)
    print('master: sent data')

    print('master: waiting for worker to exit')
    p.join()
    print('master: exiting ...')

master()
Here is the output with the error:
$ python3 foo.py
master: started worker
master: sent data
master: waiting for worker to exit
worker: starting ...
worker: fetching https://www.example.com/
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[24250]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
master: exiting ...
Here are a few independent things I have seen that resolve the issue, i.e., performing any one of them alone resolves the issue:
The issue seems to occur only when using the requests package. If we comment out these two lines in worker(), the issue goes away:
# response = requests.get(url)
# print('worker: response:', response.status_code)
The issue seems to occur only if the q.put('https://www.example.com/') statement occurs before the p.start() statement. If we move that statement after p.start(), the issue goes away:
p.start()
print('master: started worker')
q.put('https://www.example.com/')
Setting the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
resolves the issue.
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python3 foo.py
Now, I do not want my users to have to set an environment variable like this to be able to use my tool or API, so I was trying to figure out whether setting this environment variable within my program could resolve the issue. I found that adding this to my code does not resolve the issue:
import os
os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'] = 'YES'
# Does not resolve the issue!
Why exactly does this issue occur only under the given conditions, i.e., requests.get() in the worker and q.put() before p.start()? In other words, why does the issue disappear if one of these conditions is not met?
If we were to expose something like the minimal example as an API function that another developer might call from their code, is there any clever way to resolve this issue in our code, so that the other developer does not have to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
in their shell before running their program that uses our function?
Of course, a possible solution is to redesign the program such that we don't have to feed data into the queue before the worker process starts (a sketch of that redesign is included below). That's definitely a possible solution. The scope of this question, though, is to discuss why this issue occurs only when we feed data into the queue before the worker process starts.
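For reference, here is a sketch of that redesign. It is the same master() as above, with every q.put() call moved after p.start(), and it no longer triggers the error on my machine:

def master():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))

    p.start()
    print('master: started worker')

    # All data is fed into the queue only after the worker has started.
    q.put('https://www.example.com/')
    q.put('https://www.example.org/')
    q.put('https://www.example.net/')
    q.put(None)
    print('master: sent data')

    print('master: waiting for worker to exit')
    p.join()
    print('master: exiting ...')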
Great question description! You've got my upvote.
Now for the answer:
Before macOS 10.13, macOS did not allow you to do much of anything between fork() and exec() in the child process of a multithreaded parent process; in particular, you could just not call any Objective-C method in that interval. This led to race conditions, i.e., most of the time it would work and sometimes it would fail. For example, if a thread in the parent process happened to be holding one of the Objective-C runtime's locks when the fork() occurred, the child process would deadlock when it tried to take that lock.
Since macOS 10.13 (High Sierra), the Objective-C runtime allows more work between fork() and exec(). However, there are restrictions involving the +initialize methods. (Your problem is in this zone.)
Now, before proposing a solution, let me throw some light on the complexity associated with fork:
- fork creates a copy of the calling process.
- The copy then typically replaces itself with a new executable via the execve() system call.
So far everything seems OK, right? The child process (worker in your case) gets a copy of the parent process, and this copy is provided to the child by fork(). But fork() doesn't copy everything! In particular, it doesn't copy threads. Any threads running in the parent process do not exist in the child process.
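This is easy to demonstrate with a small sketch (macOS/Linux only, since it calls os.fork() directly; the background thread here is just an illustrative stand-in for any thread running in the parent):

import os
import threading
import time

# Make the parent process multithreaded before forking.
threading.Thread(target=lambda: time.sleep(60), daemon=True).start()
print('parent thread count:', threading.active_count())   # 2: main + background

pid = os.fork()
if pid == 0:
    # Only the thread that called fork() exists in the child.
    print('child thread count:', threading.active_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)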
On that note, focusing on your problem:
Although macOS 10.13+ supports doing 'anything' between fork() and exec(), doing real work between fork() and exec() is still very much error-prone. In your case, calling q.put() before p.start(), as rightly mentioned by @Darkonaut, starts a feeder thread the first time it is called, and forking an already multithreaded application is problematic.
This is because +initialize methods still have restrictions around fork(). The problem is that the thread-safety guarantees of +initialize implicitly introduce locks around state that the Objective-C runtime does not control.
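You can see the feeder thread appear with a small sketch like this (the thread name QueueFeederThread is a CPython implementation detail, mentioned here only for illustration):

import multiprocessing as mp
import threading

q = mp.Queue()
print('before put():', [t.name for t in threading.enumerate()])

q.put('hello')   # the first put() starts the queue's feeder thread

print('after put(): ', [t.name for t in threading.enumerate()])
# The second list shows an extra thread (QueueFeederThread in CPython),
# i.e., the parent is already multithreaded before any fork() happens.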
When you call q.put() or use the requests library before p.start() (calling into the popular requests library ends up calling into the _scproxy module to get the system proxies, and that ends up calling a +initialize method), either of these leads to your parent process acquiring a lock. You must take note that fork creates a copy of the process. In your case, when q.put() is called before p.start(), the fork happens at the wrong time, and your worker, which gets a copy of the parent process, also gets the lock copied over in its acquired state.
In your worker, you are doing a q.get(). This means acquiring the lock, but the lock was already acquired at the time of fork() (copied from the parent).
The child process (worker) waits for the lock to be released, but the lock will never be released, because the thread that would release it wasn't copied over by fork().
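The same deadlock can be reproduced with a plain threading.Lock (again a macOS/Linux-only sketch using os.fork() directly; the sleep durations and the 2-second timeout are arbitrary values chosen for the demo):

import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(30)   # keep the lock held while the fork happens

threading.Thread(target=hold_lock, daemon=True).start()
time.sleep(0.5)          # give the thread time to acquire the lock

pid = os.fork()
if pid == 0:
    # The lock was copied in its acquired state, but the thread that would
    # release it was not copied, so nobody will ever release it here.
    print('child acquired lock:', lock.acquire(timeout=2))   # False
    os._exit(0)
os.waitpid(pid, 0)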
There is no good way to make +initialize both thread-safe and fork-safe, so the Objective-C runtime simply halts the process rather than running any +initialize override in the child process:
+[SomeClass initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead.
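If the crash path really is the proxy lookup described above, you should be able to reproduce the error without requests at all, e.g. via urllib.request.getproxies(), which goes through _scproxy on macOS. A sketch (assuming the fork start method, which was the default on the macOS/Python combination in the question; newer Pythons default to spawn on macOS):

import multiprocessing as mp
import urllib.request

def worker(q):
    q.get()
    # On macOS this call goes through the _scproxy module, which triggers
    # an Objective-C +initialize in the forked child.
    print('worker: proxies:', urllib.request.getproxies())

def main():
    mp.set_start_method('fork')   # match the question's environment
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    q.put('go')                   # starts the feeder thread before the fork
    p.start()
    p.join()

main()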
Hope that answers your Question 1.
Now, for Question 2:
A few workarounds, from best to worst:
1. Avoid doing anything between fork() and exec() (in particular, better not to use requests between fork() and exec*()). I encountered this same issue on macOS Catalina. I tried to dig deeper into the requests library, and the reason seems to be the cryptography library. Upgrading it solved all my problems:
pip install cryptography --upgrade  # Version 2.8 worked for me.
I had version 2.7, which was producing these objc errors. Apparently something in that library causes a fork on load, and that mechanism has been changed in the newer version. I think it's caused by the "proxy lookup" mechanism, or some other macOS-specific implementation detail of urllib3 (used internally by python-requests) that causes a fork. Check GitHub for more info.
2. Write your function in such a way that it requires "objects that may cause forking on init" as one of its arguments. For example, your worker might take a session argument:
def worker(q, session):
    ...
    while True:
        ...
        response = session.get(url)
        print('worker: response:', response.status_code)

def master():
    with requests.Session() as session:  # or call session.close() at the end if you don't like the context manager
        q = mp.Queue()
        p = mp.Process(target=worker, args=(q, session))
        q.put('https://www.example.com/')
        p.start()
        ...
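For completeness, here is a filled-in version of that sketch, reusing the worker/master structure from the question's minimal example (whether this fully avoids the proxy lookup in the forked child is something to verify against your requests version):

import multiprocessing as mp
import requests

def worker(q, session):
    print('worker: starting ...')
    while True:
        url = q.get()
        if url is None:
            print('worker: exiting ...')
            break
        print('worker: fetching', url)
        response = session.get(url)
        print('worker: response:', response.status_code)

def master():
    with requests.Session() as session:
        q = mp.Queue()
        p = mp.Process(target=worker, args=(q, session))
        q.put('https://www.example.com/')
        p.start()
        print('master: started worker')
        q.put('https://www.example.org/')
        q.put(None)
        print('master: sent data')
        p.join()
        print('master: exiting ...')

master()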