Difference in Python thread.join() between Python 3.7 and 3.8

I have a small Python program that behaves differently in Python 3.7 and Python 3.8, and I'm struggling to understand why. The threading section of the Python 3.8 changelog does not explain this.

Here's the code:

import time
from threading import Event, Thread


class StoppableWorker(Thread):
    def __init__(self):
        super(StoppableWorker, self).__init__()
        self.daemon = False
        self._stop_event = Event()

    def join(self, *args, **kwargs):
        self._stop_event.set()
        print("join called")
        super(StoppableWorker, self).join(*args, **kwargs)

    def run(self):
        while not self._stop_event.is_set():
            time.sleep(1)
            print("hi")

if __name__ == "__main__":
    t = StoppableWorker()
    t.start()
    print("main done.")

When I run this in Python 3.7.3 (Debian Buster), I see the following output:

python test.py 
main done.
join called
hi

The program exits on its own. I don't know why join() is called. From the daemon documentation of 3.7:

The entire Python program exits when no alive non-daemon threads are left.

But clearly the thread should still be alive.

When I run this in Python 3.8.6 (Arch), I get the expected behavior. That is, the program keeps running:

python test.py
main done.
hi
hi
hi
hi
...

The daemon documentation for 3.8 states the same as 3.7: the program should not exit until all non-daemon threads have finished.

Can someone help me understand what's going on, please?

asked Nov 19 '20 17:11 by Felix


2 Answers

There is an undocumented change in the behavior of threading._shutdown() between Python 3.7.3 and Python 3.7.4.

Here's how I found it:

To trace the issue, I first used the inspect package to find out who join()s the thread in the Python 3.7.3 runtime. I modified the join() function to get some output:

import inspect
import threading
...
    def join(self, *args, **kwargs):
        self._stop_event.set()
        c = threading.current_thread()
        print(f"join called from thread {c}")
        # inspect.stack()[1] is the caller's frame record; index 3 is its function name
        print(f"calling function: {inspect.stack()[1][3]}")
        super(StoppableWorker, self).join(*args, **kwargs)
...

When executing with Python 3.7.3, this prints:

main done.
join called from thread <_MainThread(MainThread, stopped 139660844881728)>
calling function: _shutdown
hi

So the MainThread, which is already stopped, invokes the join() method. The function responsible in the MainThread is _shutdown().
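
As an aside, the same caller lookup can be written with the named `function` attribute of the frame records instead of the positional index `[3]`. The following is only a minimal sketch; `caller_name`, `traced` and `outer` are hypothetical names used for illustration and are not part of the original program:

```python
import inspect


def caller_name():
    """Return the name of the function that called our caller."""
    # stack()[0] is caller_name itself, stack()[1] is the function that
    # asked for the name, and stack()[2] is the function that called it.
    return inspect.stack()[2].function


def traced():
    # Stands in for the overridden join(): reports who invoked it.
    return caller_name()


def outer():
    # Stands in for _shutdown(): the function we expect to be reported.
    return traced()
```

Calling `outer()` here reports `outer` as the caller of `traced()`, in the same way that `_shutdown` was reported as the caller of `join()` above.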

From the CPython source for Python 3.7.3 for _shutdown(), lines 1279-1282:

    t = _pickSomeNonDaemonThread()
    while t:
        t.join()
        t = _pickSomeNonDaemonThread()

That code invokes join() on all non-daemon threads when the MainThread exits!
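
To make the effect of that loop visible on any Python version, it can be imitated from user code. This is a sketch under assumptions, not the CPython implementation: `join_all_non_daemon_threads` is a hypothetical helper, the `join_calls` counter is added only for illustration, and the sleep interval is shortened so the example finishes quickly.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    def __init__(self):
        super().__init__()
        self.daemon = False
        self._stop_event = threading.Event()
        self.join_calls = 0  # illustrative counter, not in the original code

    def join(self, *args, **kwargs):
        # Same pattern as in the question: join() doubles as a stop signal.
        self._stop_event.set()
        self.join_calls += 1
        super().join(*args, **kwargs)

    def run(self):
        while not self._stop_event.is_set():
            time.sleep(0.05)


def join_all_non_daemon_threads():
    """Hypothetical helper: call join() on every other non-daemon thread."""
    skip = (threading.current_thread(), threading.main_thread())
    for t in threading.enumerate():
        if t not in skip and not t.daemon:
            t.join()


worker = StoppableWorker()
worker.start()
join_all_non_daemon_threads()  # triggers the overridden join(), stopping the worker
```

Because the overridden `join()` sets the stop event, calling it from the outside is enough to end the worker. That is why the 3.7.3 program terminated after printing a single "hi".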

That implementation was changed in Python 3.7.4.

To verify these findings I built Python 3.7.4 from source. It indeed behaves differently. It keeps the thread running as expected and the join() function is not invoked.

This is apparently not documented in the release notes of Python 3.7.4 nor in the changelog of Python 3.8.

-- EDIT:

As pointed out in the comments by MisterMiyagi, one might argue that overriding join() and using it to signal termination is not a proper use of join(). IMHO that is a matter of taste. It should, however, be documented that in Python 3.7.3 and earlier the Python runtime invokes join() at interpreter shutdown, while from 3.7.4 on this is no longer the case. If properly documented, it would explain this behavior from the get-go.
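
For code that should behave the same on 3.7.3 and on later versions, one option is to keep signaling and waiting separate and to stop the worker explicitly. The following is a minimal sketch of that idea, not the only way to do it; the `stop()` method, the `iterations` counter and the `run_worker_briefly` helper are assumptions introduced for illustration.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stop_event = threading.Event()
        self.iterations = 0  # illustrative counter

    def stop(self):
        """Ask the thread to finish; does not wait for it."""
        self._stop_event.set()

    def run(self):
        # Event.wait() returns True once the event is set, so the loop
        # wakes up right after stop() instead of finishing a full sleep.
        while not self._stop_event.wait(timeout=0.05):
            self.iterations += 1


def run_worker_briefly(duration=0.2):
    """Hypothetical helper: start a worker, let it run, then stop and join it."""
    worker = StoppableWorker()
    worker.start()
    try:
        time.sleep(duration)
    finally:
        worker.stop()   # signal termination
        worker.join()   # plain Thread.join(): only waits
    return worker
```

With this split, the program no longer depends on whether the runtime happens to call `join()` at shutdown: the worker runs until `stop()` is called, on either version.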

answered Oct 18 '22 20:10 by Felix


What's New only lists new features. This change looks to me like a bug fix. https://docs.python.org/3.7/whatsnew/3.7.html has a changelog link near the top. Given the research in @Felix's answer, we should look at the bug fixes released in 3.7.4: https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-4-release-candidate-1

This might be the issue: https://bugs.python.org/issue36402

bpo-36402: Fix a race condition at Python shutdown when waiting for threads. Wait until the Python thread state of all non-daemon threads get deleted (join all non-daemon threads), rather than just wait until non-daemon Python threads complete.

answered Oct 18 '22 22:10 by Terry Jan Reedy