Thread.Join method does not always return the same value when the thread has already terminated (.NET 5 / Core)

The Thread.Join method has three overloads: Join(), Join(Int32) and Join(TimeSpan). For each of these three overloads, there is the following statement in the Microsoft doc:

If the thread has already terminated when Join is called, the method returns immediately.

While this statement makes sense for the Join() overload, it doesn't specify which value is returned for the Join(Int32) and Join(TimeSpan) ones, so I tested the Int32 overload in two different environments:

  1. Windows 10: returns true
  2. Linux/Docker: returns false (using mcr.microsoft.com/dotnet/runtime:5.0 on Docker Desktop)

Note that the Linux/Docker implementation returns true (like the Windows one) if the thread is still running when Join is called and terminates before the timeout elapses. It only returns false if the thread has already terminated before the call.

In my opinion Join should always return true whatever the platform, so what could explain this inconsistent behavior? Am I missing something or is it a .NET 5 bug?

UPDATE

As suggested by @txtechhelp, here is a .NET Fiddle with the exact code I'm testing.

If I run this code on Windows 10 (or in .NET Fiddle) I get the following result:

Starting..
Sleeping 1200..expect T1 end before join
In T1
Leaving T1
Join(100)..expect success
Join(100) success!
Done..

If I run the same code using mcr.microsoft.com/dotnet/runtime:5.0 on Docker Desktop (v. 3.1.0), I get the following result:

Starting..
Sleeping 1200..expect T1 end before join
In T1
Leaving T1
Join(100)..expect success
Join(100) failed
Done..

UPDATE 2

Actually, after further testing I realized that the test above fails only if I call Join while the Docker application is unloading (i.e. after receiving the AssemblyLoadContext.Default.Unloading event, which is the signal sent by Docker to inform the application that it is about to be shut down).

So here is the exact test, which fails even on .NET Fiddle:

public class Program
{
    public static void Main()
    {
        System.Runtime.Loader.AssemblyLoadContext.Default.Unloading += (arg) => { OnStopSignalReceived("application unloading"); };
    }

    public static void T1()
    {
        System.Console.WriteLine("In T1");
        System.Threading.Thread.Sleep(1000);
        System.Console.WriteLine("Leaving T1");
    }

    private static void OnStopSignalReceived(string stopSignalSource)
    {
        System.Threading.Thread t1 = new System.Threading.Thread(T1);
        System.Console.WriteLine("Starting..");
        t1.Start();
        System.Console.WriteLine("Sleeping 1200..expect T1 end before join");
        System.Threading.Thread.Sleep(1200);
        System.Console.WriteLine("Join(100)..expect success");
        if (t1.Join(100))
        {
            System.Console.WriteLine("Join(100) success!");
        }
        else
        {
            System.Console.WriteLine("Join(100) failed");
        }
        t1.Join();
        System.Console.WriteLine("Done..");
    }
}
asked Feb 22 '21 12:02 by RemiGaudin


1 Answer

This does appear to be a bug in the underlying .NET 5 code for the AssemblyLoadContext class, or possibly undefined behavior that the documentation does not yet specify. The documentation for the AssemblyLoadContext.Unloading event merely states:

Occurs when the AssemblyLoadContext is unloaded.

That is the only sentence, and it provides little context for the issue you're experiencing.

That being said, after some digging around, I wrote two versions of the code you supplied and found some interesting behavior involving AssemblyLoadContext.Unloading and threads.

This code reproduces the bug you mention:

buggy.cs

using System;
using System.Diagnostics;
using System.Threading;
using System.Runtime.Loader;

public class Program
{
        static Thread t1 = new Thread(ThreadFn);
        static Stopwatch sw = new Stopwatch();
    
        public static void Main()
        {
            AssemblyLoadContext.Default.Unloading += ContextUnloading;
            sw.Start();
            Console.WriteLine("{0}ms: leaving main", sw.ElapsedMilliseconds);
        }
    
        public static void ThreadFn()
        {
            Console.WriteLine("{0}ms: in ThreadFn, sleeping 1s", sw.ElapsedMilliseconds);
            Thread.Sleep(1000);
            Console.WriteLine("{0}ms: leaving ThreadFn", sw.ElapsedMilliseconds);
        }

        private static void ContextUnloading(AssemblyLoadContext context)
        {
            Console.WriteLine("{0}ms: unloading '{1}', thread state '{2}'", sw.ElapsedMilliseconds, context, t1.ThreadState);
            
            // possible bug/UB with t1.Start() in this function
            Console.WriteLine("{0}ms: starting thread", sw.ElapsedMilliseconds);
            t1.Start();

            Console.WriteLine("{0}ms: calling Sleep(1200); expect thread in state '{1}' to end before join called", sw.ElapsedMilliseconds, t1.ThreadState);
            Thread.Sleep(1200);
            Console.WriteLine("{0}ms: calling Join(100) on thread in state '{1}'; expect 'succeeded!'", sw.ElapsedMilliseconds, t1.ThreadState);
            Console.WriteLine("Join(100) {0}", (t1.Join(100) ? "succeeded!" : "failed"));
            Console.WriteLine("{0}ms: done", sw.ElapsedMilliseconds);
        }
}


Running that code gives me the following results:

0ms: leaving main
12ms: unloading '"Default" System.Runtime.Loader.DefaultAssemblyLoadContext #0', thread state 'Unstarted'
13ms: starting thread
14ms: calling Sleep(1200); expect thread in state 'Running' to end before join called
14ms: in ThreadFn, sleeping 1s
1014ms: leaving ThreadFn
1214ms: calling Join(100) on thread in state 'Stopped'; expect 'succeeded!'
Join(100) failed
1214ms: done

In this buggy version, the thread state and the elapsed time at each point match what the code should produce. The bug appears when Join(Int32) is called. The documentation states that the return value is a Boolean:

true if the thread has terminated; false if the thread has not terminated after the amount of time specified by the millisecondsTimeout parameter has elapsed.

The thread was in the Stopped state, which according to the ThreadState documentation means either that the thread responded to an Abort call (none is made in the code above) or that:

A thread is terminated.

The document Understanding System.Runtime.Loader.AssemblyLoadContext even warns:

Be aware of thread races. Loading can be triggered by multiple threads. The AssemblyLoadContext handles thread races by atomically adding assemblies to its cache. The race loser's instance is discarded. In your implementation logic, don't add extra logic that doesn't handle multiple threads properly.

Taking all of that together, Join(Int32) should return true in the code above.

So, yes, it would appear to be a bug.
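
For contrast, here is a minimal sketch of the same check made outside any Unloading handler, where the documented contract should hold and Join(Int32) on an already-terminated thread is expected to return true. The names JoinBaseline and JoinAfterTermination are hypothetical and not part of the original code.

```csharp
using System;
using System.Threading;

public static class JoinBaseline
{
    // Starts a short-lived thread, waits until it has fully terminated,
    // then calls Join(timeout) a second time and returns that result.
    public static bool JoinAfterTermination(int timeoutMs)
    {
        var worker = new Thread(() => Thread.Sleep(50));
        worker.Start();
        worker.Join();                 // blocks until the worker has terminated
        return worker.Join(timeoutMs); // the thread is already terminated here
    }

    public static void Main()
    {
        Console.WriteLine("Join(100) after termination: {0}", JoinAfterTermination(100));
    }
}
```

If this reports true while buggy.cs reports "failed", the difference lies in the unloading context rather than in Join(Int32) itself.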

However

If you move the thread start into the Main function, instead of the unload event handler, AssemblyLoadContext.Unloading on the Default context is not raised until the thread has finished, and calling Join(Int32) then, of course, returns the expected result.

It makes sense that the Unloading event is not raised until the thread has completed, since the thread can be considered part of the current assembly context, but that does not explain why the buggy version above fails.

So although the Join(100) call succeeds in the code below, that appears to be because AssemblyLoadContext.Unloading is raised only after the thread has finished, not right after Main exits as one might expect. That makes contextual sense but is not clearly noted in the documentation.

The 'successful' code:

syncbug.cs

using System;
using System.Diagnostics;
using System.Threading;
using System.Runtime.Loader;

public class Program
{
        static Thread t1 = new Thread(ThreadFn);
        static Stopwatch sw = new Stopwatch();
    
        public static void Main()
        {
            AssemblyLoadContext.Default.Unloading += ContextUnloading;
            sw.Start();

            // Get expected result starting thread, but Unloading isn't called until AFTER the thread
            // finishes, which is not the expected result according to the .NET documentation
            Console.WriteLine("{0}ms: starting thread", sw.ElapsedMilliseconds);
            t1.Start();
            
            Console.WriteLine("{0}ms: leaving main", sw.ElapsedMilliseconds);
        }
    
        public static void ThreadFn()
        {
            Console.WriteLine("{0}ms: in ThreadFn, sleeping 1s", sw.ElapsedMilliseconds);
            Thread.Sleep(1000);
            Console.WriteLine("{0}ms: leaving ThreadFn", sw.ElapsedMilliseconds);
        }

        private static void ContextUnloading(AssemblyLoadContext context)
        {
            Console.WriteLine("{0}ms: unloading '{1}', thread state '{2}'", sw.ElapsedMilliseconds, context, t1.ThreadState);
            Console.WriteLine("{0}ms: calling Sleep(1200); expect thread in state '{1}' to end before join called", sw.ElapsedMilliseconds, t1.ThreadState);
            Thread.Sleep(1200);
            Console.WriteLine("{0}ms: calling Join(100) on thread in state '{1}'; expect 'succeeded!'", sw.ElapsedMilliseconds, t1.ThreadState);
            Console.WriteLine("Join(100) {0}", (t1.Join(100) ? "succeeded!" : "failed"));
            Console.WriteLine("{0}ms: done", sw.ElapsedMilliseconds);
        }
}


Running that code gives me the following results:

0ms: starting thread
11ms: leaving main
12ms: in ThreadFn, sleeping 1s
1012ms: leaving ThreadFn
1013ms: unloading '"Default" System.Runtime.Loader.DefaultAssemblyLoadContext #0', thread state 'Stopped'
1014ms: calling Sleep(1200); expect thread in state 'Stopped' to end before join called
2214ms: calling Join(100) on thread in state 'Stopped'; expect 'succeeded!'
Join(100) succeeded!
2215ms: done

You'll note the Unloading event is not raised until after the thread has completed.
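
A plausible explanation for that ordering, offered as an assumption rather than something verified against the runtime source: threads created with new Thread(...) are foreground threads by default (IsBackground is false), and a managed process keeps running until all of its foreground threads have ended. Shutdown processing therefore would not begin while t1 is still running. The sketch below only checks that default; the names ForegroundCheck and DefaultIsBackground are hypothetical.

```csharp
using System;
using System.Threading;

public static class ForegroundCheck
{
    // Returns the IsBackground flag of a freshly constructed, unstarted thread.
    public static bool DefaultIsBackground()
    {
        var t = new Thread(() => { });
        return t.IsBackground;
    }

    public static void Main()
    {
        Console.WriteLine("IsBackground by default: {0}", DefaultIsBackground());
    }
}
```

Setting t1.IsBackground = true before t1.Start() in syncbug.cs would be one way to test this idea, since the runtime should then no longer wait for the thread. What Join(100) returns in that case has not been checked here.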

As of the writing of this answer, there are 82 open bugs for the AssemblyLoadContext class and 346 closed. So it's possible your issue is already noted in some way, but a cursory search didn't turn up anything that could relate to it.

Since this seems to be a legit bug, and since you have more insight into your code and what's happening, I'd recommend going to their Issues page and filing a New issue.
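
In the meantime, one way to avoid depending on the return value of Join(Int32) is to have the worker signal its own completion. This is only a sketch under that assumption: it has not been verified inside an Unloading handler, and the names CompletionSignal and RunWithSignal are hypothetical.

```csharp
using System;
using System.Threading;

public static class CompletionSignal
{
    // Runs `work` on a new thread and returns an event that is set
    // once the work has run to completion.
    public static ManualResetEventSlim RunWithSignal(Action work)
    {
        var done = new ManualResetEventSlim(false);
        var worker = new Thread(() =>
        {
            work();
            done.Set(); // signal completion to any waiter
        });
        worker.Start();
        return done;
    }

    public static void Main()
    {
        ManualResetEventSlim done = RunWithSignal(() => Thread.Sleep(200));
        // Wait(int) returns true if the event was set within the timeout.
        bool finished = done.Wait(2000);
        Console.WriteLine("Worker finished within timeout: {0}", finished);
    }
}
```

The waiter then checks the event rather than the thread, so the result does not depend on how Join(Int32) behaves for a thread that is already in the Stopped state.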

answered Oct 30 '22 09:10 by txtechhelp
