
ICorProfilerCallback::ClassUnloadStarted not called for a generic class, even though the class was unloaded

I'm currently debugging my company's CLR profiler (on ASP.NET 4.7.3282.0, .NET Framework 4.7.2), and I'm seeing a scenario in which the CLR unloads a generic class but never calls the ClassUnloadStarted callback for it.

In a nutshell, our profiler keeps track of loaded classes by ClassID, following the ClassLoadStarted, ClassLoadFinished and ClassUnloadStarted callbacks. At some point, the class gets unloaded (along with its relevant module), but the ClassUnloadStarted callback is not called for the relevant ClassID. Therefore, we're left with a stale ClassID, thinking that the class is still loaded. Later on, when we try to query that ClassID, the CLR unsurprisingly crashes (since it now points to junk memory).
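
The bookkeeping described above might look roughly like the following. This is a minimal standalone sketch, not our actual profiler code: `LiveClassTracker` and its method names are hypothetical, and `ClassID` is declared here as a pointer-sized stand-in so the snippet compiles without the profiling headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Stand-in for the profiling API's ClassID (an opaque, pointer-sized value).
using ClassID = std::uintptr_t;

// Hypothetical helper (not part of the profiling API): remembers which
// ClassIDs have been reported as loaded and not yet as unloading.
class LiveClassTracker {
public:
    // Intended to be called from ClassLoadFinished after a successful load.
    void OnClassLoadFinished(ClassID id) { live_.insert(id); }

    // Intended to be called from ClassUnloadStarted.
    // Returns false when the ID was not being tracked.
    bool OnClassUnloadStarted(ClassID id) { return live_.erase(id) == 1; }

    // True only while the ID was loaded and no unload has been reported.
    bool IsLive(ClassID id) const { return live_.count(id) != 0; }

    // Number of ClassIDs currently believed to be loaded.
    std::size_t LiveCount() const { return live_.size(); }

private:
    std::unordered_set<ClassID> live_;
};
```

A tracker like this is only as accurate as the callbacks feeding it: if ClassUnloadStarted never arrives for an ID, `IsLive` keeps returning true for it, which is exactly the stale-ClassID situation.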

My questions, given the detailed scenario below:

  • Why is ClassUnloadStarted not called for my (generic) class?
  • Is this an expected, edge case behaviour of the CLR, or possibly a CLR/Profiling API bug?

I couldn't find any documentation or reasoning that specifically covers ClassUnloadStarted not being called, and I found no hints in the CoreCLR code either. Thanks in advance for any help!

The Detailed Scenario:

This is the class in question (IComparable(T) with T=ClassFromModuleFoo):

System/IComparable`1<ClassFromModuleFoo>

While the application runs, the issue manifests after some modules have been unloaded.
Here's the exact flow of load/unload callbacks, based on debug prints I added:

  1. The class System/IComparable`1<ClassFromModuleFoo>, of mscorlib, is loaded.
  2. Immediately afterwards, the class ClassFromModuleFoo, of the module Foo, is loaded into assembly #1.
  3. Module Foo finishes loading into assembly #1.
  4. Then, module Foo is loaded again into a different assembly, #2.
  5. The IComparable and ClassFromModuleFoo are loaded again, this time in assembly #2. Now there are two instances of each class: one in Foo loaded in assembly #1, and one in Foo loaded in assembly #2.
  6. Module Foo begins to unload from assembly #1.
  7. ClassUnloadStarted callback is called for ClassFromModuleFoo in assembly #1.
  8. Module Foo finishes unloading from assembly #1.
  9. ClassUnloadStarted is never called afterwards for System/IComparable`1<ClassFromModuleFoo> of assembly #1 (even though its module has been unloaded and its ClassID now points to trashed memory).
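
The flow above can be replayed as a small audit over the debug log. This is only a sketch under stated assumptions: `ClassEvent`, `MissingUnloads` and `ScenarioLog` are made-up names, classes are keyed by name plus the assembly number used in the steps above, and the log entries are transcribed from those steps rather than captured from a real run.

```cpp
#include <set>
#include <string>
#include <vector>

enum class EventKind { ClassLoad, ClassUnload };

// One line of the (hypothetical) debug log.
struct ClassEvent {
    EventKind kind;
    int assembly;           // 1 or 2, as numbered in the steps above
    std::string className;
};

// Returns, in sorted order, the classes that were loaded for the given
// assembly but never reported as unloaded.
std::vector<std::string> MissingUnloads(const std::vector<ClassEvent>& log,
                                        int assembly) {
    std::set<std::string> pending;
    for (const ClassEvent& e : log) {
        if (e.assembly != assembly) continue;
        if (e.kind == EventKind::ClassLoad) {
            pending.insert(e.className);
        } else {
            pending.erase(e.className);
        }
    }
    return std::vector<std::string>(pending.begin(), pending.end());
}

// The class load/unload callbacks from steps 1-9, in order.
std::vector<ClassEvent> ScenarioLog() {
    const std::string generic = "System/IComparable`1<ClassFromModuleFoo>";
    const std::string plain = "ClassFromModuleFoo";
    return {
        {EventKind::ClassLoad, 1, generic},   // step 1
        {EventKind::ClassLoad, 1, plain},     // step 2
        {EventKind::ClassLoad, 2, generic},   // step 5
        {EventKind::ClassLoad, 2, plain},     // step 5
        {EventKind::ClassUnload, 1, plain},   // step 7
        // Step 9: no unload event ever arrives for the generic class.
    };
}
```

Calling `MissingUnloads(ScenarioLog(), 1)` after module Foo has unloaded from assembly #1 leaves only the generic IComparable instantiation, which is the ClassID the profiler later crashes on.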

Some additional information:

  • The issue also reproduces with the latest .NET Framework version, 4.8 preview.
  • I've disabled native images by adding COR_PRF_DISABLE_ALL_NGEN_IMAGES to the profiler event mask, thinking it might affect the ClassLoad* callbacks, but it didn't make any difference. I verified that mscorlib.dll is indeed loaded instead of its native image.

Edit:

Thanks to my very smart colleague, I was able to reproduce the issue with a small example project that simulates this scenario by loading and unloading AppDomains. Here it is:
https://github.com/shaharv/dotnet/tree/master/testers/module-load-unload

The crash occurs for this class in the test, which is unloaded, and for which the CLR didn't call the unload callback:

Loop/MyGenList`1<System/String>

Here's the relevant code, which is loaded and unloaded a few times:

using System;
using System.Collections.Generic;

namespace Loop
{
    public class MyGenList<T>
    {
        public List<T> _tList;

        public MyGenList(List<T> tList)
        {
            _tList = tList;
        }
    }

    class MyGenericTest
    {
        public void TestFunc()
        {
            MyGenList<String> genList = new MyGenList<String>(new List<string> { "A", "B", "C" });

            try
            {
                throw new Exception();
            }
            catch (Exception)
            {

            }
        }
    }
}

At some point, the profiler crashes trying to query the ClassID of that class - thinking it's still valid, since the unload callback was not called for it.

On a side note, I tried porting this example to .NET Core to investigate further, but couldn't figure out how, since .NET Core doesn't support secondary AppDomains (and I'm not sure it supports on-demand assembly unloading in general).

valiano asked Feb 26 '19 11:02


1 Answer

Once unloading became possible in .NET Core (it wasn't supported before 3.0), we managed to replicate the issue there (thanks valiano!). The CoreCLR team has confirmed it to be a bug (https://github.com/dotnet/coreclr/issues/26126).

From davmason's explanation:

There are three separate types involved and each callback is only giving you two (but a different set of two).

  • Plugin.MyGenList`1: the unbound generic type
  • Plugin.MyGenList`1<System.__Canon>: the generic type bound to the canonical type (used for normal references)
  • Plugin.MyGenList`1<System.String>: the generic type bound to System.String

For ClassLoadStarted we have logic that specifically excludes unbound generic types (i.e. Plugin.MyGenList`1) from being shown to the profiler in ClassLoader::Notify.

This means ClassLoadStarted only gives you callbacks for the canonical and string instances. This seems like the right thing to do here, since as a profiler you would only care about bound generic types and there's nothing of interest for unbound ones.

The issue is that we do a different set of filtering for ClassUnloadStarted. That callback occurs inside EEClass::Destruct, and Destruct is only called on non-generic types, unbound generic types, and canonical generic types. Non-canonical generic types (i.e. Plugin.MyGenList`1<System.String>) are skipped.
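
The mismatch between the two filters can be laid out as a toy model. This is not CoreCLR code: `TypeKind` and the two predicates are illustrative names that only restate the filtering described above, and the sketch assumes no other filtering applies to either callback.

```cpp
#include <vector>

// The kinds of types mentioned in the explanation above.
enum class TypeKind {
    NonGeneric,
    UnboundGeneric,        // e.g. Plugin.MyGenList`1
    CanonicalGeneric,      // bound to the canonical type
    NonCanonicalGeneric    // e.g. bound to System.String
};

// As described: the load notification excludes unbound generic types.
constexpr bool LoadCallbackFires(TypeKind k) {
    return k != TypeKind::UnboundGeneric;
}

// As described: Destruct, and with it the unload notification,
// skips non-canonical generic types.
constexpr bool UnloadCallbackFires(TypeKind k) {
    return k != TypeKind::NonCanonicalGeneric;
}

// Kinds a profiler hears about on load but never on unload.
std::vector<TypeKind> LoadedButNeverUnloaded() {
    const TypeKind all[] = {
        TypeKind::NonGeneric, TypeKind::UnboundGeneric,
        TypeKind::CanonicalGeneric, TypeKind::NonCanonicalGeneric};
    std::vector<TypeKind> result;
    for (TypeKind k : all) {
        if (LoadCallbackFires(k) && !UnloadCallbackFires(k)) {
            result.push_back(k);
        }
    }
    return result;
}
```

In this model the only kind returned is `NonCanonicalGeneric`, which matches the stale ClassID from the question. The unbound generic type is the mirror case: it gets an unload callback without a preceding load callback.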

Egozy answered Nov 11 '22 03:11