Getting interface implementations in referenced assemblies with Roslyn

Question

I'd like to bypass some classical assembly scanning techniques in a framework I am developing.

So, say I've defined the following contract:

public interface IModule
{

}

This exists in say Contracts.dll.

Now, if I want to discover all implementations of this interface, we would probably do something similar to the following:

public IEnumerable<IModule> DiscoverModules()
{
    var contractType = typeof(IModule);
    var assemblies = AppDomain.CurrentDomain.GetAssemblies(); // Bad but will do
    var types = assemblies
        .SelectMany(a => a.GetExportedTypes())
        // Only concrete classes can be instantiated
        .Where(t => t.IsClass && !t.IsAbstract && contractType.IsAssignableFrom(t))
        .ToList();

    return types.Select(t => (IModule)Activator.CreateInstance(t));
}

Not a great example, but it will do.

Now, these sorts of assembly scanning techniques can perform quite poorly, and it's all done at runtime, typically impacting startup performance.

In the new DNX environment, we can use ICompileModule instances as metaprogramming tools, so you could bundle an implementation of ICompileModule into your Compiler\Preprocess folder in your project and get it to do something funky.

My goal is to use an ICompileModule implementation to do at compile time the work that we would otherwise do at runtime:

In my references (both compilations and assemblies) and my current compilation, discover all instantiable implementations of IModule.
Create a class, let's call it ModuleList, with an implementation which yields instances of each module.

public static class ModuleList
{
    public static IEnumerable<IModule> GetModules()
    {
        yield return new Module1();
        yield return new Module2();
    }
}

With that class added to the compilation unit, we could invoke it and get a static list of modules at runtime, instead of having to search through all the attached assemblies. We're essentially offloading the work to the compiler instead of the runtime.

We can get access to all references for a compilation via the References property, but I can't see how to get any useful information from them, such as access to the byte code, perhaps to load an assembly for reflection, or something like that.

Thoughts?

atlaste · Accepted Answer

Thoughts?

Yes.

Typically in a module environment you want to dynamically load a module based on the context, or - if applicable - from a third party. In contrast, using the Roslyn compiler framework, you basically get this information at compile time, thereby restricting the modules to static references.

Just yesterday I posted the code for dynamic loading of factories with attributes, updates for loading DLLs etc. here: Naming convention for GoF Factory? From what I understand, it's quite similar to what you're trying to achieve. The upside of that approach is that you can dynamically load new DLLs at runtime. If you try it, you'll find that it's quite fast.

You can also further restrict the assemblies you process. For example, if you don't process mscorlib and System.* (or perhaps even all GAC assemblies) it'll work a lot faster of course. Still, as I said, it shouldn't be a problem; just scanning for types and attributes is quite a fast process.
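
A minimal sketch of that kind of filtering, assuming that matching on the assembly's simple name is good enough for your setup. The AssemblyFilter class and the names it skips are hypothetical and only illustrate the idea; they are not taken from the linked post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class AssemblyFilter
{
    // Hypothetical rule: treat mscorlib, System and System.* as framework assemblies.
    public static bool IsFrameworkAssembly(string simpleName)
    {
        return simpleName == "mscorlib"
            || simpleName == "System"
            || simpleName.StartsWith("System.", StringComparison.Ordinal);
    }

    // Returns the loaded assemblies that are worth scanning for modules.
    public static IEnumerable<Assembly> GetCandidateAssemblies()
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic) // dynamic assemblies do not support GetExportedTypes
            .Where(a => !IsFrameworkAssembly(a.GetName().Name));
    }
}
```

The scan itself then runs over AssemblyFilter.GetCandidateAssemblies() instead of every loaded assembly.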

OK, a bit more information and context.

Now, it might be possible that you're just looking for a fun puzzle. I can understand that, toying around with technology is after all a lot of fun. The answer below (by Matthew himself) will give you all the information that you need.

If you want to balance the pros and cons of compile-time code generation versus a runtime solution, here's more information from my experience.

Some years back, I decided it was a good idea to have my own C# parser/generator framework to do AST transformations. It's quite similar to what you can do with Roslyn; basically it converts an entire project into an AST, which you can then normalize, generate code for, run extra checks on, use for aspect-oriented programming and extend with new language constructs. My original goal here was to add support for aspect-oriented programming to C#, for which I had some practical applications. I'll spare you the details, but for this context it's sufficient to say that a Module / Factory based on code generation was one of the things I experimented with as well.

Performance, flexibility and amount of code (in the non-library solution) are the key aspects for me when weighing a runtime solution against a compile-time one. Let's break them down:

Performance. This is important because I cannot assume that the library code is not on the critical path. Runtime will cost you a few milliseconds per appdomain instance. (See below for remarks on how/why).
Flexibility. They're both about equally flexible in terms of attribute / scanning. However, at runtime you have more possibilities in terms of changing the rules (e.g. dynamically plugging modules etc). I sometimes use this, particularly based on configuration, so that I don't have to develop everything in the same solution (because that's inefficient).
Amount of code. As a rule of thumb, less code is usually better code. If you do it right, both will result in a single attribute that you need on a class. In other words, both solutions give the same result here.

A note on performance is in order though. I use reflection for more than just factory patterns in my code. I basically have an extensive library here of 'tools' that include all design patterns (and a ton of other things). A few examples: I automatically generate code at runtime for things like factories, chain-of-responsibility, decorators, mocking, caching / proxies (and much more). Some of these already required me to scan the assemblies.

As a simple rule of thumb, I always use an attribute to denote that something has to be changed. You can use this to your advantage: by simply storing every type with an attribute (of the correct assembly/namespace) in a singleton / dictionary somewhere, you can make the application a lot faster (because you only need to scan once). It's also not very useful to scan assemblies from Microsoft. I did a lot of tests on large projects, and found that in the worst case that I found, scanning added approximately 10 ms to the startup time of an application. Note that this is only once per instantiation of an appdomain, which means you won't even notice it, ever.
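
To make the "scan once" idea concrete, here is a rough sketch, not my actual library code. ModuleAttribute and ModuleTypeCache are hypothetical names, and catching every exception from GetExportedTypes is a simplification chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; any attribute of your own works the same way.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ModuleAttribute : Attribute { }

public static class ModuleTypeCache
{
    // Lazy<T> runs the scan at most once per AppDomain and is thread-safe by default.
    private static readonly Lazy<IReadOnlyList<Type>> Types =
        new Lazy<IReadOnlyList<Type>>(() => Scan());

    public static IReadOnlyList<Type> MarkedTypes
    {
        get { return Types.Value; }
    }

    private static IReadOnlyList<Type> Scan()
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic)
            .SelectMany(a => SafeGetTypes(a))
            .Where(t => t.IsClass && !t.IsAbstract
                && t.IsDefined(typeof(ModuleAttribute), false))
            .ToList();
    }

    private static IEnumerable<Type> SafeGetTypes(Assembly assembly)
    {
        // An assembly with unresolvable dependencies can throw here; skip it.
        try { return assembly.GetExportedTypes(); }
        catch (Exception) { return Type.EmptyTypes; }
    }
}
```

Every later lookup reads ModuleTypeCache.MarkedTypes and pays nothing for scanning.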

Activation of the types is really the only 'real' performance penalty you will get. That penalty can be optimized away by emitting the IL code; it's really not that difficult. The end result is that it won't make any difference here.
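
As an illustration of what emitting the IL can look like, here is a small sketch using DynamicMethod. FastActivator is a hypothetical name, and the sketch assumes a public type with a public parameterless constructor.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class FastActivator
{
    // Builds a delegate that behaves like "() => new T()" for the given type.
    // Assumes a public type with a public parameterless constructor.
    public static Func<object> CreateFactory(Type type)
    {
        ConstructorInfo ctor = type.GetConstructor(Type.EmptyTypes);
        if (ctor == null)
        {
            throw new ArgumentException(
                "Type needs a public parameterless constructor.", "type");
        }

        var method = new DynamicMethod(
            "Create_" + type.Name, typeof(object), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Newobj, ctor); // construct the instance
        il.Emit(OpCodes.Ret);          // and return it
        return (Func<object>)method.CreateDelegate(typeof(Func<object>));
    }
}
```

You build the delegate once per type, store it next to the scanned type, and call it instead of Activator.CreateInstance on every activation.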

To wrap it up, here are my conclusions:

Performance: Insignificant difference.
Flexibility: Runtime wins.
Amount of code: Insignificant difference.

From my experience, although a lot of frameworks hope to support plug and play architectures which could benefit from drop in assemblies, the reality is, there isn't a whole load of use-cases where this is actually applicable.

If it's not applicable, you might want to consider not using a factory pattern in the first place. Also, if it is applicable, I've shown that there isn't a real downside to it, that is: iff you implement it properly. Unfortunately I have to acknowledge here that I've seen a lot of bad implementations.

As for the fact that it's not actually applicable, I think that's only partly true. It's quite common to drop-in data providers (it logically follows from a 3-tier architecture). I also use factories to wire up things like communication/WCF API's, caching providers and decorators (that logically follows from an n-tier architecture). Generally speaking it's used for any kind of provider you can think of.

If the argument is that it gives a performance penalty, you basically want to remove the entire type scanning process. Personally, I use that for a ton of different things, most notably caching, statistics, logging and configuration. Also, I believe the performance downside is negligible.

Just my 2 cents; HTH.

Matthew Abbott · Answer

So my approach with this challenge meant diving through a whole load of reference source to understand the different types available to Roslyn.

To preface the end solution, let's create the module interface; we'll put this in Contracts.dll:

public interface IModule
{
    int Order { get; }

    string Name { get; }

    Version Version { get; }

    IEnumerable<ServiceDescriptor> GetServices();
}

public interface IModuleProvider
{
    IEnumerable<IModule> GetModules();
}

And let's also define our base provider:

public abstract class ModuleProviderBase
{
    private readonly List<IModule> _modules = new List<IModule>();

    protected ModuleProviderBase()
    {
        Setup();
    }

    public IEnumerable<IModule> GetModules()
    {
        return _modules.OrderBy(m => m.Order);
    }

    protected void AddModule<T>() where T : IModule, new()
    {
        var module = new T();
        _modules.Add(module);
    }

    protected virtual void Setup() { }
}

Now, in this architecture, the module isn't really anything more than a descriptor, so it shouldn't take dependencies; it merely expresses what services it offers.

Now an example module might look like, in DefaultLogger.dll:

public class DefaultLoggerModule : ModuleBase
{
    public override int Order { get { return ModuleOrder.Level3; } }

    public override IEnumerable<ServiceDescriptor> GetServices()
    {
        yield return ServiceDescriptor.Instance<ILoggerFactory>(new DefaultLoggerFactory());
    }
}

I've left out the implementation of ModuleBase for brevity.

Now, in my web project, I add a reference to Contracts.dll and DefaultLogger.dll, and then add the following implementation of my module provider:

public partial class ModuleProvider : ModuleProviderBase { }

And now, my ICompileModule:

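// ICompileModule and its context types come from the DNX compilation package;
// the exact namespace depends on the DNX version you target.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using T = Microsoft.CodeAnalysis.CSharp.CSharpSyntaxTree;
using F = Microsoft.CodeAnalysis.CSharp.SyntaxFactory;
using K = Microsoft.CodeAnalysis.CSharp.SyntaxKind;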

public class DiscoverModulesCompileModule : ICompileModule
{
    private static MethodInfo GetMetadataMethodInfo = typeof(PortableExecutableReference)
        .GetMethod("GetMetadata", BindingFlags.NonPublic | BindingFlags.Instance);
    private static FieldInfo CachedSymbolsFieldInfo = typeof(AssemblyMetadata)
        .GetField("CachedSymbols", BindingFlags.NonPublic | BindingFlags.Instance);
    private ConcurrentDictionary<MetadataReference, string[]> _cache
        = new ConcurrentDictionary<MetadataReference, string[]>();

    public void AfterCompile(IAfterCompileContext context) { }

    public void BeforeCompile(IBeforeCompileContext context)
    {
        // Firstly, I need to resolve the namespace of the ModuleProvider instance in this current compilation.
        string ns = GetModuleProviderNamespace(context.Compilation.SyntaxTrees);

        // Next, get all the available modules in assembly and compilation references.
        var modules = GetAvailableModules(context.Compilation).ToList();
        // Map them to a collection of statements
        var statements = modules.Select(m => F.ParseStatement("AddModule<" + m + ">();")).ToList();

        // Now, I'll create the dynamic implementation as a partial class.
        var cu = F.CompilationUnit()
            .AddMembers(
                F.NamespaceDeclaration(F.IdentifierName(ns))
                    .AddMembers(
                        F.ClassDeclaration("ModuleProvider")
                            .WithModifiers(F.TokenList(F.Token(K.PartialKeyword)))
                            .AddMembers(
                                F.MethodDeclaration(F.PredefinedType(F.Token(K.VoidKeyword)), "Setup")
                                    .WithModifiers(
                                        F.TokenList(
                                            F.Token(K.ProtectedKeyword), 
                                            F.Token(K.OverrideKeyword)))
                                    .WithBody(F.Block(statements))
                            )
                    )
            )
            .NormalizeWhitespace(indentation: "\t");

        var tree = T.Create(cu);
        context.Compilation = context.Compilation.AddSyntaxTrees(tree);
    }

    // Rest of implementation, described below
}

Essentially this module performs a few steps:

1 - Resolves the namespace of the ModuleProvider instance in the web project, e.g. SampleWeb.
2 - Discovers all the available modules through references, these are returned as a collection of strings, e.g. new[] { "SampleLogger.DefaultLoggerModule" }
3 - Convert those to statements of the kind AddModule<SampleLogger.DefaultLoggerModule>();
4 - Create a partial implementation of ModuleProvider that we are adding to our compilation:

namespace SampleWeb
{
    partial class ModuleProvider
    {
        protected override void Setup()
        {
            AddModule<SampleLogger.DefaultLoggerModule>();
        }
    }
}

So, how did I discover the available modules? There are three phases:

1 - The referenced assemblies (e.g., those provided through NuGet)
2 - The referenced compilations (e.g., the referenced projects in the solution).
3 - The module declarations in the current compilation.

And for each referenced compilation, we repeat the above.

private IEnumerable<string> GetAvailableModules(Compilation compilation)
{
    var list = new List<string>();
    string[] modules = null;

    // Get the available references.
    var refs = compilation.References.ToList();

    // Get the assembly references.
    var assemblies = refs.OfType<PortableExecutableReference>().ToList();
    foreach (var assemblyRef in assemblies)
    {
        if (!_cache.TryGetValue(assemblyRef, out modules))
        {
            modules = GetAssemblyModules(assemblyRef);
            _cache.AddOrUpdate(assemblyRef, modules, (k, v) => modules);
            list.AddRange(modules);
        }
        else
        {
            // We've already included this assembly.
        }
    }

    // Get the compilation references
    var compilations = refs.OfType<CompilationReference>().ToList();
    foreach (var compilationRef in compilations)
    {
        if (!_cache.TryGetValue(compilationRef, out modules))
        {
            modules = GetAvailableModules(compilationRef.Compilation).ToArray();
            _cache.AddOrUpdate(compilationRef, modules, (k, v) => modules);
            list.AddRange(modules);
        }
        else
        {
            // We've already included this compilation.
        }
    }

    // Finally, deal with modules in the current compilation.
    list.AddRange(GetModuleClassDeclarations(compilation));

    return list;
}

So, to get assembly referenced modules:

private string[] GetAssemblyModules(PortableExecutableReference reference)
{
    // GetMetadata is non-public, so it has to be invoked through reflection.
    var metadata = GetMetadataMethodInfo.Invoke(reference, null) as AssemblyMetadata;
    if (metadata != null)
    {
        var assemblySymbol = ((IEnumerable<IAssemblySymbol>)CachedSymbolsFieldInfo.GetValue(metadata)).First();

        // Only consider our assemblies? Sample*?
        if (assemblySymbol.Name.StartsWith("Sample"))
        {
            var types = GetTypeSymbols(assemblySymbol.GlobalNamespace).Where(t => Filter(t));
            return types.Select(t => GetFullMetadataName(t)).ToArray();
        }
    }

    // Returns an array so the result can be stored in the string[] cache directly.
    return new string[0];
}

We need to do a little reflection here as the GetMetadata method is not public, and later, when we grab the metadata, the CachedSymbols field is also non-public, so more reflection there. In terms of identifying what is available, we need to grab the IEnumerable<IAssemblySymbol> from the CachedSymbols field. This gives us all the cached symbols in the referenced assembly. Roslyn does this for us, so we can then abuse it:

private IEnumerable<ITypeSymbol> GetTypeSymbols(INamespaceSymbol ns)
{
    // Skip compiler-generated types, whose names start with '<'.
    foreach (var typeSymbol in ns.GetTypeMembers().Where(t => !t.Name.StartsWith("<")))
    {
        yield return typeSymbol;
    }

    // Recurse into each child namespace.
    foreach (var namespaceSymbol in ns.GetNamespaceMembers())
    {
        foreach (var typeSymbol in GetTypeSymbols(namespaceSymbol))
        {
            yield return typeSymbol;
        }
    }
}

The GetTypeSymbols method walks through the namespaces and discovers all types. We then chain the result to the filter method, which ensures it implements our required interface:

private bool Filter(ITypeSymbol symbol)
{
    return symbol.IsReferenceType 
        && !symbol.IsAbstract
        && !symbol.IsAnonymousType
        && symbol.AllInterfaces.Any(i => GetFullMetadataName(i) == "Sample.IModule");
}

With GetFullMetadataName being a utility method:

private static string GetFullMetadataName(INamespaceOrTypeSymbol symbol)
{
    ISymbol s = symbol;
    var builder = new StringBuilder(s.MetadataName);

    // Start from the container; the symbol's own name is already in the builder.
    s = s.ContainingSymbol;
    while (s != null && !IsRootNamespace(s))
    {
        builder.Insert(0, '.');
        builder.Insert(0, s.MetadataName);
        s = s.ContainingSymbol;
    }

    return builder.ToString();
}

private static bool IsRootNamespace(ISymbol symbol)
{
    return symbol is INamespaceSymbol && ((INamespaceSymbol)symbol).IsGlobalNamespace;
}
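
As an aside, the reflection in GetAssemblyModules depends on non-public Roslyn members, which may change between versions. A possible alternative, sketched here under the assumption that the public Compilation.GetAssemblyOrModuleSymbol method is available in the Roslyn build you compile against, is to ask the compilation for the symbol behind each reference. ReferenceSymbols is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;

public static class ReferenceSymbols
{
    // Resolves each metadata reference of the compilation to its assembly symbol
    // through the public API, skipping references that do not resolve to an assembly.
    public static IEnumerable<IAssemblySymbol> GetReferencedAssemblies(Compilation compilation)
    {
        return compilation.References
            .Select(r => compilation.GetAssemblyOrModuleSymbol(r))
            .OfType<IAssemblySymbol>();
    }
}
```

Each returned symbol exposes GlobalNamespace, so it can be passed to GetTypeSymbols in the same way as the symbol obtained through reflection.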

Next up, module declarations in the current compilation:

private IEnumerable<string> GetModuleClassDeclarations(Compilation compilation)
{
    var trees = compilation.SyntaxTrees.ToArray();
    var models = trees.Select(t => compilation.GetSemanticModel(t)).ToArray();

    for (var i = 0; i < trees.Length; i++)
    {
        var tree = trees[i];
        var model = models[i];

        var types = tree.GetRoot().DescendantNodes().OfType<ClassDeclarationSyntax>().ToList();
        foreach (var type in types)
        {
            var symbol = model.GetDeclaredSymbol(type) as ITypeSymbol;
            if (symbol != null && Filter(symbol))
            {
                yield return GetFullMetadataName(symbol);
            }
        }
    }
}

And that's really it! So, now at compile time, my ICompileModule will:

Discover all available modules
Implement an override of my ModuleProvider.Setup method with all known referenced modules.

This means I can add my startup:

public class Startup
{
    public ModuleProvider ModuleProvider = new ModuleProvider();

    public void ConfigureServices(IServiceCollection services)
    {
        var descriptors = ModuleProvider.GetModules() // Ordered
            .SelectMany(m => m.GetServices());

        // Apply descriptors to services.
    }

    public void Configure(IApplicationBuilder app)
    {
        var modules = ModuleProvider.GetModules(); // Ordered.

        // Startup code.
    }
}

Massively over-engineered, quite complex, but kinda awesome I think!
