Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using the Roslyn Semantic Model to Find Symbols in a Single .cs File

I am using Roslyn to create an analyzer that warns users if a particular class exposes its fields in an unsynchronized manner, to help prevent race conditions.

The Problem:

I currently have working code that checks to make sure a field is private. I’m having trouble with the last piece of the puzzle: figuring out a way to make sure that all fields are only accessed inside a lock block, so they’re (ostensibly) synchronized.

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.FindSymbols;

namespace RaceConditions
{
    [DiagnosticAnalyzer(LanguageNames.CSharp)]
    public class UnsynchronizedMemberAccess : DiagnosticAnalyzer
    {
        public const string DiagnosticId = "UnsynchronizedMemberAccess";
        internal static readonly LocalizableString Title = "UnsynchronizedMemberAccess Title";
        private static readonly LocalizableString MessageFormat = "Unsychronized fields are not thread-safe";
        private static readonly LocalizableString Description = "Accessing fields without a get/set methods synchronized with each other and the constructor may lead to race conditions";
        internal const string Category = "Race Conditions";

        private static DiagnosticDescriptor Rule = new DiagnosticDescriptor(DiagnosticId, Title, MessageFormat, Category, DiagnosticSeverity.Warning, isEnabledByDefault: true, description: Description);
        public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics { get { return ImmutableArray.Create(Rule); } }

        //meant to stop other classes and itself from accessing members in an unsychronized fashion.
        public override void Initialize(AnalysisContext analysisContext)
        {
            analysisContext.RegisterSemanticModelAction((context) =>
            {
                var model = context.SemanticModel;
                var root = model.SyntaxTree.GetRoot();
                var nodes = model.SyntaxTree.GetRoot().DescendantNodes();

                var fields = nodes.OfType<VariableDeclaratorSyntax>()
                        .Where(v => v.Ancestors().OfType<FieldDeclarationSyntax>().Any());
                //since (it appears) that you can't read/write to a an initialized field,
                //I think it means you can only read/write inside a block
                foreach (BlockSyntax b in nodes.OfType<BlockSyntax>())
                {
                    //where I plan to put code to check references to the fields
                }
            });
        }
    }
}

More specifically, I’d like to be able to ensure everything highlighted by the reference highlighter (at least that’s what Microsoft seems to call it) is inside a lock block, while overloaded parameters do not have to.

using System;
using System.Linq;
using System.Activities;
using System.Activities.Statements;
using System.Data.SqlClient;

namespace Sandbox
{

    partial class Program
    {
        private int xe = 0, y = 0;

        public Program(int xe)
        {
            this.xe = xe;
        }

        void bleh()
        {
            if (xe == 0)
            {
                xe = xe + 1;
            }
        }

        static void Main(string[] args)
        {
            Program p0 = new Program(5),
                p1 = new Program(p0),
                p2 = new Program(p0.xe);

            Console.WriteLine(p1.xe);
            Console.Read();
        }
    }

    partial class Program
    {
        public Program(Program p) : this(p.xe) { }
    }
}

The Research:

Here, Josh Varty [1] suggests I use SymbolFinder.FindReferencesAsync, which requires a Solution object. Jason Malinowski [2] says that I shouldn’t use do this in an analyzer, since making a MSBuildWorkspace to get a Solution object is too slow, while this person [3] offers an incomplete/missing workaround to the slowness issue (the link to ReferenceResolver seems to be broken).

I have also looked into DataFlowAnalysis (SemanticModel.AnalyzeDataFlow()), but I can’t find any particular methods there that obviously let me guarantee that I’m referencing the field xe, and not the local variable xe.

The Question:

I do feel like there is something monumentally obvious I’m missing. Is there some elegant way to implement this that I’ve overlooked? It’d be preferable if the answer uses the semantic model, since I expect I have to use it in other analyzers to figure out where data/references come from, but I realize there are limitations, so any answers without the semantic model are also fine.

Notes:

  • Apparently, this issue was also encountered at Github [4], but apparently it’s still being tracked there, and they don’t know if the analyzer should analyze on the project level or not. It still hasn’t been resolved. For the purposes of this analyzer, I will assume that the entire class contained in a single .cs file. Small steps first, eh?
  • I also searched through John Koerner's website [5] and Josh Varty's website [6], and couldn’t find anything relevant to both analyzers and DataFlowAnalysis.
like image 730
logitropicity Avatar asked Jul 13 '16 22:07

logitropicity


1 Answers

The trick is to invert how you're asking the question. Go from:

How do I find all the references to this symbol that I want to ensure is synchronized?

but instead

How, upon looking at the use of a symbol, determine if this should be inside of a lock statement?

Because this offers a course of action: your analyzer should instead look at each identifier in a method body that's not in a lock statement, call SemanticModel.GetSymbolInfo(), get the symbol that's being referenced, and then check if that field is one that's "synchronized" via your logic (private, etc.). At that point, then since you're looking at the use, you can flag that particular use.

This inversion is how we expect analyzers to be written, and it's not an accident. The reason is largely performance. Imagine your analyzer is running inside Visual Studio, and you delete a line of code. If analyzers were written in the model of "look at a symbol, now ask for all uses", it means any and all analyzers that did that potentially have to rerun from scratch. That's not great for your CPU or battery life. When the question is inverted like this, it means we only have to reanalyze that specific file, since you're not expanding to "give me everything".

like image 72
Jason Malinowski Avatar answered Oct 20 '22 18:10

Jason Malinowski