How can I reliably determine the type of a variable that is declared using var at design time?

I can describe for you how we do that efficiently in the "real" C# IDE.

The first thing we do is run a pass which analyzes only the "top level" stuff in the source code. We skip all the method bodies. That allows us to quickly build up a database of information about which namespaces, types and methods (and constructors, etc.) are in the source code of the program. Analyzing every single line of code in every method body would take way too long if you're trying to do it between keystrokes.

When the IDE needs to work out the type of a particular expression inside a method body -- say you've typed "foo." and we need to figure out what are the members of foo -- we do the same thing; we skip as much work as we reasonably can.

We start with a pass which analyzes only the local variable declarations within that method. When we run that pass we make a mapping from a pair of "scope" and "name" to a "type determiner". The "type determiner" is an object that represents the notion of "I can work out the type of this local if I need to". Working out the type of a local can be expensive, so we want to defer that work until we actually need the answer.
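To make the "type determiner" idea concrete, here is a minimal sketch of a lazily evaluated map from (scope, name) to a deferred computation. The names TypeDeterminer and LocalTypeDatabase, the integer scope id, and the use of strings for types are all assumptions made for illustration; this is not the IDE's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: wraps the "I can work out the type if asked" idea.
public sealed class TypeDeterminer
{
    private readonly Lazy<string> _type;

    // Counts how often the expensive computation actually ran.
    public int TimesComputed { get; private set; }

    public TypeDeterminer(Func<string> compute)
    {
        // Lazy<T> runs the delegate at most once, on first access to Value.
        _type = new Lazy<string>(() => { TimesComputed++; return compute(); });
    }

    public string Resolve() => _type.Value;
}

public sealed class LocalTypeDatabase
{
    // Key: (scope, name). A scope is just an integer id in this sketch.
    private readonly Dictionary<(int Scope, string Name), TypeDeterminer> _locals
        = new Dictionary<(int Scope, string Name), TypeDeterminer>();

    public void Declare(int scope, string name, Func<string> compute)
        => _locals[(scope, name)] = new TypeDeterminer(compute);

    public TypeDeterminer Lookup(int scope, string name) => _locals[(scope, name)];
}
```

Declaring a local costs almost nothing; the delegate only runs the first time somebody calls Resolve(), and a determiner for y is free to call Lookup(...).Resolve() on x as part of its own work.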

We now have a lazily-built database that can tell us the type of every local. So, getting back to that "foo." -- we figure out which statement the relevant expression is in and then run the semantic analyzer against just that statement. For example, suppose you have the method body:

String x = "hello";
var y = x.ToCharArray();
var z = from foo in y where foo.

and now we need to work out that foo is of type char. We build a database that has all the metadata, extension methods, source code types, and so on. We build a database that has type determiners for x, y and z. We analyze the statement containing the interesting expression. We start by transforming it syntactically to

var z = y.Where(foo=>foo.

In order to work out the type of foo we must first know the type of y. So at this point we ask the type determiner "what is the type of y"? It then starts up an expression evaluator which parses x.ToCharArray() and asks "what's the type of x"? We have a type determiner for that which says "I need to look up "String" in the current context". There is no type String in the current type, so we look in the namespace. It's not there either so we look in the using directives and discover that there's a "using System" and that System has a type String. OK, so that's the type of x.

We then query System.String's metadata for the type of ToCharArray and it says that it's a System.Char[]. Super. So we have a type for y.

Now we ask "does System.Char[] have a method Where?" No. So we look in the using directives; we have already precomputed a database containing all of the metadata for extension methods that could possibly be used.

Now we say "OK, there are eighteen dozen extension methods named Where in scope, do any of them have a first formal parameter whose type is compatible with System.Char[]?" So we start a round of convertibility testing. However, the Where extension methods are generic, which means we have to do type inference.
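As a rough illustration of the candidate search, the sketch below uses runtime reflection to collect the extension methods with a given name from one static class. That is an assumption made to keep the example short: the IDE works against its own precomputed metadata database rather than System.Reflection, and the class name ExtensionMethodCandidates is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ExtensionMethodCandidates
{
    // Returns the public static methods of 'container' that are named 'name'
    // and are marked as extension methods by the compiler.
    public static List<MethodInfo> Find(Type container, string name)
    {
        return container
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Where(m => m.Name == name
                        && m.IsDefined(typeof(ExtensionAttribute), false))
            .ToList();
    }
}
```

Calling ExtensionMethodCandidates.Find(typeof(Enumerable), "Where") yields the LINQ-to-objects overloads; each has a first formal parameter of type IEnumerable<TSource>, which is what the convertibility test then has to check against System.Char[].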

I've written a special type inferencing engine that can handle making incomplete inferences from the first argument to an extension method. We run the type inferrer and discover that there is a Where method that takes an IEnumerable<T>, and that we can make an inference from System.Char[] to IEnumerable<System.Char>, so T is System.Char.

The signature of this method is Where<T>(this IEnumerable<T> items, Func<T, bool> predicate), and we know that T is System.Char. We also know that the first argument inside the parentheses to the extension method is a lambda. So we start up a lambda expression type inferrer that says "the formal parameter foo is assumed to be System.Char; use this fact when analyzing the rest of the lambda."
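The inference from the first argument alone can be approximated with reflection. The sketch below is not the inference engine described above; it is a simplified stand-in that asks "is there an IEnumerable<T> this type converts to, and if so what is T?", and the name FirstArgumentInference is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstArgumentInference
{
    // Returns the T for which 'argumentType' converts to IEnumerable<T>,
    // or null when there is no such interface.
    public static Type InferElementType(Type argumentType)
    {
        IEnumerable<Type> candidates = argumentType.GetInterfaces();

        // GetInterfaces() does not include the type itself, so handle the
        // case where the argument is already IEnumerable<T>.
        if (argumentType.IsInterface)
            candidates = candidates.Concat(new[] { argumentType });

        return candidates
            .Where(i => i.IsGenericType
                        && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .Select(i => i.GetGenericArguments()[0])
            .FirstOrDefault();
    }
}
```

For typeof(char[]) this yields typeof(char), which is the fact the lambda binder needs before it can say anything about foo.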

We now have all the information we need to analyze the body of the lambda, which is "foo.". We look up the type of foo, we discover that according to the lambda binder it is System.Char, and we're done; we display type information for System.Char.

And we do everything except the "top level" analysis between keystrokes. Actually writing all the analysis is not hard; making it fast enough that you can do it at typing speed is the real tricky bit.

I can tell you roughly how the Delphi IDE works with the Delphi compiler to do intellisense (code insight is what Delphi calls it). It's not 100% applicable to C#, but it's an interesting approach which deserves consideration.

Most semantic analysis in Delphi is done in the parser itself. Expressions are typed as they are parsed, except for situations where this is not easy - in which case look-ahead parsing is used to work out what's intended, and then that decision is used in the parse.

The parse is largely LL(2) recursive descent, except for expressions, which are parsed using operator precedence. One of the distinct things about Delphi is that it's a single-pass language, so constructs need to be declared before being used, so no top-level pass is needed to bring that information out.

This combination of features means that the parser has roughly all the information needed for code insight for any point where it's needed. The way it works is this: the IDE informs the compiler's lexer of the position of the cursor (the point where code insight is desired) and the lexer turns this into a special token (it's called the kibitz token). Whenever the parser meets this token (which could be anywhere) it knows that this is the signal to send all the information it has back to the editor. It does this using a longjmp because it's written in C; what it does is notify the ultimate caller of the kind of syntactic construct (i.e. grammatical context) the kibitz point was found in, as well as all the symbol tables necessary for that point. So for example, if the context is in an expression which is an argument to a method, then we can check the method overloads, look at the argument types, and filter the valid symbols to only those which can resolve to that argument type (this cuts down on a lot of irrelevant cruft in the drop-down). If it's in a nested scope context (e.g. after a "."), the parser will have handed back a reference to the scope, and the IDE can enumerate all the symbols found in that scope.
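Here is a toy sketch of the kibitz-token idea in C#, under heavy simplifying assumptions: the "language" only has statements of the form var name ; and use name ;, the cursor is given as a token index rather than a character offset, and an exception stands in for the longjmp. All names (ToyParser, KibitzReached) are invented for the example and have nothing to do with the real Delphi compiler source.

```csharp
using System;
using System.Collections.Generic;

// Thrown to unwind the parser; plays the role of the longjmp.
public sealed class KibitzReached : Exception
{
    public string Context { get; }
    public IReadOnlyList<string> VisibleSymbols { get; }

    public KibitzReached(string context, IReadOnlyList<string> visibleSymbols)
    {
        Context = context;
        VisibleSymbols = visibleSymbols;
    }
}

public static class ToyParser
{
    public const string Kibitz = "<kibitz>";

    // Lexer: whitespace-separated tokens, with the kibitz token inserted
    // at the token index where the cursor sits.
    public static List<string> Lex(string source, int cursorTokenIndex)
    {
        var tokens = new List<string>(
            source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        tokens.Insert(cursorTokenIndex, Kibitz);
        return tokens;
    }

    // Single-pass parse of: { ("var" | "use") name ";" }
    // Assumes well-formed input apart from the kibitz token.
    public static void Parse(List<string> tokens)
    {
        var declared = new List<string>();
        int pos = 0;
        while (pos < tokens.Count)
        {
            string keyword = Next(tokens, ref pos, "statement", declared);
            string context = keyword == "var" ? "declaration" : "expression";
            string name = Next(tokens, ref pos, context, declared);
            Next(tokens, ref pos, "statement", declared); // consumes ";"
            if (keyword == "var") declared.Add(name);
        }
    }

    private static string Next(List<string> tokens, ref int pos,
                               string context, List<string> declared)
    {
        string token = tokens[pos++];
        if (token == Kibitz)
        {
            // Hand back the grammatical context plus a snapshot of the
            // symbols declared so far, then abandon the rest of the parse.
            throw new KibitzReached(context, declared.ToArray());
        }
        return token;
    }
}
```

With the source "var a ; var b ; use ; var c ;" and the cursor at token index 7 (right after use), the parse stops in an "expression" context with a and b visible. c is not visible, because in a single-pass language it has not been declared yet at that point.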

Other things are also done; for example, method bodies are skipped if the kibitz token does not lie in their range - this is done optimistically, and rolled back if it skipped over the token. The equivalent of extension methods - class helpers in Delphi - have a kind of versioned cache, so their lookup is reasonably fast. But Delphi's generic type inference is much weaker than C#'s.

Now, to the specific question: inferring the types of variables declared with var is equivalent to the way Pascal infers the type of constants. It comes from the type of the initialization expression. These types are built from the bottom up. If x is of type Integer, and y is of type Double, then x + y will be of type Double, because those are the rules of the language; etc. You follow these rules until you have a type for the full expression on the right hand side, and that's the type you use for the symbol on the left.
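The bottom-up rule can be sketched as a tiny expression tree in which every node computes its type from the types of its children. The node classes and the use of type names as strings are assumptions for the example, and only the Integer/Double addition rule mentioned above is modelled.

```csharp
using System;

public abstract class Expr
{
    public abstract string InferType();
}

// A leaf whose type is already known, e.g. a declared variable.
public sealed class Leaf : Expr
{
    private readonly string _type;
    public Leaf(string type) { _type = type; }
    public override string InferType() => _type;
}

// left + right: the type is derived from the operand types.
public sealed class Add : Expr
{
    private readonly Expr _left;
    private readonly Expr _right;

    public Add(Expr left, Expr right) { _left = left; _right = right; }

    public override string InferType()
    {
        string l = _left.InferType();
        string r = _right.InferType();
        if (!IsNumeric(l) || !IsNumeric(r))
            throw new InvalidOperationException($"Cannot add {l} and {r}");

        // Integer + Double widens to Double; Integer + Integer stays Integer.
        return (l == "Double" || r == "Double") ? "Double" : "Integer";
    }

    private static bool IsNumeric(string type)
        => type == "Integer" || type == "Double";
}
```

The type of the symbol on the left of the declaration is then simply whatever InferType() returns for the root of the initializer expression.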

If you don't want to have to write your own parser to build the abstract syntax tree, you could look at using the parsers from either SharpDevelop or MonoDevelop, both of which are open source.

Intellisense systems typically represent the code using an Abstract Syntax Tree, which allows them to resolve the return type of the function being assigned to the 'var' variable in more or less the same way as the compiler will. If you use the VS Intellisense, you may notice that it won't give you the type of var until you've finished entering a valid (resolvable) assignment expression. If the expression is still ambiguous (for instance, it can't fully infer the generic arguments for the expression), the var type will not resolve. This can be a fairly complex process, as you might need to walk fairly deep into a tree in order to resolve the type. For instance:

var items = myList.OfType<Foo>().Select(foo => foo.Bar);

The return type is IEnumerable<Bar>, but resolving this requires knowing:

  1. myList is of a type that implements IEnumerable.
  2. There is an extension method OfType<T> that applies to IEnumerable.
  3. The resulting value is IEnumerable<Foo> and there is an extension method Select that applies to this.
  4. The lambda expression foo => foo.Bar has the parameter foo of type Foo. This is inferred by the usage of Select, which takes a Func<TIn,TOut> and since TIn is known (Foo), the type of foo can be inferred.
  5. The type Foo has a property Bar, which is of type Bar. We know that Select returns IEnumerable<TOut> and TOut can be inferred from the result of the lambda expression, so the resulting type of items must be IEnumerable<Bar>.
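To check that chain of reasoning against the real compiler, here is a small self-contained version of the example. Foo, Bar and myList are not defined in the original, so the definitions below are assumptions chosen only to make the line compile.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Bar { }

public class Foo
{
    // A property named Bar whose type is also Bar, as in step 5.
    public Bar Bar { get; set; } = new Bar();
}

public static class VarInferenceDemo
{
    public static IEnumerable<Bar> Run()
    {
        // ArrayList implements the non-generic IEnumerable (step 1).
        ArrayList myList = new ArrayList { new Foo(), "not a Foo", new Foo() };

        var items = myList.OfType<Foo>().Select(foo => foo.Bar);

        // This assignment compiles without a cast, which confirms that
        // the compiler inferred IEnumerable<Bar> for items.
        IEnumerable<Bar> typed = items;
        return typed;
    }
}
```

Hovering over var in an IDE should show the same IEnumerable<Bar>, since the IDE has to walk the same five steps.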