Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested generic syntax ambiguity >>

Apparently, C# is as susceptible to '>>' lexer dilemma as is C++.

This C# code is pretty valid, it compiles and runs just fine:

var List = new Dummy("List");
var Nullable = new Dummy("Nullable");
var Guid = new Dummy("Guid");

var x = List<Nullable<Guid>> 10;
var y =  List<Nullable<Guid>> .Equals(10,20);

You'd have to overload '<' and '>>' operators for the Dummy class above.

But the compiler manages to guess that in 'x' case the meaning is to use List, Nullable and Guid local variables. And in 'y' case it suddenly decides to treat them as names of well-known types.

Here's a bit more detailed description with another example: http://mihailik.blogspot.co.uk/2012/05/nested-generics-c-can-be-stinky.html

The question is: how does C# compiler resolve 'a<b<c>>' to arithmetic expression or generic type/method?

Surely it doesn't try to have multiple 'goes' over the text of the program until it succeeds, or does it? That would require unbounded look-ahead, and a very complex too.

like image 285
Oleg Mihailik Avatar asked May 16 '12 00:05

Oleg Mihailik


1 Answers

I've been directed to the paragraph 7.6.4.2 in C# language spec:

http://download.microsoft.com/download/0/B/D/0BDA894F-2CCD-4C2C-B5A7-4EB1171962E5/CSharp%20Language%20Specification.htm

The productions for simple-name (§7.6.2) and member-access (§7.6.4) can give rise to ambiguities in the grammar for expressions.

...

If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.2), member-access (§7.6.4), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing > token is examined. If it is one of

( ) ] } : ; , . ? == != | ^

then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8).

So, there may indeed an ambiguity arise when type-argument-list is involved, and they've got a cheap way to resolve it, by looking one token ahead.

It's still an unbound look ahead, because there might be a megabyte worth of comments between '>>' and following token, but at least the rule is more or less clear. And most importantly there is no need for speculative deep parsing.

like image 130
Oleg Mihailik Avatar answered Oct 25 '22 10:10

Oleg Mihailik