Nested generic syntax ambiguity

Question

Apparently, C# is as susceptible to the '>>' lexer dilemma as C++ is.

This C# code is perfectly valid; it compiles and runs just fine:

var List = new Dummy("List");
var Nullable = new Dummy("Nullable");
var Guid = new Dummy("Guid");

var x = List<Nullable<Guid>> 10;
var y =  List<Nullable<Guid>> .Equals(10,20);

You'd have to overload the '<' and '>>' operators for the Dummy class above (and '>' too, since C# requires '<' and '>' to be overloaded as a pair).

Yet the compiler manages to guess that in the 'x' case the intent is to use the List, Nullable and Guid local variables, while in the 'y' case it suddenly decides to treat them as names of well-known types.
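For reference, here is a minimal sketch of what such a Dummy class might look like. The post does not show the class, so the Text property, the string-building operator bodies and the Program wrapper are all assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the post only states that Dummy overloads '<' and '>>'.
public sealed class Dummy
{
    public string Text { get; }

    public Dummy(string text) { Text = text; }

    // C# requires '<' and '>' to be overloaded together.
    // Returning Dummy (not bool) lets the comparisons chain.
    public static Dummy operator <(Dummy a, Dummy b) =>
        new Dummy("(" + a.Text + " < " + b.Text + ")");

    public static Dummy operator >(Dummy a, Dummy b) =>
        new Dummy("(" + a.Text + " > " + b.Text + ")");

    // The right-hand operand of an overloaded shift is an int here.
    public static Dummy operator >>(Dummy a, int shift) =>
        new Dummy("(" + a.Text + " >> " + shift + ")");

    public override string ToString() => Text;
}

public static class Program
{
    public static void Main()
    {
        var List = new Dummy("List");
        var Nullable = new Dummy("Nullable");
        var Guid = new Dummy("Guid");

        // Followed by '10': parsed as an expression over the local variables.
        var x = List<Nullable<Guid>> 10;

        // Followed by '.': parsed as the generic type List<Nullable<Guid>>.
        var y = List<Nullable<Guid>>.Equals(10, 20);

        Console.WriteLine(x);
        Console.WriteLine(y);
    }
}
```

Since shift binds tighter than the relational operators, and relational operators are left-associative, the 'x' expression groups as (List < Nullable) < (Guid >> 10). In the 'y' case the call resolves to the static object.Equals(object, object), reached through the generic type.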

Here's a bit more detailed description with another example: http://mihailik.blogspot.co.uk/2012/05/nested-generics-c-can-be-stinky.html

The question is: how does the C# compiler decide whether 'a<b<c>>' is an arithmetic expression or a generic type/method?

Surely it doesn't make multiple 'goes' over the text of the program until it succeeds, or does it? That would require unbounded look-ahead, and a very complex one at that.

Oleg Mihailik · Accepted Answer

I've been directed to paragraph 7.6.4.2 of the C# language spec:

http://download.microsoft.com/download/0/B/D/0BDA894F-2CCD-4C2C-B5A7-4EB1171962E5/CSharp%20Language%20Specification.htm

The productions for simple-name (§7.6.2) and member-access (§7.6.4) can give rise to ambiguities in the grammar for expressions.

...

If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.2), member-access (§7.6.4), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing > token is examined. If it is one of

( ) ] } : ; , . ? == != | ^

then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8).

So an ambiguity may indeed arise when a type-argument-list is involved, and the spec has a cheap way to resolve it: looking one token ahead.

Strictly speaking, this is still an unbounded look-ahead, because there might be a megabyte's worth of comments between '>>' and the following token, but at least the rule is more or less clear. Most importantly, there is no need for speculative deep parsing.
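The one-token check from the quoted paragraph can be sketched roughly as follows. This is only an illustration of the rule, not how the actual compiler is written; TypeArgumentRule and KeepTypeArgumentList are hypothetical names, and tokens are modelled as plain strings for simplicity.

```csharp
using System;
using System.Collections.Generic;

public static class TypeArgumentRule
{
    // The tokens listed in the quoted spec paragraph.
    private static readonly HashSet<string> Deciders = new HashSet<string>
    {
        "(", ")", "]", "}", ":", ";", ",", ".", "?", "==", "!=", "|", "^"
    };

    // Hypothetical helper: nextToken is the token immediately after the
    // closing '>' of a candidate type-argument-list.
    // true  -> keep the generic interpretation
    // false -> fall back to the '<', '>', '>>' operator interpretation
    public static bool KeepTypeArgumentList(string nextToken) =>
        Deciders.Contains(nextToken);
}

public static class Program
{
    public static void Main()
    {
        // The 'x' case: List<Nullable<Guid>> is followed by 10.
        Console.WriteLine(TypeArgumentRule.KeepTypeArgumentList("10")); // False

        // The 'y' case: List<Nullable<Guid>> is followed by '.'.
        Console.WriteLine(TypeArgumentRule.KeepTypeArgumentList("."));  // True
    }
}
```

This matches the two cases from the question: a literal after the closing '>' pushes the parser towards operators, while a '.' keeps the generic type.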

Tags: c#, lexer, nested-generics
