
Sanitize Roslyn MemberDeclarationSyntax identifier

Tags:

c#

roslyn

Is there any existing method or mechanism in Roslyn for sanitizing the identifier names of MemberDeclarationSyntax nodes when building syntax trees?

For example, names with spaces, dots, or dashes, or names that use reserved words like class, void, string.

[Edit] To clarify: the code is going to be generated, so I do not know or control the input in advance; the goal is to sanitize the input. I am looking for the syntax tree equivalent of Path.GetInvalidFileNameChars(), which you might use to sanitize input when creating directories and files. Is there any such mechanism in Roslyn?

Scott Mackay asked Jan 08 '23 20:01


2 Answers

While following up on the accepted answer, I found some additional methods on SyntaxFacts that could be of use here.

GetKeywordKind(string) returns a SyntaxKind representing the keyword, or SyntaxKind.None if the passed string isn't a C# keyword. GetContextualKeywordKind(string) does the same thing for contextual keywords. That should make it easy to do something like:

string identifier = "double";
bool isAnyKeyword = SyntaxFacts.GetKeywordKind(identifier) != SyntaxKind.None
                 || SyntaxFacts.GetContextualKeywordKind(identifier) != SyntaxKind.None;
Salvador answered Jan 17 '23 18:01


I think there are two parts to this answer. First of all, at the syntax level you will never have an IdentifierNameSyntax with the value of a reserved keyword. I know you're talking about method declarations, but the same idea applies to namespaces (which have shallower syntax trees to look at).

Consider:

namespace class
{
}

The corresponding syntax tree (generated with the Roslyn Syntax Visualizer):

[Image: syntax tree for the namespace declaration above, as shown in the Roslyn Syntax Visualizer]

Notice that in the above image IdentifierName has a lightning bolt next to it. This indicates that it's completely missing. The parser does not mistake class for an identifier. It knows that whenever it sees the keyword class, it is the beginning of a ClassDeclarationSyntax.

Your syntax tree is absolutely destroyed and even Visual Studio does not realize you were trying to use an identifier with the value of "class". It indicates three errors:

  • Identifier expected
  • { expected
  • } expected
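To see this without the Syntax Visualizer, you can parse the snippet yourself and print whatever the parser reports. This is only a minimal sketch: the class name KeywordAsNameDemo and the method ParseErrors are made up for illustration, and the exact diagnostic text may vary between Roslyn versions.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper (not part of Roslyn): parses a piece of source text
// and returns the parser diagnostics as strings.
public static class KeywordAsNameDemo
{
    public static List<string> ParseErrors(string source)
    {
        // ParseText only runs the lexer and parser; no compilation is needed.
        var tree = CSharpSyntaxTree.ParseText(source);

        // Each diagnostic's ToString() includes its location, id, and message.
        return tree.GetDiagnostics().Select(d => d.ToString()).ToList();
    }
}
```

Calling KeywordAsNameDemo.ParseErrors("namespace class\n{\n}") and writing each entry to the console should list the syntax errors described above.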

The second part of this answer applies if you're generating identifiers and would like to detect if a string you've generated may be used as a valid identifier. I searched the codebase briefly and didn't notice any methods that would detect both invalid characters and keywords in identifiers. However, we can combine two methods to achieve what you want:

SyntaxFacts.IsValidIdentifier() and IsCSharpKeyword() (which is unfortunately internal and must be copied into your program).

string myIdentifier = "test&";
bool validIdentifier = SyntaxFacts.IsValidIdentifier(myIdentifier); //false
string myOtherIdentifier = "class";
bool isKeyword = myOtherIdentifier.IsCSharpKeyword(); //true

Note that IsCSharpKeyword() does not check for contextual keywords, which may legally be used as identifiers. However, naming your class var will likely introduce semantic errors, so you might want to check for contextual keywords as well.
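If you want to go from detection to actual sanitizing, something along these lines might work. It is a rough sketch under my own assumptions, not an existing Roslyn feature: IdentifierSanitizer and Sanitize are hypothetical names, replacing bad characters with an underscore is an arbitrary choice, and it uses the public SyntaxFacts.GetKeywordKind rather than the internal IsCSharpKeyword(). Reserved keywords are escaped with the C# verbatim prefix @; contextual keywords are left alone here.

```csharp
using System.Text;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper (not part of Roslyn): turns an arbitrary string into
// something usable as a C# identifier.
public static class IdentifierSanitizer
{
    public static string Sanitize(string raw)
    {
        // Assumption: an empty or null name becomes a single underscore.
        if (string.IsNullOrEmpty(raw))
            return "_";

        var sb = new StringBuilder(raw.Length + 1);
        foreach (char c in raw)
        {
            // Replace anything that cannot appear inside an identifier
            // (spaces, dots, dashes, ...) with an underscore.
            sb.Append(SyntaxFacts.IsIdentifierPartCharacter(c) ? c : '_');
        }

        // Digits are valid inside an identifier but not as its first character.
        if (!SyntaxFacts.IsIdentifierStartCharacter(sb[0]))
            sb.Insert(0, '_');

        string candidate = sb.ToString();

        // Reserved keywords such as class or void need the verbatim prefix.
        if (SyntaxFacts.GetKeywordKind(candidate) != SyntaxKind.None)
            candidate = "@" + candidate;

        return candidate;
    }
}
```

With this sketch, Sanitize("my-name.with spaces") gives "my_name_with_spaces" and Sanitize("class") gives "@class". Note that two different inputs can map to the same output (for example "a-b" and "a.b"), so generated code may still need a uniqueness check on top of this.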

JoshVarty answered Jan 17 '23 17:01