
Regex: C# method declaration parsing

Tags: c#, regex

Could somebody help me parse the following from a C# method declaration: scope, isStatic, name, return type, and the list of parameters and their types? Given a method declaration like this:

public static SomeReturnType GetSomething(string param1, int param2)

I need to be able to parse it and extract the information above. In this case:

  • name = "GetSomething"
  • scope = "public"
  • isStatic = true
  • returnType = "SomeReturnType"

and then an array of parameter type and name pairs.

Oh, I almost forgot the most important part: it has to account for all the other scopes (protected, private, internal, protected internal), the absence of "static", a void return type, etc.

Please note that REFLECTION is not a solution here. I need REGEX.

So far I have these two:

 (?:(?:public)|(?:private)|(?:protected)|(?:internal)|(?:protected internal)\s+)*

(?:(?:static)\s+)*

I guess for the rest of the problem I can just get away with string manipulation, without regex.

epitka asked Feb 28 '23 17:02


2 Answers

Some thoughts on your problem:

A set of strings that can all be matched by a particular regular expression is called a regular language. The set of strings which are legal method declarations is not a regular language in any version of C#. If you are attempting to find a regular expression which matches every legal C# method declaration and rejects every illegal C# method declaration then you are out of luck.

More generally, regular expressions are almost always a bad idea for anything but the simplest matching problems. (Sorry Jeff.) A far better approach is to first write a lexer, which breaks up the string into a sequence of tokens. Then analyze the token sequence. (Using regular expressions as part of a lexer is not a terrible idea, though you can get by without them.)
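To make the lexer idea concrete, here is a minimal sketch of a tokenizer for method signatures. It uses a regular expression only to recognize individual tokens, as suggested above. The names SignatureLexer, Tokenize and TokenPattern are hypothetical, and the token set is an assumption: it covers identifiers and a handful of punctuation characters, and does not handle attributes, comments, literals or default parameter values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a method signature into tokens.
public static class SignatureLexer
{
    // \G anchors each match at the current scan position.
    // One alternative per token kind; whitespace is matched so it can be skipped.
    private static readonly Regex TokenPattern = new Regex(
        @"\G(?:(?<ws>\s+)|(?<ident>@?[A-Za-z_][A-Za-z0-9_]*)|(?<punct>[()<>\[\],.?*]))",
        RegexOptions.Compiled);

    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int position = 0;
        while (position < text.Length)
        {
            Match match = TokenPattern.Match(text, position);
            if (!match.Success)
            {
                throw new FormatException(
                    "Unexpected character '" + text[position] + "' at index " + position);
            }
            // Keep identifiers and punctuation; drop whitespace.
            if (!match.Groups["ws"].Success)
            {
                tokens.Add(match.Value);
            }
            position += match.Length;
        }
        return tokens;
    }
}
```

For the declaration in the question, Tokenize yields the eleven tokens public, static, SomeReturnType, GetSomething, (, string, param1, the comma, int, param2 and ). A small hand-written parser can then walk that list, which makes nested constructs such as generic type arguments much easier to handle than with a single pattern.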

I note also that you are glossing over rather a lot of complications in parsing method declarations. You did not mention:

  • generic/array/pointer/nullable return and formal parameter types
  • generic type parameter declarations
  • generic type parameter constraints
  • unsafe/extern/new/override/virtual/abstract/sealed methods
  • explicit interface implementation methods
  • method/parameter/return attributes
  • partial methods (slightly tricky to parse, because partial is a contextual keyword)
  • comments
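
As an illustration of the first item in the list above, here is a sketch of a simple-case pattern and where it stops working. NaiveSignatureRegex, Pattern and SplitParameters are hypothetical names, and the pattern is an assumption about what a "simple" declaration looks like (optional scope, optional static, one-word return type, one-word name). It is not a recommended solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: handles only the simplest declarations.
public static class NaiveSignatureRegex
{
    public static readonly Regex Pattern = new Regex(
        @"^\s*(?<scope>protected\s+internal|public|private|protected|internal)?\s*" +
        @"(?<static>static\s+)?" +
        @"(?<returnType>\w+)\s+(?<name>\w+)\s*\((?<parameters>[^)]*)\)\s*$");

    // Splits the parameter list on commas, which is exactly what goes wrong
    // once a parameter type contains its own commas.
    public static string[] SplitParameters(string parameterList)
    {
        return parameterList.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The pattern matches public static SomeReturnType GetSomething(string param1, int param2) and captures the scope, return type, name and parameter list. It does not match public Dictionary<string, List<int>> GetMap(Dictionary<string, int> seed, int count), because \w+ cannot span the generic return type. Splitting the parameter list Dictionary<string, int> seed, int count on commas gives three pieces instead of two, since the comma inside the angle brackets is indistinguishable from a parameter separator.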

I also note that you've not said whether you are guaranteed that the method signature is already good, or if you need to identify bad ones and produce diagnostics as to why they're bad. That's a much harder problem.

Why do you want to do this in the first place? Doing this correctly is rather a lot of work. Perhaps there is an easier way to get what you want?

Eric Lippert answered Mar 02 '23 07:03


I wouldn't bother with using Regex. When you get to the part of interpreting method parameters, it gets really messy (ref and out keywords for example). I don't know if you need support for attribute notation as well, but that would make it a complete mess.

Maybe a C# parser library can be of help. I've found a few on the internet:

  • http://www.codeplex.com/csparser (C# 1.0)
  • http://www.csharpparser.com/

Alternatively, you could first feed the code to the compiler at runtime, and then use reflection on the newly created assembly. It will be slower, but pretty much guaranteed to be correct. Even though you seem to be opposed to the idea of using reflection, this can be a viable solution.

Something like this:

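using System;
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CSharp;

// "input" holds the method signature to analyze, without a body.
List<string> referenceAssemblies = new List<string>()
{
    "System.dll"
    // ...
};

// Give the method a throwing body so that non-abstract (e.g. static)
// signatures compile; a bare ";" is only legal for abstract or extern methods.
string source = "public abstract class TestClass { " + input
    + " { throw new System.NotImplementedException(); } }";

CSharpCodeProvider codeProvider = new CSharpCodeProvider();

// No assembly name specified
CompilerParameters compilerParameters =
    new CompilerParameters(referenceAssemblies.ToArray());
compilerParameters.GenerateExecutable = false;
compilerParameters.GenerateInMemory = false;

CompilerResults compilerResults = codeProvider.CompileAssemblyFromSource(
    compilerParameters, source);

// Check for successful compilation before touching CompiledAssembly
if (compilerResults.Errors.HasErrors)
    throw new InvalidOperationException("The method signature did not compile.");

Type testClass = compilerResults.CompiledAssembly.GetTypes().First();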

Then use reflection on testClass.
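
The reflection step could look roughly like the sketch below. SignatureInspector and Describe are hypothetical names, and SampleClass is only a stand-in for the compiled testClass so that the example runs on its own. It assumes the type declares exactly one method.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in for the compiled TestClass (assumption, for illustration only).
public abstract class SampleClass
{
    public static Version GetSomething(string param1, int param2)
    {
        throw new NotImplementedException();
    }
}

// Hypothetical helper: reads scope, static-ness, return type, name and
// parameters from the first method declared on the given type.
public static class SignatureInspector
{
    public static string Describe(Type type)
    {
        MethodInfo method = type.GetMethods(
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Static | BindingFlags.Instance |
            BindingFlags.DeclaredOnly).First();

        string scope =
            method.IsPublic ? "public" :
            method.IsPrivate ? "private" :
            method.IsFamily ? "protected" :
            method.IsAssembly ? "internal" :
            method.IsFamilyOrAssembly ? "protected internal" :
            "private protected"; // remaining case: IsFamilyAndAssembly

        // Type.Name gives CLR names such as String and Int32, not C# keywords.
        string parameters = string.Join(", ",
            method.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name));

        return scope + (method.IsStatic ? " static " : " ") +
            method.ReturnType.Name + " " + method.Name + "(" + parameters + ")";
    }
}
```

Calling SignatureInspector.Describe(typeof(SampleClass)) returns "public static Version GetSomething(String param1, Int32 param2)". With the compiled assembly, you would pass testClass instead of typeof(SampleClass).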

Compiling should be safe without input validation, because you're not executing any of the code. You'd only need very basic checks, such as making sure only 1 method signature is entered.

Thorarin answered Mar 02 '23 06:03