Is it possible to convert PHP Regex (using subroutine calls) to C# regex?

Tags: c#, regex, php

The example PHP regex below relies on subroutine calls.

If I try to use it with the C# Regex class, I get an error: Unrecognized grouping construct

Is it possible to rewrite this into C# regex syntax?

Would it be a simple translation, or does another (regex) approach need to be used?

If it is not possible, what is the name of the feature it is using, so I can add it to this question and make it more useful to others with the same problem?

PHP, which works with all of the JSON RFC test data:

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?: [1-9]\d*| 0 ) (\.\d+)? (e [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " (?>[^"\\\\]+ | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \z
  /six   
';

And the version that does not work in C#:

string pattern = @"(?(DEFINE)
 (?<number>   -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )    
 (?<boolean>   true | false | null )
 (?<string>    "" (?>[^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
 (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
 (?<pair>      \s* (?&string) \s* : (?&json)  )
 (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
 (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* ))
\A (?&json) \z
";
string input = @"[{""Example"": ""data""}]";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline;

bool isValid = Regex.IsMatch(input, pattern, options);

Edit: This question is NOT about using regex with JSON. It is about how to do something (subroutine calls) in C# that CAN be done in PHP regex.

The fact that there is a way of parsing JSON in C# DOES NOT answer the question. Please keep your answers and comments on topic.

DarcyThomas asked Nov 12 '17 08:11


2 Answers

This does not directly answer the question, but it is a workaround.

Rather than using the BCL Regex class, there is a project called PCRE.NET, which wraps the PCRE regex engine (the same engine which is used in the PHP example) with C# function calls.

This would allow the use of regex with subroutine calls in C# land.
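
As a rough sketch of what that could look like: the code below assumes the PCRE.NET NuGet package is installed and that its `PcreRegex` class and `PcreOptions` flags are named as I remember them, so check them against the version you install. The class name `JsonPcreCheck` and the method `IsJson` are made up for this example. The pattern is the one from the question, with one adjustment: inside a C# verbatim string a regex backslash escape is written `\\`, not `\\\\` as in the PHP single-quoted string.

```csharp
using System;
using PCRE; // from the PCRE.NET NuGet package (assumed to be installed)

public static class JsonPcreCheck
{
    // Same grammar as the PHP pattern in the question. (?(DEFINE)...) declares
    // the named groups without matching anything; (?&name) calls them.
    private const string Pattern = @"
(?(DEFINE)
  (?<number>  -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )
  (?<boolean> true | false | null )
  (?<string>  "" (?> [^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
  (?<array>   \[ (?: (?&json) (?: , (?&json) )* )? \s* \] )
  (?<pair>    \s* (?&string) \s* : (?&json) )
  (?<object>  \{ (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
  (?<json>    \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \z";

    // Assumed equivalents of PHP's /six modifiers.
    private static readonly PcreRegex Json = new PcreRegex(
        Pattern,
        PcreOptions.IgnoreCase | PcreOptions.IgnorePatternWhitespace | PcreOptions.Singleline);

    public static bool IsJson(string input) => Json.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsJson(@"[{""Example"": ""data""}]")); // well-formed input
        Console.WriteLine(IsJson(@"[{""Example"": }]"));         // value is missing
    }
}
```

The regex is built once and kept in a static field so that repeated calls do not recompile the pattern.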

DarcyThomas answered Oct 31 '22 15:10


The short answer is kinda, but not really.

.NET regex has a concept called balancing groups.

This is really good for checking that every opening brace has a matching closing brace (i.e., nesting is OK, but overlapping is not).

For example, this regex will ensure that all of the curly braces match:

{(?:[^{}]|(?<Open>{)|(?<Content-Open>}))+(?(Open)(?!))}

Which matches this string:

{1 2 {3} {4 5 {6}} 7}
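
Here is a minimal sketch of running that pattern with the BCL Regex class. The class and method names (`BalancedBraces`, `IsBalanced`) are made up for illustration. I have also added `^` and `$` anchors, which are not in the pattern above, so that the whole input has to be one balanced group rather than merely containing one, and escaped the literal braces for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBraces
{
    // (?<Open>\{)          pushes a capture onto the "Open" stack for each '{'
    // (?<Content-Open>\})  pops one "Open" capture for each '}' (fails if none is left)
    // (?(Open)(?!))        fails the match if any '{' is still unclosed at the end
    private static readonly Regex Balanced = new Regex(
        @"^\{(?:[^{}]|(?<Open>\{)|(?<Content-Open>\}))+(?(Open)(?!))\}$");

    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsBalanced("{1 2 {3} {4 5 {6}} 7}")); // True
        Console.WriteLine(IsBalanced("{1 2 {3} {4 5 {6} 7}"));  // False: one brace is never closed
    }
}
```

Note that the balancing group only counts one kind of bracket here. It does not give you named, reusable sub-patterns the way `(?&name)` does in PCRE.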

However, it is beyond me to craft a regex that includes several nested groupings, like the one in the example.

Furthermore, it looks like you would need to write a nested regex pattern with as many levels of nesting as you expect in your source data.
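
To illustrate that last point, here is a small sketch that uses only the BCL Regex class and no balancing groups. `BuildPattern` is a hypothetical helper that expands "a brace group may contain brace groups" by hand to a fixed depth. It only handles bare curly braces, not the JSON grammar from the question, and anything nested deeper than the chosen depth is rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundedNesting
{
    // Builds a pattern that accepts brace groups nested at most `depth` levels deep.
    // Each loop iteration wraps the previous pattern in one more level of braces,
    // which is what a recursive subroutine call would otherwise do for us.
    public static string BuildPattern(int depth)
    {
        string group = @"\{[^{}]*\}"; // depth 1: braces with no braces inside
        for (int level = 1; level < depth; level++)
        {
            group = @"\{(?:[^{}]|" + group + @")*\}";
        }
        return @"\A" + group + @"\z";
    }

    public static void Main()
    {
        var regex = new Regex(BuildPattern(3));
        Console.WriteLine(regex.IsMatch("{1 2 {3} {4 5 {6}} 7}")); // True: 3 levels deep
        Console.WriteLine(regex.IsMatch("{{{{}}}}"));              // False: 4 levels deep
    }
}
```

The pattern grows with every extra level, which is why this gets unwieldy quickly for a grammar with several mutually recursive rules.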

What you could try is combining balancing groups with some recursive C# to pare down each grouping. There is something similar in this answer (but I would not recommend it in this case).

Alternatively, you could add this NuGet package, which is a wrapper around the PCRE regex engine and supports recursive subroutine calls. Details here.

DeltaTango answered Oct 31 '22 14:10