Using Regex to split a string in C#

I need to split a string that comes from another system and represents a serialized object. The object itself may have another object of the same type nested as a property. I need a way to split the string into a string array. For example:

"{1,Dave,2}" should create a string array with 3 elements "1", "Dave", "2".

"{1,{Cat,Yellow},2}" should become an array with 3 elements "1", "{Cat,Yellow}", "2".

"{1,{Cat,{Blue,1}},2}" should become an array with 3 elements "1", "{Cat,{Blue,1}}", "2".

Basically, the nesting could be N levels deep, so I could potentially have something like "{{Cat,{Blue,1}},{Dog,White}}", and the resulting array should have 2 elements: "{Cat,{Blue,1}}" and "{Dog,White}".

I thought of writing a custom parser to parse the string manually, but this seems like the kind of problem regex was designed to solve. I'm not very good with regex, so I would appreciate some pointers from the regex pros out there.

Thanks

Kiwik asked Mar 20 '23 22:03


2 Answers

Well, you can use this split which makes use of balancing groups:

,(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

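It matches a comma only when everything between that comma and the end of the string consists of non-brace characters and balanced {} groups. In other words, it matches commas that are not themselves inside braces.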

In code:

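using System;
using System.Text.RegularExpressions;

string msg = "{1,{Cat,{Blue,1}},2}";
// Drop the outermost braces so that the top-level commas sit at depth zero.
msg = msg.Substring(1, msg.Length - 2);
// Split only on commas that are followed by balanced {} groups up to the end of the string.
string[] parts = Regex.Split(msg, @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");
foreach (string s in parts)
{
    Console.WriteLine(s);
}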

Output:

1
{Cat,{Blue,1}}
2
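
For comparison, the custom parser mentioned in the question can be quite short. The following is a minimal sketch and not part of the original answer: SplitTopLevel is a hypothetical helper name, and it assumes the input is well formed and wrapped in exactly one outer pair of braces. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the answer): splits the content of the
// outermost {...} on commas that sit at nesting depth zero.
// Assumes well-formed input wrapped in exactly one outer pair of braces.
static string[] SplitTopLevel(string input)
{
    string body = input.Substring(1, input.Length - 2);
    var parts = new List<string>();
    int depth = 0; // number of braces currently open
    int start = 0; // index where the current element begins

    for (int i = 0; i < body.Length; i++)
    {
        char c = body[i];
        if (c == '{')
        {
            depth++;
        }
        else if (c == '}')
        {
            depth--;
        }
        else if (c == ',' && depth == 0)
        {
            // A comma outside all braces ends the current element.
            parts.Add(body.Substring(start, i - start));
            start = i + 1;
        }
    }

    parts.Add(body.Substring(start)); // the last element has no trailing comma
    return parts.ToArray();
}

foreach (string part in SplitTopLevel("{1,{Cat,{Blue,1}},2}"))
{
    Console.WriteLine(part);
}
```

For the sample input this prints the same three lines as the regex split, which makes it a handy cross-check when experimenting with the pattern.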



Brief explanation:

(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)

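The whole expression is a single lookahead. Because it ends with $, it has to account for everything from the comma to the end of the string.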

[^{}]* will match any characters except {} any number of times.

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!)) will match {} groups with any level of nesting.

The pattern first matches an opening { and pushes it onto a capture stack named O (I chose it to mean 'opening') here:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
           ^

Then any characters except braces:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
             ^^^^^^

That group repeats to accommodate nesting, so several braces can open before any of them closes:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                    ^

This part matches a closing } and pops one capture off O, which balances an opening brace:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                        ^^^^^^^^

It is followed by more non-brace characters, and the group repeats to close each level of nesting:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                ^^^^^^^ ^

All of this can occur zero or more times:

(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                          ^

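The final (?(O)(?!)) is a conditional: if the O stack still holds a capture, meaning some { was never closed, it applies (?!), an empty negative lookahead that always fails. This ensures there are no unbalanced braces.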
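
To see what the conditional contributes, the body of the lookahead can be anchored at both ends and tried on its own. This is only an illustrative sketch: the variable names balanced and noCheck are made up here, and the second pattern is the same expression with the (?(O)(?!)) check removed.

```csharp
using System;
using System.Text.RegularExpressions;

// Body of the answer's lookahead, anchored so it must cover the whole input.
var balanced = new Regex(
    @"^[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$");

// Same pattern without the final conditional (illustration only).
var noCheck = new Regex(
    @"^[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*$");

// Every { is closed, so the O stack is empty at the end.
Console.WriteLine(balanced.IsMatch("{Cat,{Blue,1}},2")); // True

// One { is never closed: the conditional rejects it.
Console.WriteLine(balanced.IsMatch("{Cat,{Blue,1},2"));  // False
Console.WriteLine(noCheck.IsMatch("{Cat,{Blue,1},2"));   // True

// A } with nothing left to pop fails even without the conditional.
Console.WriteLine(noCheck.IsMatch("Blue,1}},2"));        // False
```

The last case is why commas inside braces are never split on: the text after such a comma always reaches a } that has no matching { ahead of it.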

Jerry answered Apr 01 '23 01:04


This isn't a Split, but if you use the following expression with Match, you'll either get a failed match or a match with your individual values in m.Groups[1].Captures:

^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$
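
A minimal sketch of how this expression might be used, assuming the sample input from the question. The variable names are illustrative. Note that the pattern is applied to the whole string, outer braces included.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern =
    @"^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$";
string input = "{1,{Cat,{Blue,1}},2}";

Match m = Regex.Match(input, pattern);
if (m.Success)
{
    // Group 1 is captured once per top-level element, so its
    // Captures collection holds the individual values in order.
    foreach (Capture c in m.Groups[1].Captures)
    {
        Console.WriteLine(c.Value);
    }
}
else
{
    Console.WriteLine("No match");
}
```

For the sample input this prints 1, {Cat,{Blue,1}} and 2 on separate lines. One caveat: the closing \}$ alternative carries no (?(Depth)(?!)) check, so an input that is missing a closing brace, such as "{1,{Cat,2}", still matches and yields "{Cat,2" as its last capture. If that matters, validate the brace balance separately.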
Rawling answered Mar 31 '23 23:03