I need to split a string from another system that represents a serialized object. The object itself could have another object of the same type nested as a property. I need a way to essentially break the string up into a string array. For example:
"{1,Dave,2}" should create a string array with 3 elements: "1", "Dave", "2".
"{1,{Cat,Yellow},2}" should become an array with 3 elements: "1", "{Cat,Yellow}", "2".
"{1,{Cat,{Blue,1}},2}" should become an array with 3 elements: "1", "{Cat,{Blue,1}}", "2".
Basically, the nesting could be N levels deep, so potentially I could have something like "{{Cat,{Blue,1}},{Dog,White}}", and my resulting array should have 2 elements: "{Cat,{Blue,1}}" and "{Dog,White}".
I thought of writing a custom parser to parse the string manually, but this seems like the kind of problem regex was designed to solve. I'm not very good with regex, though, so I would appreciate some pointers from the regex pros out there.
Thanks
Well, you can use this split which makes use of balancing groups:
,(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)
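It will match a comma only if everything after it, up to the end of the string, is made of non-brace characters and balanced {} groups. In other words, it matches a comma that is not itself inside a {} group.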
In code:
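using System;
using System.Text.RegularExpressions;

string msg = "{1,{Cat,{Blue,1}},2}";
// Strip the outer braces so that top-level commas are no longer inside any {} group.
msg = msg.Substring(1, msg.Length - 2);
// Split only on commas followed by balanced braces up to the end of the string.
string[] parts = Regex.Split(msg, @",(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)");
foreach (string s in parts)
{
    Console.WriteLine(s);
}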
Output:
1
{Cat,{Blue,1}}
2
(?=[^{}]*(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))$)
is one huge lookahead.
[^{}]*
will match any characters except { and }, any number of times.
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
will match {} groups with any level of nesting.
It will first catch an opening { and name it O (I chose it to mean 'opening') here:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
           ^
Then any characters except braces:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
             ^^^^^^
And repeat that group to accommodate nesting:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                    ^
This part balances the opening brace by popping one O capture for each closing }:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                        ^^^^^^^^
Followed by any other non-brace characters, repeated to cater for the nesting:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                ^^^^^^^ ^
All of this, zero or more times:
(?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)*(?(O)(?!))
                                          ^
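The final conditional, (?(O)(?!)), closes things off: if any opening brace is still left unmatched on the O stack, the empty negative lookahead (?!) forces the match to fail, which ensures there are no unbalanced braces.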
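For comparison, the rule the lookahead enforces (split only on commas at brace depth zero) can also be written as the hand-rolled scanner the question mentions. This is only a rough sketch and not part of the regex answer above: the names BraceSplitter and SplitTopLevel are made up for illustration, and it assumes the input is always wrapped in one outer pair of braces.

```csharp
using System;
using System.Collections.Generic;

static class BraceSplitter   // hypothetical helper name, not from the answer
{
    // Splits "{a,{b,c},d}" into its top-level elements by tracking brace depth.
    public static List<string> SplitTopLevel(string input)
    {
        if (input.Length < 2 || input[0] != '{' || input[input.Length - 1] != '}')
            throw new FormatException("Expected a value wrapped in { }.");

        var parts = new List<string>();
        int depth = 0;   // nesting level inside the outer braces
        int start = 1;   // index where the current element begins

        // Walk the characters between the outer braces.
        for (int i = 1; i < input.Length - 1; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}')
            {
                if (--depth < 0)
                    throw new FormatException("Unbalanced '}' at index " + i + ".");
            }
            else if (c == ',' && depth == 0)
            {
                // A comma outside any nested group ends the current element.
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        if (depth != 0)
            throw new FormatException("Unbalanced '{'.");

        // Whatever remains after the last top-level comma is the final element.
        parts.Add(input.Substring(start, input.Length - 1 - start));
        return parts;
    }

    static void Main()
    {
        // Prints {Cat,{Blue,1}} and {Dog,White} on separate lines.
        foreach (string s in SplitTopLevel("{{Cat,{Blue,1}},{Dog,White}}"))
            Console.WriteLine(s);
    }
}
```

The depth counter plays the same role as the O stack in the regex: it goes up on {, down on }, and a comma only counts as a separator while it is zero.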
It's not a Split, but if you use the following expression with Match, you'll either get a failed match or one with your individual values in m.Groups[1].Captures:
^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$
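A minimal sketch of how that expression might be used with Match, reading the values out of m.Groups[1].Captures. The wrapper names CaptureSplitter and Parse are invented for this example. One thing to be aware of: the (?(Depth)(?!)) check is attached to the comma branch only, so it is worth testing against malformed input before relying on this as a strict validator.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureSplitter   // hypothetical wrapper name
{
    // The expression from the answer above, unchanged.
    const string Pattern =
        @"^\{(?:((?:[^{}]|\{(?<Depth>)|\}(?<-Depth>))*?)(?:,(?(Depth)(?!))|\}$))*$";

    // Returns the top-level values, or null when the regex does not match.
    public static List<string> Parse(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success)
            return null;

        // Group 1 is captured once per element; each capture is one value.
        var values = new List<string>();
        foreach (Capture c in m.Groups[1].Captures)
            values.Add(c.Value);
        return values;
    }

    static void Main()
    {
        List<string> values = Parse("{1,{Cat,{Blue,1}},2}");
        if (values == null)
        {
            Console.WriteLine("No match");
            return;
        }
        foreach (string v in values)
            Console.WriteLine(v);
    }
}
```

Unlike the Split version, this keeps the outer braces in the input, since the pattern itself anchors on the leading { and the trailing }.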