I'm trying to parse an array of JSON objects into an array of strings in C#. I can extract the array from the JSON object, but I can't split the array string into an array of individual objects.
What I have is this test string:
string json = "{items:[{id:0,name:\"Lorem Ipsum\"},{id:1,name"
+ ":\"Lorem Ipsum\"},{id:2,name:\"Lorem Ipsum\"}]}";
Right now I'm using the following regular expressions right now to split the items into individual objects. For now they're 2 separate regular expressions until I fix the problem with the second one:
Regex arrayFinder = new Regex(@"\{items:\[(?<items>[^\]]*)\]\}"
, RegexOptions.ExplicitCapture);
Regex arrayParser = new Regex(@"((?<items>\{[^\}]\}),?)+"
, RegexOptions.ExplicitCapture);
The arrayFinder
regex works the way I'd expect it but, for reasons I don't understand, the arrayParser
regex doesn't work at all. All I want it to do is split the individual items into their own strings so I get a list like this:
{id:0,name:"Lorem Ipsum"}
{id:1,name:"Lorem Ipsum"}
{id:2,name:"Lorem Ipsum"}
Whether this list is a string[]
array or a Group
or Match
collection doesn't matter, but I'm stumped as to how to get the objects split. Using the arrayParser
and the json
string declared above, I've tried this code which I assumed would work with no luck:
string json = "{items:[{id:0,name:\"Lorem Ipsum\"},{id:1,name"
+ ":\"Lorem Ipsum\"},{id:2,name:\"Lorem Ipsum\"}]}";
Regex arrayFinder = new Regex(@"\{items:\[(?<items>[^\]]*)\]\}"
, RegexOptions.ExplicitCapture);
Regex arrayParser = new Regex(@"((?<items>\{[^\}]\}),?)+"
, RegexOptions.ExplicitCapture);
string array = arrayFinder.Match(json).Groups["items"].Value;
// At this point the 'array' variable contains:
// {id:0,name:"Lorem Ipsum"},{id:1,name:"Lorem Ipsum"},{id:2,name:"Lorem Ipsum"}
// I would have expected one of these 2 lines to return
// the array of matches I'm looking for
CaptureCollection c = arrayParser.Match(array).Captures;
GroupCollection g = arrayParser.Match(array).Groups;
Can anybody see what it is I'm doing wrong? I'm totally stuck on this.
Balanced parentheses are literally a textbook example of a language that cannot be processed with regular expressions. JSON is essentially balanced parentheses plus a bunch of other stuff, with the braces replaced by parens. In the hierarchy of formal languages, JSON is a context-free language. Regular expressions can't parse context-free languages.
Some systems offer extensions to regular expressions that kinda-sorta handle balanced expressions. However they're all ugly hacks, they're all unportable, and they're all ultimately the wrong tool for the job.
In professional work, you would almost always use an existing JSON parser. If you want to roll your own for educational purposes then I'd suggest starting with a simple arithmetic grammar that supports + - * / ( ). (JSON has some escaping rules which, while not complex, will make your first attempt harder than it needs to be.) Basically, you'll need to:
This is a typical third-year CS assignment at just about any university.
The next step is to find out how complex a JSON string you need to trigger a stack overflow in your recursive parser. Then look at the other types of parsers that can be written, and you'll understand why anyone who has to parse a context-free language in the real world uses a tool like yacc or antlr instead of writing a parser by hand.
If that's more learning than you were looking for then you should feel free to go use an off-the-shelf JSON parser, satisified that you learned something important and useful: the limits of regular expressions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With