
Implementing a parser for escape sequences

I want to parse a custom string format that persists an object graph's state. This is an ASP.NET scenario, and I wanted something easy to use on both the client (JavaScript) and the server (C#).

The format looks something like this:

{Name1|Value1|Value2|...|ValueN}{Name2|Value1|...}{...}{NameN|...}

This format has three delimiters: {, }, and |. Because these characters can also appear in names and values, I defined escape sequences using the very common backslash: \{, \}, and \| are each interpreted as the literal character, and \\ is a literal backslash. All pretty standard.
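To make the escaping rules concrete, here is a minimal sketch of the encoding side in JavaScript. The function names escapeField and serialize, and the { name, values } input shape, are hypothetical and not part of the original post:

```javascript
// Escape one name or value for the {Name|Value1|...|ValueN} format.
// Every special character (\ { } |) gets a backslash in a single pass,
// so the backslashes added here are never themselves re-escaped.
function escapeField(text) {
  return text.replace(/[\\{}|]/g, (ch) => "\\" + ch);
}

// Hypothetical record shape: { name: string, values: string[] }.
// Each record becomes "{" + escaped fields joined by "|" + "}".
function serialize(records) {
  return records
    .map(({ name, values }) => "{" + [name, ...values].map(escapeField).join("|") + "}")
    .join("");
}
```

With these, a category record whose values are foo} and bar{ serializes to {category|foo\}|bar\{}, and one whose last value is bar\ serializes to {category|foo|bar\\}.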

Originally I tried to parse out the string representation of each object with a regex, something like (?<!\\)\{(.*?)(?<!\\)\}. Keep in mind that \, {, and } all have to be escaped in regexes, hence the extra backslashes. This correctly parses something like {category|foo\}|bar\{}. However, I realized it would fail on something like {category|foo|bar\\}, where the final } is preceded by an escaped backslash and is therefore a real delimiter, but the lookbehind rejects it.
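A quick way to see the failure, as a sketch assuming a JavaScript engine with lookbehind (ES2018+) and String.prototype.matchAll (ES2020+); naiveRecords is a hypothetical name. On the second input the closing brace after the escaped backslash is skipped, so two records are merged into a single match.

```javascript
// The first attempt: a brace counts as a delimiter if the character
// immediately before it is not a backslash.
const naivePattern = /(?<!\\)\{(.*?)(?<!\\)\}/g;

// Returns the captured body of every record the pattern finds.
function naiveRecords(input) {
  return [...input.matchAll(naivePattern)].map((m) => m[1]);
}

// Works: escaped braces inside values are skipped.
console.log(naiveRecords("{category|foo\\}|bar\\{}"));

// Fails: "bar\\" ends with an escaped backslash, so the real closing brace
// is rejected and the match runs on into the next record.
console.log(naiveRecords("{category|foo|bar\\\\}{next|x}"));
```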

It only took me a minute or so to try (?<!(?<!\\)\\)\{(.*?)(?<!(?<!\\)\\)\} and realize that this approach doesn't scale: every additional backslash in a run needs another nested negative lookbehind, so handling arbitrarily long runs of escaped backslashes would take an unbounded number of them. Of course, it's unlikely that I'd ever have more than one or two levels, so I could probably hard-code it. However, this feels like a common enough problem that it should have a well-defined solution.
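The underlying rule is about parity: a brace is a real delimiter only when the run of backslashes immediately before it has even length (including zero). A small hypothetical helper, isUnescaped (not from the post), states that rule directly. It has to walk back over the whole run, which is exactly what a fixed number of nested lookbehinds cannot do for runs of arbitrary length.

```javascript
// Returns true if input[index] is not escaped, i.e. it is preceded by an
// even number (possibly zero) of consecutive backslashes.
function isUnescaped(input, index) {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && input[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 0;
}
```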

My next approach was to write a proper parser that scans the input buffer and consumes each character in a single forward-only pass. I haven't finished it yet, but it seems overly complicated, and I feel I must be missing something obvious. After all, we've had parsers for as long as we've had computer languages.
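For comparison, here is a minimal sketch of such a forward-only parser in JavaScript. The name parse and the output shape (one array per record, [name, ...values], with escapes already decoded) are assumptions for illustration, and error handling is deliberately simple.

```javascript
// Forward-only parser sketch for {Name|Value1|...|ValueN}{...} strings.
function parse(input) {
  const records = [];
  let fields = null; // fields of the open record, or null between records
  let current = "";  // characters of the field currently being read

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (fields === null) {
      // Between records, only an opening brace is allowed.
      if (ch !== "{") throw new Error(`Expected '{' at index ${i}`);
      fields = [];
      current = "";
    } else if (ch === "\\") {
      // Escape: the next character is taken literally, whatever it is.
      // (A stricter parser could accept only \\, \{, \} and \|.)
      if (i + 1 >= input.length) throw new Error("Input ends with a lone backslash");
      current += input[++i];
    } else if (ch === "|") {
      fields.push(current); // field separator
      current = "";
    } else if (ch === "}") {
      fields.push(current); // last field, then close the record
      records.push(fields);
      fields = null;
    } else if (ch === "{") {
      throw new Error(`Unescaped '{' inside a record at index ${i}`);
    } else {
      current += ch;
    }
  }
  if (fields !== null) throw new Error("Unterminated record at end of input");
  return records;
}
```

Because each backslash consumes the character after it, runs of backslashes of any length are resolved left to right with no lookbehind or counting. It is one O(n) pass over the input, and the same loop translates almost line for line to C#.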

So my question is: what is the simplest, most efficient, and most elegant way to decode an input buffer like this, which may contain escape sequences?

Peter Oehlert asked Mar 01 '23 03:03


1 Answer

(?<!\\)(?:\\\\)*\{(.*?(?<!\\)(?:\\\\)*)\}

(?<!\\) ensures the character just before this point is not a \.

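(?:\\\\)* allows any number of escaped backslashes (\\ pairs). Together with the lookbehind, this means the { that follows is preceded by an even number of backslashes (possibly zero), so it is unescaped.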

\{ matches an opening brace.

( begins a capture group.

.*? lazily matches the content, including any |.

(?<!\\) ensures the character just before this point is not a \.

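(?:\\\\)* allows any number of escaped backslashes at the end of the content, so the } that follows is preceded by an even number of backslashes and is unescaped.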

) ends the capture group.

\} matches a closing brace.
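A sketch of how this pattern might be used from JavaScript (lookbehind needs an ES2018+ engine; .NET regexes support lookbehind too). The capture group still holds the raw record body, escapes and | separators included, so a second step is needed. The helper names splitFields and parseWithRegex are hypothetical; splitFields tokenizes the body so that each \x escape is consumed as a two-character unit, which means an escaped | is never mistaken for a separator.

```javascript
// The pattern from the answer, with the g flag to find every record.
const recordPattern = /(?<!\\)(?:\\\\)*\{(.*?(?<!\\)(?:\\\\)*)\}/g;

// Split a raw record body on unescaped '|' and decode each \x to x.
// Tokens are: an escape pair, a separator, or a run of ordinary characters.
function splitFields(body) {
  const fields = [""];
  for (const [token] of body.matchAll(/\\[\s\S]|\||[^\\|]+/g)) {
    if (token === "|") {
      fields.push("");                       // start a new field
    } else if (token[0] === "\\") {
      fields[fields.length - 1] += token[1]; // decoded escape
    } else {
      fields[fields.length - 1] += token;    // ordinary characters
    }
  }
  return fields;
}

// Find every record with the answer's regex, then split and decode it.
function parseWithRegex(input) {
  return [...input.matchAll(recordPattern)].map((m) => splitFields(m[1]));
}
```

Note that the regex silently skips malformed input, such as stray text between records, rather than reporting it.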

Markus Jarderot answered Mar 05 '23 15:03