I want to parse a custom string format that persists an object graph's state. This is an ASP.NET scenario and I wanted something easy to use on both the client (JavaScript) and the server (C#).
I have a format something like
{Name1|Value1|Value2|...|ValueN}{Name2|Value1|...}{...}{NameN|...}
In this format I have 3 delimiters: {, }, and |. Further, because these characters are conceivable in the name/values, I defined an escape sequence using the very common \, such that \{, \}, and \| are all interpreted as normal versions of themselves, and of course \\ is a backslash. All pretty standard.
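To make the escaping rules concrete, here is a minimal JavaScript sketch of the encoding side. The helper names (escapeField, unescapeField, encodeRecord) are hypothetical and not part of the original format description; they only assume the four escape sequences listed above.

```javascript
// Hypothetical helpers; the names are illustrative only.

// Prefix every backslash and every delimiter with a backslash.
function escapeField(text) {
  return text.replace(/[\\{}|]/g, (ch) => "\\" + ch);
}

// Reverse of escapeField: drop the backslash, keep the character after it.
function unescapeField(text) {
  return text.replace(/\\([\\{}|])/g, "$1");
}

// Build one {Name|Value1|...|ValueN} record from unescaped parts.
function encodeRecord(name, values) {
  return "{" + [name, ...values].map(escapeField).join("|") + "}";
}
```

With these, encodeRecord("category", ["foo}", "bar{"]) produces {category|foo\}|bar\{}, and unescapeField reverses escapeField for any input string.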
Originally I tried to use a regex to parse out the string representation of an object, with something like this: (?<!\\)\{(.*?)(?<!\\)\}. Keep in mind that \, {, and } are all reserved in regexes. This will of course parse something like {category|foo\}|bar\{} correctly. However, I realized it would fail with something like {category|foo|bar\\}, where the closing brace is preceded by an escaped backslash rather than being escaped itself.
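The two cases can be reproduced in JavaScript, assuming an engine that supports lookbehind assertions (ES2018 or later). This is only a quick check of the pattern quoted above; String.raw is used so the backslashes appear exactly as they do in the format.

```javascript
// First attempt from the text: a single lookbehind before each brace.
const naive = /(?<!\\)\{(.*?)(?<!\\)\}/;

// Escaped braces inside the record are skipped as intended.
const ok = String.raw`{category|foo\}|bar\{}`.match(naive);
console.log(ok[1]); // category|foo\}|bar\{

// The value ends in an escaped backslash, so the real closing brace
// is preceded by "\" and the lookbehind wrongly rejects it.
const bad = String.raw`{category|foo|bar\\}`.match(naive);
console.log(bad); // null
```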
It only took me a minute or so to try this: (?<!(?<!\\)\\)\{(.*?)(?<!(?<!\\)\\)\} and realize that this approach does not scale, since you'd need an unbounded number of nested negative lookbehinds to deal with a potentially unbounded run of escape sequences. Of course it's unlikely that I'd ever have more than one or two levels, so I could probably hard-code it. However, I feel that this is a common enough problem that it should have a well-defined solution.
My next approach was to try to write a hand-rolled parser that scans the input buffer and consumes each character in a forward-only manner. I haven't actually finished this yet, but it seems overly complicated and I feel I must be missing something obvious. I mean, we've had parsers as long as we've had computer languages.
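For reference, a forward-only scan of this kind might look roughly like the JavaScript sketch below. The function name parseRecords is hypothetical, and the sketch makes two assumptions the format description does not state: any character following a backslash is taken literally, and any text outside a {...} record is treated as an error.

```javascript
// Hypothetical parser sketch: returns an array of records,
// each record being an array of unescaped fields [name, value1, ...].
function parseRecords(input) {
  const records = [];
  let fields = null; // null while outside any {...} record
  let current = "";

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\") {
      // Escape: take the next character literally (assumption: any character).
      if (i + 1 >= input.length) throw new Error("dangling escape at end of input");
      if (fields === null) throw new Error("escape outside a record at " + i);
      current += input[++i];
    } else if (ch === "{") {
      if (fields !== null) throw new Error("unexpected '{' at " + i);
      fields = [];
      current = "";
    } else if (ch === "|" && fields !== null) {
      // Unescaped separator: close the current field.
      fields.push(current);
      current = "";
    } else if (ch === "}") {
      if (fields === null) throw new Error("unexpected '}' at " + i);
      fields.push(current);
      records.push(fields);
      fields = null;
    } else if (fields !== null) {
      current += ch;
    } else {
      // Assumption: nothing may appear between records.
      throw new Error("unexpected character outside a record at " + i);
    }
  }
  if (fields !== null) throw new Error("unterminated record");
  return records;
}
```

Because the escape is handled at the moment the backslash is seen, a run of backslashes of any length needs no special treatment: each pair is consumed together and the scan simply moves on.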
So my question is: what is the simplest, most efficient, and most elegant way to decode an input buffer like this, with possible escape sequences?
(?<!\\)(?:\\\\)*\{(.*?(?<!\\)(?:\\\\)*)\}
(?<!\\) asserts that the preceding character is not a \.
(?:\\\\)* allows any number of escaped backslashes (\\ pairs).
\{ matches an opening brace.
( begins a capture group.
.*? matches the content, including any |.
(?<!\\) asserts that the preceding character is not a \.
(?:\\\\)* allows any number of escaped backslashes (\\ pairs).
) ends the capture group.
\} matches a closing brace.
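As a quick check, the pattern can be run in JavaScript, assuming an engine with lookbehind support (ES2018 or later). The g flag and the sample input are additions for this sketch and are not part of the pattern itself.

```javascript
// The pattern from above, with the g flag so matchAll finds every record.
const record = /(?<!\\)(?:\\\\)*\{(.*?(?<!\\)(?:\\\\)*)\}/g;

// Two records: one with escaped braces, one ending in an escaped backslash.
const input = String.raw`{category|foo\}|bar\{}{x|a\\}`;

// Capture group 1 holds the still-escaped body of each record.
const bodies = [...input.matchAll(record)].map((m) => m[1]);
console.log(bodies); // two bodies: category|foo\}|bar\{  and  x|a\\
```

The captured bodies are still escaped, so splitting them into fields has to treat \| the same escape-aware way before the individual values are unescaped.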