I'm trying to figure out how to parse a string in this format into a tree like data structure of arbitrary depth.
"{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}"
[[["Hello big" "Hi" "Hey"]
["world" "earth"]]
[["Goodbye" "farewell"]
["planet" "rock" "globe" ["."
"!"]]]]
I've tried playing with some regular expressions for this (such as #"{([^{}]*)}" ), but everything I've tried seems to "flatten" the tree into a big list of lists. I could be approaching this from the wrong angle, or maybe a regex just isn't the right tool for the job.
Thanks for your help!
Don't use regular expressions for this task. An easier method would be to describe your string with a grammar (BNF or EBNF) and then write a parser to parse the string according to the grammar. You can generate a parse-tree from your EBNF and BNF and so you naturally end up with a tree structure.
You can start with something like this:
element ::= element-type, { ["|"], element-type }
element-type ::= primitive | "{", element, "}"
primitive ::= symbol | word
symbol ::= "." | "!"
word ::= character { character }
character ::= "a" | "b" | ... | "z"
Note: I wrote this up quickly, and so it may not be completely correct. But it should give you an idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With