Is it stupid to build a regex-based parser?
Matching nested parens is exceedingly simple using modern patterns. With whitespace in the pattern ignored (as under the /x flag), this sort of thing:
\( (?: [^()] *+ | (?0) )* \)
works for mainstream languages like Perl and PHP, plus anything that uses PCRE.
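To try the pattern above, here is a minimal Perl sketch. The variable names and the sample string are made up for illustration, and it assumes Perl 5.10 or later, which is where recursion with (?0) and possessive quantifiers became available.

```perl
use strict;
use warnings;

# The pattern from above, written with /x so whitespace and comments are ignored.
# It is used on its own below, because the recursion re-enters the whole pattern.
my $balanced = qr{
    \(              # an opening paren
    (?:
        [^()]*+     # a run of non-paren characters, matched possessively
      |
        (?0)        # or a nested balanced group, via recursion
    )*
    \)              # the matching closing paren
}x;

# Hypothetical sample input: two balanced groups and one unclosed paren.
my $text = 'f(a, g(b, (c))) + h(d) + (unclosed';

# In list context with /g and no capture groups, each whole match is returned.
my @groups = $text =~ /$balanced/g;

print "$_\n" for @groups;
```

With this sample string it should print (a, g(b, (c))) and (d); the trailing unclosed paren never matches, so it is skipped rather than reported as an error.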
However, you really need grammatical regexes for a full parse, or you’ll go nuts. Don’t use a language whose regexes can’t be broken down into smaller units, or that lacks proper debugging of regex compilation and execution. Life’s too short for low-level hackery. Might as well go back to assembly language if you’re going to do that.
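As a rough illustration of what breaking a regex into smaller units can look like, here is a sketch using Perl's (?(DEFINE)...) block with named subpatterns (Perl 5.10 or later). The toy arithmetic grammar, the rule names, and the is_valid_expr helper are all hypothetical, chosen only to show the shape; this validates input but does not build a parse tree.

```perl
use strict;
use warnings;

# Each named group inside DEFINE acts like a grammar rule.
# Rules are invoked with (?&name) and may refer to each other recursively.
my $expr_re = qr{
    \A \s* (?&expr) \s* \z

    (?(DEFINE)
        (?<expr>   (?&term)   (?: \s* [-+] \s* (?&term)   )* )
        (?<term>   (?&factor) (?: \s* [*/] \s* (?&factor) )* )
        (?<factor> (?&number) | \( \s* (?&expr) \s* \) )
        (?<number> \d++ (?: \. \d++ )? )
    )
}x;

# Hypothetical helper: returns 1 if the whole string is a well-formed expression.
sub is_valid_expr {
    my ($s) = @_;
    return $s =~ $expr_re ? 1 : 0;
}

for my $input ('1 + 2 * (3 - 4)', '(((7)))', '2 * (3 + ', '1 +') {
    printf "%-18s %s\n", $input, is_valid_expr($input) ? 'ok' : 'not ok';
}
```

The first two inputs should be reported as ok and the last two as not ok. Because every rule has a name, a failing grammar can be narrowed down one rule at a time, which is much harder with a single monolithic pattern.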
I’ve written about recursive patterns, grammatical patterns, and parsing quite a bit: for example, see here for parsing approaches and here for lexer approaches; also, the final solution here.
Also, Perl’s Regexp::Grammars module is especially useful in turning grammatical regexes into parsing structures.
So by all means, go for it. You’ll learn a lot that way.
For work? Yes. For learning? No.