 

Recursive Descent Parser for something simple?

I'm writing a parser for a templating language which compiles into JS (if that's relevant). I started out with a few simple regexes, which seemed to work, but regexes are very fragile, so I decided to write a parser instead. I started by writing a simple parser that remembered state by pushing/popping off of a stack, but things kept escalating until I had a recursive descent parser on my hands.

Soon after, I compared the performance of all my previous parsing methods. The recursive descent parser was by far the slowest. I'm stuck: is it worth using a recursive descent parser for something simple, or am I justified in taking shortcuts? I would love to go the pure regex route, which is insanely fast (almost 3 times faster than the RD parser), but it's hacky and, to a degree, unmaintainable. I suppose performance isn't terribly important here because compiled templates are cached, but is a recursive descent parser the right tool for every task? I guess my question is really more of a philosophical one: to what degree is it worth sacrificing maintainability and flexibility for performance?
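To make the comparison concrete, here's roughly what I mean by the pure regex route (the {{ name }} syntax below is just a placeholder, not my actual grammar):

    // Hypothetical syntax, for illustration only: "Hi {{ user }}!" compiles
    // to a JS function. One chained regex replace does everything -- fast but
    // fragile (a newline or a stray "{{" in the template quietly breaks it).
    function compileWithRegex(template) {
      var body = template
        .replace(/\\/g, "\\\\")                                // escape backslashes
        .replace(/"/g, '\\"')                                  // escape double quotes
        .replace(/\{\{\s*(\w+)\s*\}\}/g, '" + data.$1 + "');   // splice in lookups
      return new Function("data", 'return "' + body + '";');
    }

    // compileWithRegex("Hi {{ user }}!")({ user: "world" })  ->  "Hi world!"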

ltimer asked Apr 03 '11


People also ask

What are recursive-descent parsers? Explain with an example.

It is a kind of top-down parser. A top-down parser builds the parse tree from the top down, starting with the start non-terminal. A predictive parser is a special case of recursive descent parser, where no backtracking is required.
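As a sketch of what that looks like in code (the grammar and names below are made up purely for illustration), each grammar rule becomes one function, and the host language's call stack serves as the parse stack:

    // Grammar: Expr -> Term ('+' Term)* ;  Term -> NUMBER
    // Each rule becomes a function; the call stack is the parse stack.
    function makeParser(tokens) {
      var pos = 0;
      function parseExpr() {                       // Expr -> Term ('+' Term)*
        var value = parseTerm();
        while (tokens[pos] === "+") { pos++; value += parseTerm(); }
        return value;
      }
      function parseTerm() { return Number(tokens[pos++]); }   // Term -> NUMBER
      return parseExpr;
    }

    // makeParser(["1", "+", "2", "+", "3"])()  ->  6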

What type of grammar can be parsed by a recursive descent parser?

According to "Recursive descent parser" on Wikipedia, recursive descent without backtracking (a.k.a. predictive parsing) is only possible for LL(k) grammars.


1 Answer

Recursive descent parsers can be extremely fast.

These are usually organized with a lexer that uses regular expressions to recognize language tokens, which are fed to the parser. Most of the work of processing the source text is done character by character by the lexer, using the insanely fast finite-state automata (FSAs) that REs are often compiled into.
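For instance, a toy lexer for a hypothetical {{ name }} / {{#section}}...{{/section}} template syntax (the syntax and token names are assumptions for illustration, not anything from the question) can be a single regex alternation driven in a loop:

    // A toy lexer: one regex alternation applied left to right.
    // Emits TEXT and TAG tokens; TAG values may start with '#' or '/'.
    function lex(source) {
      var re = /\{\{\s*([#\/]?\w+)\s*\}\}|([^{]+|\{)/g;
      var tokens = [], m;
      while ((m = re.exec(source)) !== null) {
        if (m[1] !== undefined) tokens.push({ type: "TAG",  value: m[1] });
        else                    tokens.push({ type: "TEXT", value: m[2] });
      }
      return tokens;
    }

    // lex("Hi {{ user }}!") ->
    //   [ {type:"TEXT", value:"Hi "}, {type:"TAG", value:"user"}, {type:"TEXT", value:"!"} ]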

The parser sees tokens only occasionally compared with the rate at which the lexer sees characters, so parser speed often doesn't matter much. However, when comparing parser-to-parser speeds (ignoring the time required to lex the tokens), recursive descent parsers can be very fast, because they implement the parser stack with function calls, which are far cheaper than the push-current-state-onto-a-simulated-stack machinery of more general parsers.

So you can have your cake and eat it, too: use regexps for the lexemes, and use a parser (any kind; recursive descent is just fine) to process the lexemes. You should be pleased with the performance.
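Continuing the sketch, the recursive descent half then works only on the tokens from lex() above; the {{#name}}...{{/name}} section syntax and all names are again illustrative assumptions, and there is no error handling for malformed templates:

    // Recursive descent over the tokens from lex() above.
    // Assumed grammar:  Template -> (TEXT | TAG | Section)*
    //                   Section  -> '{{#name}}' Template '{{/name}}'
    function parseTemplate(tokens, pos, stopTag) {
      var parts = [];
      while (pos < tokens.length) {
        var t = tokens[pos];
        if (t.type === "TEXT") { parts.push(JSON.stringify(t.value)); pos++; }
        else if (t.value === stopTag) break;                 // end of enclosing section
        else if (t.value.charAt(0) === "#") {                // Section: recurse
          var inner = parseTemplate(tokens, pos + 1, "/" + t.value.slice(1));
          parts.push("(data." + t.value.slice(1) + " ? " + inner.code + " : \"\")");
          pos = inner.pos + 1;                               // skip the closing tag
        }
        else { parts.push("data." + t.value); pos++; }       // plain {{ name }}
      }
      return { code: "(" + (parts.join(" + ") || '""') + ")", pos: pos };
    }

    function compile(source) {
      return new Function("data", "return " + parseTemplate(lex(source), 0, null).code + ";");
    }

    // compile("Hi {{ user }}{{#vip}} (VIP){{/vip}}!")({ user: "Ann", vip: true })
    //   -> "Hi Ann (VIP)!"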

This approach also satisfies the observation made in other answers: write it in a way that makes it maintainable. Lexer/parser separation does this very nicely, I assure you.

Ira Baxter answered Oct 07 '22