
What is the runtime difference between different parsing algorithms?

There are lots of different parsing algorithms out there (recursive descent, LL(k), LR(k), LALR, ...). I can find a lot of information about which grammars the different types of parser accept. But how do they differ in runtime behavior? Which algorithm is faster, and which uses less memory or stack space?

Or to put this differently - which algorithm performs best, assuming the grammar can be formulated to work with any algorithm?

asked Oct 13 '10 by Sven


People also ask

What is the difference between LL(1) and LR(k) parsing?

At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol. An LL parse is a left-to-right scan producing a leftmost derivation; an LR parse is a left-to-right scan producing a rightmost derivation in reverse.
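As a tiny worked illustration (my own toy example, not part of the quoted answer): take the grammar S → A B, A → a, B → b and the input string ab. The LL parser traces the leftmost derivation from S down to the string, while the LR parser performs reductions that replay the rightmost derivation backwards:

```latex
% LL (top-down): leftmost derivation of "ab"
S \Rightarrow A\,B \Rightarrow a\,B \Rightarrow a\,b
% LR (bottom-up): reductions, i.e. the rightmost derivation read backwards
a\,b \rightarrow A\,b \rightarrow A\,B \rightarrow S
```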

Which parser is fastest?

We released simdjson 0.3: the fastest JSON parser in the world is even better! Last year (2019), we released the simdjson library. It is a C++ library available under a liberal license (Apache) that can parse JSON documents very fast.

Which of the parsing algorithms is most efficient?

The LR parsing algorithm is one of the most efficient parsing algorithms. It is totally deterministic and no backtracking or search is involved.


1 Answer

LR parsers IMHO can be the fastest. Basically they use the current token as an index into a lookahead set or a transition table to decide what to do next (push a state index, or pop state indexes and call a reduction routine). Converted to machine code this can be just a few machine instructions. Pennello discusses this in detail in his paper:

Thomas J. Pennello: Very fast LR parsing. SIGPLAN Symposium on Compiler Construction 1986: 145-151
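To make the "few machine instructions" point concrete, here is a minimal sketch (my own illustration, not Pennello's code) of a table-driven SLR(1) parser for the toy grammar E → E + n | n. Each step is one lookup on (current state, current token) followed by a push or a pop-and-push:

```c
/* Minimal sketch (not from the paper): table-driven SLR(1) parsing of the
 * toy grammar  E -> E + n | n .  Each step is one ACTION[state][token]
 * lookup followed by a push (shift) or a pop-and-push (reduce). */
#include <stdio.h>

enum token { TOK_N, TOK_PLUS, TOK_EOF };            /* terminals: n, +, $        */
enum kind  { ERR, SHIFT, REDUCE, ACCEPT };

struct entry { enum kind kind; int arg; };          /* shift: next state; reduce: rule */

static const struct entry ACTION[5][3] = {
    /* state 0 */ { {SHIFT, 1}, {ERR, 0},    {ERR, 0}    },
    /* state 1 */ { {ERR, 0},   {REDUCE, 2}, {REDUCE, 2} },  /* E -> n     */
    /* state 2 */ { {ERR, 0},   {SHIFT, 3},  {ACCEPT, 0} },
    /* state 3 */ { {SHIFT, 4}, {ERR, 0},    {ERR, 0}    },
    /* state 4 */ { {ERR, 0},   {REDUCE, 1}, {REDUCE, 1} },  /* E -> E + n */
};

static const int RHS_LEN[3] = { 0, 3, 1 };          /* states popped per rule    */

int parse(const enum token *input)
{
    int stack[64], top = 0;
    stack[top] = 0;                                 /* start state               */
    for (;;) {
        struct entry e = ACTION[stack[top]][*input];
        switch (e.kind) {
        case SHIFT:                                 /* push state, consume token */
            stack[++top] = e.arg;
            ++input;
            break;
        case REDUCE:                                /* pop RHS, then GOTO on E   */
            top -= RHS_LEN[e.arg];
            stack[++top] = 2;                       /* GOTO(0, E) = 2 here: E only
                                                       ever appears after state 0 */
            break;
        case ACCEPT:
            return 1;
        default:
            return 0;                               /* syntax error              */
        }
    }
}

int main(void)
{
    enum token input[] = { TOK_N, TOK_PLUS, TOK_N, TOK_PLUS, TOK_N, TOK_EOF };
    printf("%s\n", parse(input) ? "accepted" : "rejected");
    return 0;
}
```

With a real grammar you also need a full GOTO table indexed by (state, nonterminal), but the per-token work remains a constant number of table lookups and stack operations.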

LL parsers involve recursive calls, which are a bit slower than just plain table lookups, but they can be pretty fast.

GLR parsers are generalizations of LR parsers, and thus have to be slower than LR parsers. A key observation is that most of the time a GLR parser is acting exactly as an LR parser would, and one can make that part run at essentially the same speed as an LR parser, so they can be fairly fast.

Your parser is likely to spend more time breaking the input stream into tokens than executing the parsing algorithm, so these differences may not matter a lot.
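To see why, here is a rough sketch of a hand-written lexer for the same toy token set (again my own illustration): it has to examine every character of the input, while the LR loop above does a constant amount of work per token:

```c
#include <ctype.h>
#include <stdio.h>

enum token { TOK_N, TOK_PLUS, TOK_EOF };         /* same token set as the LR sketch */

/* Hand-written lexer: it touches every character of the input, which is
 * usually more total work than the parser's one table lookup per token. */
enum token next_token(const char **p)
{
    while (isspace((unsigned char)**p))          /* skip whitespace, char by char   */
        (*p)++;
    if (**p == '+') {
        (*p)++;
        return TOK_PLUS;
    }
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p))      /* scan the whole number literal   */
            (*p)++;
        return TOK_N;
    }
    return TOK_EOF;                              /* end of input (error handling omitted) */
}

int main(void)
{
    const char *src = "12 + 34 + 5";
    enum token t;
    while ((t = next_token(&src)) != TOK_EOF)
        printf("%s\n", t == TOK_N ? "n" : "+");
    return 0;
}
```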

In terms of getting your grammar into a usable form, the following is the order in which the parsing technologies "make it easy":

  • GLR (really easy: if you can write grammar rules, you can parse)
  • LR(k) (many grammars fit, but extremely few parser generators are available)
  • LR(1) (most commonly available [YACC, Bison, Gold, ...])
  • LL (usually requires significant reengineering of the grammar to remove left recursion; see the sketch after this list)
  • Hand-coded recursive descent (easy to code for simple grammars; difficult to handle complex grammars and difficult to maintain if the grammar changes a lot)
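As an illustration of the reengineering an LL or recursive-descent approach needs (my own sketch, same toy language): the left-recursive rule E → E + n | n would make a top-down parser recurse forever, so it is first rewritten, e.g. as E → n ('+' n)*, and then coded directly as one function per nonterminal:

```c
#include <ctype.h>
#include <stdio.h>

/* Recursive-descent sketch for the same toy language.  The left-recursive
 * grammar  E -> E + n | n  is rewritten as  E -> n ('+' n)*  so that the
 * top-down parser terminates.  Note the per-construct function calls,
 * versus the single table lookup of the LR loop. */
static const char *cur;                       /* current input position  */

static void skip_space(void) { while (isspace((unsigned char)*cur)) cur++; }

static int parse_n(void)                      /* n : a number literal    */
{
    skip_space();
    if (!isdigit((unsigned char)*cur))
        return 0;                             /* syntax error            */
    while (isdigit((unsigned char)*cur))
        cur++;
    return 1;
}

static int parse_E(void)                      /* E : n ('+' n)*          */
{
    if (!parse_n())
        return 0;
    for (;;) {
        skip_space();
        if (*cur != '+')
            return 1;                         /* no more '+': E is done  */
        cur++;                                /* consume '+'             */
        if (!parse_n())
            return 0;
    }
}

int main(void)
{
    cur = "12 + 34 + 5";
    int ok = parse_E();
    skip_space();
    printf("%s\n", (ok && *cur == '\0') ? "accepted" : "rejected");
    return 0;
}
```

The recursive calls (and the stack frames they create) are the overhead that the table-driven LR loop avoids.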
answered Oct 28 '22 by Ira Baxter