LR(1) parser state size still an issue?

Question

Historically, LALR(1) parsers were preferred over LR(1) parsers because of the resources needed to handle the large number of states that LR(1) parser generators produce. It's hard to believe that this continues to be an issue in today's computing environment. Is this still the case, or are modern compilers now built with canonical LR(1) parsers, given that the LALR(1) grammars are a proper subset of the LR(1) grammars?

templatetypedef · Accepted Answer

The main concern with LR(1) parsers is the table size, and that table size is going to hurt in one way or another.

If you have an LR(1) parser with 10,000,000 states (not all that uncommon) where there are, say, 50 nonterminals and 50 terminals (not all that unreasonable), each state needs one table entry per symbol, so you will have a table with 10,000,000 × 100 = one billion entries in it. If you use even one byte per entry, you now need 1GB of space just to hold the table. That space is either in the application binary, in which case you now have a 1GB executable, or it's generated dynamically, in which case you now need 1GB of RAM plus the time to populate it. Neither of these is very attractive.
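To make the arithmetic above concrete, here is a minimal sketch of the size estimate. It assumes a dense, uncompressed table with one row per state and one column per grammar symbol; the function name `dense_table_bytes` and its parameters are made up for illustration and do not come from any parser generator.

```python
def dense_table_bytes(num_states, num_terminals, num_nonterminals, bytes_per_entry=1):
    """Estimate the size of a dense, uncompressed LR parsing table.

    Assumption: one row per state and one column per grammar symbol
    (terminals index the ACTION part, nonterminals the GOTO part).
    """
    entries = num_states * (num_terminals + num_nonterminals)
    return entries * bytes_per_entry


# The figures used in the answer: 10,000,000 states, 50 terminals, 50 nonterminals.
size = dense_table_bytes(10_000_000, 50, 50)
print(f"{size:,} bytes = {size / 10**9:.1f} GB")  # 1,000,000,000 bytes = 1.0 GB
```

Real parser generators usually compress their tables, so this is an upper bound for the dense representation rather than a prediction for any particular tool.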

You absolutely could use an LR(1) parser if you have that kind of memory, but it wouldn't be a good idea. First, the size of the application binary would be enormous, which would make it difficult to distribute the application. Second, loading the table into memory would require transferring about 1GB of data from disk into RAM, which would be extraordinarily slow. There's also the issue of paging the parsing tables in and out. If the OS doesn't do a good job of evicting pages, you could end up thrashing, degrading performance unacceptably.

While you could put the parser on a server, this typically isn't done right now and would require that all compilation be done over a network.

There's also the question of whether it's worth it. The huge spike in resource costs from the parser would need to be justified by some proportional benefit in parsing quality. In practice, LALR(1) parsers work for many grammars. For the grammars they can't handle, newer parsing algorithms like IELR or GLR would be a superior choice to LR(1) because they offer the same parsing power (or more, in the case of GLR) with significant space reductions. Consequently, you'd be better off using those algorithms.

In summary, yes, you could use LR(1) today, but it would be so resource-inefficient that you'd be better off with another parsing algorithm.

Hope this helps!

Tags:

parsing

compiler-construction

grammar

lr

lalr

tgoneil
