How should I go about building a simple LR parser?

I am trying to build a simple LR parser for a type of template (configuration) file that will be used to generate some other files. I've read and read about LR parsers, but I just can't seem to understand them! I understand that there is a parse stack, a state stack and a parsing table. Tokens are read onto the parse stack, and when a rule is matched the tokens are shifted or reduced, depending on the parsing table. This continues recursively until all of the tokens are reduced, and the parsing is then complete.

The problem is I don't really know how to generate the parsing table. I've read quite a few descriptions, but the language is technical and I just don't understand it. Can anyone tell me how I would go about this?

Also, how would I store things like the rules of my grammar?

http://codepad.org/oRjnKacH is a sample of the file I'm trying to parse with my attempt at a grammar for its language.

I've never done this before, so I'm just looking for some advice, thanks.

691

asked Feb 23 '10 19:02

Isaac

1 Answer

In your study of parser theory, you seem to have missed a much more practical fact: virtually nobody ever even considers hand-writing a table-driven, bottom-up parser like the one you're discussing. For most practical purposes, hand-written parsers use a top-down (usually recursive descent) structure.
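
To give a feel for what a hand-written top-down parser looks like, here is a minimal recursive descent sketch in C++. The grammar is invented for illustration (it is not the format from the linked sample), and `ConfigParser` and all of its member names are hypothetical. Each grammar rule becomes one function, and nesting is handled by a function calling itself.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical grammar, invented for this sketch:
//   file  := entry*
//   entry := word '=' word ';'   |   word '{' entry* '}'
//   word  := one or more letters, digits or '_'
class ConfigParser {
public:
    explicit ConfigParser(std::string text) : src_(std::move(text)) {}

    // Returns every setting; names of nested sections are joined with '.'.
    std::map<std::string, std::string> parse() {
        skipSpace();
        while (pos_ < src_.size()) {
            entry("");
            skipSpace();
        }
        assert(pos_ == src_.size());
        return out_;
    }

private:
    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }

    // word := one or more letters, digits or '_'
    std::string word() {
        skipSpace();
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) ||
                src_[pos_] == '_'))
            ++pos_;
        if (pos_ == start)
            throw std::runtime_error("expected a word at offset " +
                                     std::to_string(start));
        return src_.substr(start, pos_ - start);
    }

    // Consumes c if it is the next non-space character.
    bool accept(char c) {
        skipSpace();
        if (pos_ < src_.size() && src_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    void expect(char c) {
        if (!accept(c))
            throw std::runtime_error(std::string("expected '") + c + "'");
    }

    // One function per grammar rule; a nested section is handled by
    // entry() calling itself, which is where "recursive" comes from.
    void entry(const std::string& prefix) {
        const std::string name = prefix + word();
        if (accept('=')) {
            out_[name] = word();
            expect(';');
        } else {
            expect('{');
            while (!accept('}'))
                entry(name + ".");
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
    std::map<std::string, std::string> out_;
};
```

With this sketch, `ConfigParser("out { dir = build; }").parse()` should return a map whose only key is `out.dir`. A real format would need its own rules, but the shape (one function per rule, errors thrown where an expected token is missing) stays the same.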

The primary reason for using a table-driven parser is that it lets you write a (fairly) small amount of code that manipulates the tables, and that code is almost completely generic (i.e. it works for any grammar). Then you encode everything about a specific grammar into a form that's easy for a computer to manipulate (i.e. some tables).
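
As a rough illustration of that split, here is a sketch of a generic LR driver in C++ plus hand-built tables for a toy grammar, `S -> ( S ) | x`. All names (`Rule`, `Action`, `Tables`, `lrParse`, `parenTables`) are hypothetical, and the sketch assumes every token and grammar symbol is a single `char`, with `$` marking end of input. The rules are stored as plain data: a reduce step only needs the left-hand side and the length of the right-hand side.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One grammar rule stored as data: its left-hand side and the number
// of symbols on its right-hand side.
struct Rule { char lhs; int rhsLength; };

enum class Kind { Shift, Reduce, Accept };
struct Action { Kind kind; int target; };  // target: next state, or rule index

// Everything grammar-specific lives in these tables.
struct Tables {
    std::vector<Rule> rules;
    std::map<std::pair<int, char>, Action> action;  // missing cell = syntax error
    std::map<std::pair<int, char>, int> go;         // (state, nonterminal) -> state
};

// Generic driver: knows nothing about any particular grammar.
bool lrParse(const Tables& t, const std::string& input) {
    const std::string tokens = input + '$';  // assumption: '$' ends the input
    std::vector<int> states{0};              // the state stack
    std::size_t pos = 0;
    while (pos < tokens.size()) {
        auto cell = t.action.find({states.back(), tokens[pos]});
        if (cell == t.action.end()) return false;   // empty cell
        const Action& a = cell->second;
        if (a.kind == Kind::Accept) return pos + 1 == tokens.size();
        if (a.kind == Kind::Shift) {
            states.push_back(a.target);             // consume one token
            ++pos;
        } else {
            const Rule& r = t.rules[a.target];
            assert(states.size() > static_cast<std::size_t>(r.rhsLength));
            states.resize(states.size() - r.rhsLength);  // pop the right-hand side
            auto next = t.go.find({states.back(), r.lhs});
            if (next == t.go.end()) return false;
            states.push_back(next->second);
        }
    }
    return false;
}

// Hand-built SLR tables for:  0: A -> S   1: S -> ( S )   2: S -> x
// ('A' stands for the added start symbol.)
Tables parenTables() {
    const Kind sh = Kind::Shift, rd = Kind::Reduce;
    Tables t;
    t.rules = {{'A', 1}, {'S', 3}, {'S', 1}};
    t.action = {
        {{0, '('}, {sh, 2}}, {{0, 'x'}, {sh, 3}},
        {{1, '$'}, {Kind::Accept, 0}},
        {{2, '('}, {sh, 2}}, {{2, 'x'}, {sh, 3}},
        {{3, ')'}, {rd, 2}}, {{3, '$'}, {rd, 2}},
        {{4, ')'}, {sh, 5}},
        {{5, ')'}, {rd, 1}}, {{5, '$'}, {rd, 1}},
    };
    t.go = {{{0, 'S'}, 1}, {{2, 'S'}, 4}};
    return t;
}
```

With these tables, `lrParse(parenTables(), "((x))")` returns true and `lrParse(parenTables(), "(x")` returns false. Even for a grammar this tiny there are six states to work out by hand, which hints at why nobody wants to do it for a real language.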

Obviously, it would be entirely possible to do that by hand if you really wanted to, but there's almost never a real point. Generating the tables entirely by hand would be pretty excruciating all by itself.

For example, you normally start by constructing an NFA, which is a large table -- normally, one row for each parser state, and one column for each possible input. In each cell, you encode the next state to enter when you start in that state and then receive that input. Most of these cells are basically empty (i.e. they just say that input isn't allowed when you're in that state). (Note: since the valid transitions are so sparse, most parser generators support some way of compressing these tables, but that doesn't change the basic idea.)

You then step through all of those and follow some fairly simple rules to collect sets of NFA states together to become a state in the DFA. The rules are simple enough that it's pretty easy to program them into a computer, but you have to repeat them for every cell in the NFA table, and do essentially perfect book-keeping to produce a DFA that works correctly.
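
The step of collecting NFA states into DFA states is usually called the subset construction. Here is a minimal generic sketch of it in C++; the names (`Nfa`, `Dfa`, `closure`, `subsetConstruction`, `kEpsilon`) are hypothetical, symbols are assumed to be single `char`s, and the character `'\0'` is reserved for epsilon moves. A parser generator applies the same idea with LR items as the NFA states.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using StateSet = std::set<int>;

constexpr char kEpsilon = '\0';  // assumption: '\0' marks an epsilon move

// The NFA as a sparse table: only the non-empty cells are stored.
struct Nfa {
    std::map<std::pair<int, char>, StateSet> moves;  // (state, symbol) -> targets
    int start = 0;
};

struct Dfa {
    std::vector<StateSet> states;               // DFA state i = this set of NFA states
    std::map<std::pair<int, char>, int> moves;  // (DFA state, symbol) -> DFA state
};

// Adds every state reachable through epsilon moves alone.
StateSet closure(const Nfa& nfa, StateSet states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        const int s = work.back();
        work.pop_back();
        auto it = nfa.moves.find({s, kEpsilon});
        if (it == nfa.moves.end()) continue;
        for (int next : it->second)
            if (states.insert(next).second) work.push_back(next);
    }
    return states;
}

Dfa subsetConstruction(const Nfa& nfa, const std::set<char>& alphabet) {
    Dfa dfa;
    std::map<StateSet, int> ids;  // the book-keeping: which sets were already seen
    dfa.states.push_back(closure(nfa, {nfa.start}));
    ids[dfa.states[0]] = 0;
    // dfa.states grows while we walk it; the loop ends once no new set appears.
    for (std::size_t i = 0; i < dfa.states.size(); ++i) {
        for (char c : alphabet) {
            StateSet target;
            for (int s : dfa.states[i]) {
                auto it = nfa.moves.find({s, c});
                if (it != nfa.moves.end())
                    target.insert(it->second.begin(), it->second.end());
            }
            if (target.empty()) continue;  // empty cell: input not allowed here
            target = closure(nfa, target);
            auto res = ids.emplace(target, static_cast<int>(dfa.states.size()));
            if (res.second) dfa.states.push_back(target);
            dfa.moves[std::make_pair(static_cast<int>(i), c)] = res.first->second;
        }
    }
    return dfa;
}
```

As a check, feeding it the item NFA for the toy grammar `S -> ( S ) | x` (eight items, with epsilon moves from each item that has the dot before `S` to the items that start an `S` rule) produces six DFA states. Turning those states into shift/reduce actions is a further step that this sketch leaves out.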

A computer can and will do that quite nicely -- for it, applying a couple of simple rules to every one of twenty thousand cells in the NFA state table is a piece of cake. It's hard to imagine subjecting a person to doing the same though -- I'm pretty sure under UN guidelines, that would be illegal torture.

122

answered Sep 21 '22 14:09

Jerry Coffin

Tags: c++, file, parsing, configuration