Ok, so I would like to make a GLR parser generator. I know there exist such programs better than what I will probably make, but I am doing this for fun/learning so that's not important.
I have been reading about GLR parsing and I think I have a decent high level understanding of it now. But now it's time to get down to business.
The graph-structured stack (GSS) is the key data structure for use in GLR parsers. Conceptually I know how GSS works, but none of the sources I looked at so far explain how to implement GSS. I don't even have an authoritative list of operations to support. Can someone point me to some good sample code/tutorial for GSS? Google didn't help so far. I hope this question is not too vague.
In computer science, a graph-structured stack (GSS) is a directed acyclic graph where each directed path represents a stack. The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown automaton.
A graph is a common data structure that consists of a finite set of nodes (or vertices) and a set of edges connecting them. A pair (x,y) is referred to as an edge, which communicates that the x vertex connects to the y vertex. In the examples below, circles represent vertices, while lines represent edges.
Graphs in data structures are non-linear data structures made up of a finite number of nodes or vertices and the edges that connect them. Graphs in data structures are used to address real-world problems in which it represents the problem area as a network like telephone networks, circuit networks, and social networks.
Firstly, if you haven't already, you should read McPeak's paper on GLR http://www.cs.berkeley.edu/~smcpeak/papers/elkhound_cc04.ps. It is an academic paper, but it gives good details on GSS, GLR, and the techniques used to implement them. It also explains some of the hairy issues with implementing a GLR parser.
You have three parts to implementing a graph-structured stack.
I. The graph data structure itself
II. The stacks
III. GLR's use of a GSS
You are right, google isn't much help. And unless you like reading algorithms books, they won't be much help either.
I. The graph data structure
Rob's answer about "the direct representation" would be easiest to implement. It's a lot like a linked-list, except each node has a list of next nodes instead of just one.
This data structure is a directed graph, but as the McPeak states, the GSS may have cycles for epsilon-grammars.
II. The stacks
A graph-structured stack is conceptually just a list of regular stacks. For an unambiguous grammar, you only need one stack. You need more stacks when there is a parsing conflict so that you can take both parsing actions at the same time and maintain the different state both actions create. Using a graph allows you to take advantage of the fact that these stacks share elements.
It may help to understand how to implement a single stack with a linked-list first. The head of the linked list is the top of the stack. Pushing an element onto the stack is just creating a new head and pointing it to the old head. Popping an element off the stack is just moving the pointer to head->next.
In a GSS, the principle is the same. Pushing an element is just creating a new head node and pointing it to the old head. If you have two shift operations, you will push two elements onto the old head and then have two head nodes. Conceptually this is just two different stacks that happen share every element except the top ones. Popping an element is just moving the head pointer down the stack by following each of the next nodes.
III. GLR's use of the GSS
This is where McPeak's paper is a useful read.
The GLR algorithm takes advantage of the GSS by merging stack heads that have the same state element. This means that one state element may have more than one child. When reducing, the GLR algorithm will have to explore all possible paths from the stack head.
You can optimize GLR by maintaining the deterministic depth of each node. This is just the distance from a split in the stack. This way you don't always have to search for a stack split.
This is a tough task! So good luck!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With