Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to parse Context-sensitive grammar?

CSG is similar to CFG but the reduce symbol is multiple.

So, can I just use CFG parser to parse CSG with reducing production to multiple terminals or non-terminals?

Like

1. S → a bc
2. S → a S B c
3. c B → W B
4. W B → W X
5. W X → B X
6. B X → B c
7. b B → b b

When we meet W X, can we just reduce W X to W B?

When we meet W B, can we just reduce W B to c B?

So if CSG parser is based on CFG parser, it's not hard to write, is it true?

But when I checked wiki, it said to parse CSG, we should use linear bounded automaton.

What is linear bounded automaton?

like image 998
qdwang Avatar asked Apr 07 '15 02:04

qdwang


2 Answers

Context sensitive grammars are non-deterministic. So you can not assume that a reduction will take place, just because the RHS happens to be visible at some point in a derivation.

LBAs (linear-bounded automata) are also non-deterministic, so they are not really a practical algorithm. (You can simulate one with backtracking, but there is no convenient bound on the amount of time it might take to perform a parse.) The fact that they are acceptors for CSGs is interesting for parsing theory but not really for parsing practice.

Just as with CFGs, there are different classes of CSGs. Some restricted subclasses of CSGs are easier to parse (CFGs are one subclass, for example), but I don't believe there has been much investigation into practical uses; in practice, CSGs are hard to write, and there is no obvious analog of a parse tree which can be constructed from a derivation.

For more reading, you could start with the wikipedia entry on LBAs and continue by following its references. Good luck.

like image 50
rici Avatar answered Oct 07 '22 00:10

rici


The Nearley parser generator is able to parse some simple context-sensitive patterns using postprocessors.

It's possible to write a context-sensitive rule that matches to be or not to be or to do or not to do, but not to be or not to do:

# "_" is whitespace, "word" is a single word
example_rule -> "to" _ word _ "or" _ "not" _ "to" _ word {%
    function(d,l, reject) {
        if (d[2] !== d[10]) {
            return reject;
        } else {
            return {d.join(" ")};
        }
    }
%}
like image 38
Anderson Green Avatar answered Oct 06 '22 23:10

Anderson Green