Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Pyparsing Support Context-Sensitive Grammars?

Forgive me if I have the incorrect terminology; perhaps just getting the "right" words to describe what I want is enough for me to find the answer on my own.

I am working on a parser for ODL (Object Description Language), an arcane language that as far as I can tell is now used only by NASA PDS (Planetary Data Systems; it's how NASA makes its data available to the public). Fortunately, PDS is finally moving to XML, but I still have to write software for a mission that fell just before the cutoff.

ODL defines objects in something like the following manner:

OBJECT              = TABLE
  ROWS              = 128
  ROW_BYTES         = 512 
END_OBJECT          = TABLE

I am attempting to write a parser with pyparsing, and I was doing fine right up until I came to the above construction.

I have to create some rule that is able to ensure that the right-hand-value of the OBJECT line is identical to the RHV of END_OBJECT. But I can't seem to put that into a pyparsing rule. I can ensure that both are syntactically valid values, but I can't go the extra step and ensure that the values are identical.

  1. Am I correct in my intuition that this is a context-sensitive grammar? Is that the phrase I should be using to describe this problem?
  2. Whatever kind of grammar this is in the theoretical sense, is pyparsing able to handle this kind of construction?
  3. If pyparsing is not able to handle it, is there another Python tool capable of doing so? How about ply (the Python implementation of lex/yacc)?
like image 405
HardlyKnowEm Avatar asked Feb 27 '13 02:02

HardlyKnowEm


People also ask

Where is context sensitive grammar used?

Context-sensitive grammar is an important grammar in formal language theory, in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols.

What is context sensitive grammar and how it is different from CFG?

In context sensitive grammar, there is either left context or right context (αAβ i.e. α is left context and β is right) with variables. But in context free grammar (CFG) there will be no context. We cannot replace B until we get B0. Therefore, CSG is harder to understand than the CFG.

Is context free subset of context-sensitive?

Context-free grammars are a subset of context-sensitive grammars. A grammar is a concise way to specify a language.

What is context sensitive grammar in compiler design?

A formal grammar known as a CSG allows any production rules' left and right sides to be surrounded by a context of the terminal as well as nonterminal symbols.


2 Answers

It is in fact a grammar for a context-sensitive language, classically abstracted as wcw where w is in (a|b)* (note that wcw' , where ' indicates reversal, is context-free).

Parsing Expression Grammars are capable of parsing wcw-type languages by using semantic predicates. PyParsing provides the matchPreviousExpr() and matchPreviousLiteral() helper methods for this very purpose, e.g.

w = Word("ab")
s = w + "c" + matchPreviousExpr(w)

So in your case you'd probably do something like

table_name = Word(alphas, alphanums)
object = Literal("OBJECT") + "=" + table_name + ... +
  Literal("END_OBJECT") + "=" +matchPreviousExpr(table_name)
like image 169
ebohlman Avatar answered Oct 01 '22 12:10

ebohlman


As a general rule, parsers are built as context-free parsing engines. If there is context sensitivity, it is grafted on after parsing (or at least after the relevant parsing steps are completed).

In your case, you want to write context-free grammar rules:

  head = 'OBJECT' '=' IDENTIFIER ;
  tail = 'END_OBJECT'  '=' IDENTIFIER ;
  element = IDENTIFIER '=' value ;
  element_list = element ;
  element_list = element_list element ;
  block = head element_list tail ;

The checks that the head and tail constructs have matching identifiers isn't technically done by the parser.

Many parsers, however, allow a semantic action to occur when a syntactic element is recognized, often for the purpose of building tree nodes. In your case, you want to use this to enable additional checking. For element, you want to make sure the IDENTIFIER isn't a duplicate of something already in the block; this means for each element encountered, you'll want to capture the corresponding IDENTIFIER and make a block-specific list to enable duplicate checking. For block, you want to capture the head *IDENTIFIER*, and check that it matches the tail *IDENTIFIER*.

This is easiest if you build a tree representing the parse as you go along, and hang the various context-sensitive values on the tree in various places (e.g., attach the actual IDENTIFIER value to the tree node for the head clause). At the point where you are building the tree node for the tail construct, it should be straightforward to walk up the tree, find the head tree, and then compare the identifiers.

This is easier to think about if you imagine the entire tree being built first, and then a post-processing pass over the tree is used to this checking. Lazy people in fact do it this way :-} All we are doing is pushing work that could be done in the post processing step, into the tree-building steps attached to the semantic actions.

None of these concepts is python specific, and the details for PyParsing will vary somewhat.

like image 21
Ira Baxter Avatar answered Oct 01 '22 13:10

Ira Baxter