
Is there a way to pass contextual information to parsers?

I am parsing a small declarative language where in a scope you can have variables declared (with a type), and then later on, just like in most other languages, the name (without the type) is used.

A variable declaration looks like this:

?varname
?varname1 ?varname2 - type1
?varname3 ?varname4 ?varname5 - type2

If the type is omitted, the default type should be object, as in the first case. For that I have a specific parser which returns a list of my own domain object called LiftedTerm (you can assume it's a tuple of the variable's name and its type; in reality it holds a bit more, but that is irrelevant to this problem):

def typed_list_variables : Parser[List[LiftedTerm]]= typed_variables.+ ^^ { case list => list.flatten.map(variable =>
        LiftedTerm(variable._1, variable._2 match {
          case "object" => ObjectType
          case _ => TermType(variable._2)
        })) }

def typed_variables = ((variable+) ~ (("-" ~> primitive_type)?)) ^^ {
    case variables ~ primitive_type => 
         for (variable <- variables) yield variable -> primitive_type.getOrElse("object")
}

def variable = """\?[a-zA-Z][a-zA-Z0-9_-]*""".r
def primitive_type = """[a-zA-Z][a-zA-Z0-9_-]*""".r

All this works perfectly fine.

Now further down in the same 'scope' I have to parse the parts where there is a reference to these variables. The variable obviously won't be declared again in full. So, in the above example, places where ?varname1 is used won't include type1. However, when I parse the rest of the input I wish to get the reference of the right LiftedTerm object, rather than just a string.

I have some recursive structures in place, so I don't wish to do this mapping at the top level parser. I don't wish to make a 'global mapping' of these either in my RegexParsers object because most of these are scoped and only relevant for a small piece of the input.

Is there a way of passing contextual information to a parser? Ideally I pass the list of LiftedTerm (or better still a map from the variable names String -> LiftedTerm) into the recursive parser calls.

(Apologies if this is something obvious, I am still new to Scala and even newer to parser combinators).

jbx asked Jan 02 '14 14:01



1 Answer

AFAIK, Scala's parser combinator library is limited to context-free grammars. Hence, your use case is not supported.

The proper way to go would be to extend scala.util.parsing.combinator.Parsers and provide a custom Parser class which carries your context around. Then you need to define all the combinators so that they also deal with the context.

Edit: As has been pointed out below, parsers have the methods into and flatMap. Therefore, when you have a parser that yields your context, you can combine it with another parser that requires that context, in monadic style.
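To illustrate the into/flatMap approach, here is a minimal sketch rather than a definitive implementation. It assumes a hypothetical input layout in which the declarations sit in parentheses and are followed by the variable references. LiftedTerm, ObjectType and TermType are simplified stand-ins for the question's domain types, and the names ScopedParser, Env, declarations, reference, body and scope are made up for this example. The declarations parser yields a map, and into hands that map to the parser for the rest of the scope, so no global state is needed.

```scala
import scala.util.parsing.combinator.RegexParsers

// Simplified stand-ins for the question's domain types (assumed shapes).
sealed trait Type
case object ObjectType extends Type
case class TermType(name: String) extends Type
case class LiftedTerm(name: String, tpe: Type)

object ScopedParser extends RegexParsers {
  // The context threaded through the parsers: variable name -> declared term.
  type Env = Map[String, LiftedTerm]

  def variable: Parser[String] = """\?[a-zA-Z][a-zA-Z0-9_-]*""".r
  def primitiveType: Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // One group such as "?a ?b - block"; the type defaults to ObjectType.
  def typedVariables: Parser[List[LiftedTerm]] =
    (rep1(variable) ~ opt("-" ~> primitiveType)) ^^ {
      case names ~ tpe =>
        val t: Type = tpe.filter(_ != "object").map(TermType(_)).getOrElse(ObjectType)
        names.map(n => LiftedTerm(n, t))
    }

  // Hypothetical layout: declarations are wrapped in parentheses.
  def declarations: Parser[Env] =
    ("(" ~> rep1(typedVariables) <~ ")") ^^ { groups =>
      groups.flatten.map(t => t.name -> t).toMap
    }

  // A parser that depends on the context: it resolves a name against env
  // and fails (without consuming input) if the variable was never declared.
  def reference(env: Env): Parser[LiftedTerm] =
    variable.^?(env, name => s"undeclared variable $name")

  // Any recursive parser can take env as an ordinary method parameter.
  def body(env: Env): Parser[List[LiftedTerm]] = rep(reference(env))

  // `into` (alias `>>`, same as flatMap) feeds the parsed Env to the next parser.
  def scope: Parser[List[LiftedTerm]] = declarations.into(env => body(env))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input = "(?a ?b - block ?c) ?b ?c"
    println(ScopedParser.parseAll(ScopedParser.scope, input))
  }
}
```

With this input, ?b should resolve to the LiftedTerm declared with type block and ?c to one with the default ObjectType, while a reference to an undeclared name makes the parse fail. Because the context is a plain parameter, nested scopes can extend the map and pass the extended copy to their own sub-parsers.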

choeger answered Sep 18 '22 13:09