Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scannerless Parser Generators

Tags:

Prologue: Although the set of languages recognized by parsers (context-free grammars) is strictly larger than the one of scanners (regular grammars), most parser generators need a scanner.

(Please don't try to explain the reasons behind it, I know them quite well).

I've seen parsers, that do not require a scanner like

  • Elkhound (optional, Tomita/GLR)
  • DParser (Tomita/GLR)
  • SDLR (Tomita/GLR)

There are certain advantages of using no scanner:

  • Just one concept (context-free grammars) instead of two
  • Parse multiple languages in one file (like HTML/Javascript)
  • Parse languages without reserved keywords (like PL/1)

Often, "workarounds" are used like switching the scanner on the parser's request.

Question: Do you know any other scannerless parser generators (any language)? Are these practical in use (or purely academic)? Are there any other approaches than Tomita/GLR?

Answers:

  • boost::spirit::qi by AraK
  • Parsing Expression Grammars (LPEG for Lua) by Norman Ramsey
  • Yakker by Norman Ramsey
  • MBase by SK-logic
  • Waxeye by Trevor Robinson
like image 835
Meinersbur Avatar asked Feb 08 '10 19:02

Meinersbur


People also ask

Which is a parser generator?

A parser generator is an application which generates a parser. Sometimes also called a 'compiler compiler'. The usual input is a formal specification of the grammar the parser has to recognize, plus code implementing the actions the parser has to take when recognizing the various parts of its input.

Why do Scannerless parsers work differently?

Scannerless parsers operate differently because they process directly the original text, instead of processing a list of tokens produced by a lexer. That is to say, a scannerless parser works as a lexer and a parser combined.

What type of parser does Python use?

Making experiments. As the generated C parser is the one used by Python, this means that if something goes wrong when adding some new rules to the grammar you cannot correctly compile and execute Python anymore.


1 Answers

Two others:

  • Bryan Ford's Parsing Expression Grammars (PEG) require no scanner. Efficient, lazy "packrat parser" is optional. I've had nothing but good experience with the Lua LPEG version, which compiles to an efficient bytecode machine. Quite practical.

  • YAKKER looks very intriguing although it is still clearly in a pre-release state. They are using what they claim to be an efficient variation on Earley's parsing algorithm.

I'm actually a huge fan of scannerless parsers; they simplify configuration enormously. And typical scanner generators, to put it mildly, are not much fun to use. From the man page for Lex:

The asteroid to kill this dinosaur is still in orbit.

Finally, I have no personal experience with Elkhound, but the secondhand reports I hear are impressive. I would say there's no question but that some scannerless parser generators are very practical.

like image 128
Norman Ramsey Avatar answered Oct 31 '22 01:10

Norman Ramsey