Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parser Generators and Ragel... Making my own D Parser

I'm new to the world of compilers, and I recently heard about something called a parser generator. From what I (think) I've understood, parser generators take in a syntax file and output a source code file that can parse files with the given syntax.

A few questions:

  1. Did I understand that correctly?

  2. If so, is Ragel such a tool?

  3. If it is, can Ragel output a D parser into D source code?

Thank you!

like image 696
user541686 Avatar asked Jan 18 '11 00:01

user541686


3 Answers

  1. That's basically it. Parser generators transform a grammar into a source file that can be used to recognize strings that are members of the language defined by the grammar. Often, but not always, a parser generator requires a lexical analyzer to break text down into tokens before it does its work. Lex and Yacc are classic examples of a paired lexical analyzer and parser generator.

    Modern parser generators offer additional features. For instance, ANTLR can generate code for lexical analysis, grammatical analysis, and even walk the generated abstract syntax tree. Elkhound generates a parser that uses the GLR parsing algorithm. This allows it to recognize a wider range of languages than non-generalized parsing algorithms. PEG Parsers don't require a separate lexical analyzer.

  2. Ragel actually generates a lexical analyzer in the form of a finite state machine. It can recognize a regular language but not a context-free language. This means it can't recognize most programming languages, including D.

  3. Ragel does generate D code if you need a fast lexical analyzer.

To fully understand what a parser generator does for you, you'll need some formal language and parsing theory. There are worse places to start than the The Dragon Book. See also: Learning to write a compiler.

If you're feeling brave, be sure to check out the lexing and parsing code distributed with the DMD compiler - /dmd2/src/dmd/ - lexer.c and parse.c.

like image 83
Corbin March Avatar answered Nov 08 '22 03:11

Corbin March


While Ragel is based on regular expressions, it's not just a regex FSM generator. It allows recursion using an additional call/return syntax, as well as other features which allow parsing non-regular languages. So while Ragel does generate FSMs, it allows generating multiple different FSMs and provides mechanisms for jumping between them at arbitrary points, or using a special machine transition syntax. It also allows executing arbitrary code at state transitions.

Another thing that makes Ragel unique is that it's online. In other words, it's easy to use to scan data from an asynchronous source, such as a non-blocking socket. It also uses no dynamic resources, except that for call/return you can use either static, automatic, or dynamic memory for the stack; however you want. There's no global state, either.

Ragel is quite unique. Unlike most (all?) traditional generators, it was made for network programming.

like image 34
Anonymous Avatar answered Nov 08 '22 03:11

Anonymous


Could be:

MySourceCode --> (Scanner) --> MyScannerDataFile MyScannerDataFile --> (Parser) --> MyParserDataFile MyParserDataFile --> (CodeGenerator) --> MyExecutableFile

or:

MySourceCode --> (ScannerAndParser) --> MyScannerAndParserDataFile MyScannerAndParserDataFile --> (CodeGenerator) --> MyExecutableFile

like image 1
umlcat Avatar answered Nov 08 '22 02:11

umlcat