I've been given a task where I have to create a parser for a simple C-like language. I can use any programming language and tools I wish to create the parser, but I'm learning Python at the same time so it would be my preferred choice.
There are a few restrictions my Parser has to follow. Firstly, it must be able to read in a text file that contains the following information:
kind1 : spelling1
kind2 : spelling2
kind3 : spelling3
.
.
.
kindn : spellingn
Where each kind and spelling refer to the token type and value of the language. This file is the result of putting a sample of code through the language's lexical analyser.
Secondly, I must be able to customise the output of the parser. Ideally I would like to output a file that has converted the kind:spelling list into another sequence of tokens that would be passed to the language's compiler to be converted into MIPS Assembly code. Here's a little example of the kind of thing I would like the parser to be able to produce:
%function int test
%variable int x
%variable int y
%begin
%if %id y , %id x > %do
%begin
%return %num 0
%end
%return %num 1
%end
It would be a great help if someone could advise me on existing Python Parser Generators and if I'd be able to achieve the sort of thing I'm looking for in the above examples.
Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
Pegen. Pegen is the parser generator used in CPython to produce the final PEG parser used by the interpreter. It is the program that can be used to read the python grammar located in Grammar/Python.
PyParsing is a python tool to generate parsers. There are a lot of interesting examples.
Easy to get started:
from pyparsing import Word, alphas
# define grammar
greet = Word( alphas ) + "," + Word( alphas ) + "!"
# input string
hello = "Hello, World!"
# parse input string
print hello, "->", greet.parseString( hello )
I recommend that you check out Lark: https://github.com/erezsh/lark
It can parse ALL context-free grammars, it automatically builds an AST (with line & column numbers), and it accepts the grammar in EBNF format, which is simple to write and it's considered the standard.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With