
Which tool to use to parse programming languages in Python?

Tags: python, parsing

Which Python tool can you recommend to parse programming languages? It should allow for a readable representation of the language grammar inside the source, and it should be able to scale to complicated languages (something with a grammar as complex as e.g. Python itself).

When I search, I mostly find pyparsing, which I will be evaluating, but of course I'm interested in other alternatives.

Edit: Bonus points if it comes with good error reporting and source code locations attached to syntax tree elements.

Stefan Majewsky asked Jul 04 '11 13:07



1 Answer

I really like pyPEG. Its error reporting isn't very friendly, but it can add source code locations to the AST.

pyPEG doesn't have a separate lexer, which would make parsing Python itself hard (I think CPython recognises indent and dedent in the lexer), but I've used pyPEG to build a parser for a subset of C# with surprisingly little work.
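To illustrate the point about indentation being handled at the lexer level, here is a small sketch using the standard-library tokenize module, which exposes a token stream in the style of CPython's own tokenizer. The sample source string is made up for the illustration; the example has nothing to do with pyPEG itself.

```python
import io
import tokenize

# Hypothetical sample input: one indented block followed by a top-level statement.
source = "if x:\n    y = 1\nz = 2\n"

# generate_tokens() expects a readline-style callable that returns str lines.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

# Keep only the layout tokens that the tokenizer synthesises from leading whitespace.
layout = [
    tokenize.tok_name[tok.type]
    for tok in tokens
    if tok.type in (tokenize.INDENT, tokenize.DEDENT)
]

print(layout)  # ['INDENT', 'DEDENT']
```

The grammar never sees raw leading whitespace, only INDENT and DEDENT tokens, which is the part a lexerless parser has to work around.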

An example adapted from fdik.org/pyPEG/. Take a simple language like this:

    function fak(n) {
        if (n==0) { // 0! is 1 by definition
            return 1;
        } else {
            return n * fak(n - 1);
        };
    }

A pyPEG parser for that language:

    import re
    from pyPEG import keyword  # pyPEG 1 API, as used in the fdik.org example

    # Lists are alternatives, tuples are sequences, and an integer in front
    # of an item says how often that item may repeat.
    def comment():          return [re.compile(r"//.*"),
                                    re.compile(r"/\*.*?\*/", re.S)]
    def literal():          return re.compile(r'\d*\.\d*|\d+|".*?"')
    def symbol():           return re.compile(r"\w+")
    def operator():         return re.compile(r"\+|\-|\*|\/|\=\=")
    def operation():        return symbol, operator, [literal, functioncall]
    def expression():       return [literal, operation, functioncall]
    def expressionlist():   return expression, -1, (",", expression)
    def returnstatement():  return keyword("return"), expression
    def ifstatement():      return (keyword("if"), "(", expression, ")", block,
                                    keyword("else"), block)
    def statement():        return [ifstatement, returnstatement], ";"
    def block():            return "{", -2, statement, "}"
    def parameterlist():    return "(", symbol, -1, (",", symbol), ")"
    def functioncall():     return symbol, "(", expressionlist, ")"
    def function():         return keyword("function"), symbol, parameterlist, block
    def simpleLanguage():   return function
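For readers wondering how a grammar written as plain Python functions can drive a parser, here is a rough, self-contained sketch of a matcher for the same notation. It is not pyPEG's implementation: the function name match, the tree shape and the toy grammar are all made up for illustration, and the meaning of the integers (0 for optional, -1 for zero or more, -2 for one or more) is an assumption based on how they are used in the grammar above.

```python
import re

def match(pattern, text, pos=0):
    """Match `pattern` at text[pos:]; return (tree, new_pos) or None."""
    # Skip leading whitespace before every attempt (there is no separate lexer).
    pos = len(text) - len(text[pos:].lstrip())
    if callable(pattern):                     # rule reference: call it, tag the result
        result = match(pattern(), text, pos)
        if result is None:
            return None
        tree, end = result
        return (pattern.__name__, pos, tree), end  # pos is the source offset
    if isinstance(pattern, str):              # literal text
        if text.startswith(pattern, pos):
            return pattern, pos + len(pattern)
        return None
    if isinstance(pattern, re.Pattern):       # regular-expression terminal
        m = pattern.match(text, pos)
        return (m.group(0), m.end()) if m else None
    if isinstance(pattern, list):             # ordered choice: first success wins
        for alternative in pattern:
            result = match(alternative, text, pos)
            if result is not None:
                return result
        return None
    if isinstance(pattern, tuple):            # sequence, with optional repeat markers
        children = []
        items = iter(pattern)
        for item in items:
            if isinstance(item, int):         # marker applies to the next item
                low = 1 if item == -2 else 0
                high = 1 if item == 0 else None
                repeated, count = next(items), 0
                while high is None or count < high:
                    result = match(repeated, text, pos)
                    if result is None or result[1] == pos:
                        break                 # no match, or no progress
                    children.append(result[0])
                    pos = result[1]
                    count += 1
                if count < low:
                    return None
            else:
                result = match(item, text, pos)
                if result is None:
                    return None
                children.append(result[0])
                pos = result[1]
        return children, pos
    raise TypeError("unsupported grammar element: %r" % (pattern,))

# A toy grammar in the same style (hypothetical, much smaller than the one above).
def number():  return re.compile(r"\d+")
def symbol():  return re.compile(r"[A-Za-z_]\w*")
def arglist(): return expr, -1, (",", expr)
def call():    return symbol, "(", 0, arglist, ")"
def expr():    return [call, number, symbol]

tree, end = match(expr, "f(1, g(x))")
print(tree[0], tree[1], end)  # expr 0 10
```

Each rule node is a (name, offset, children) tuple, so source locations end up attached to the tree elements. Error reporting is the weak spot of this approach: a failed match just returns None.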
Will Harris answered Oct 22 '22 05:10