Can people point me to resources on lexing, parsing and tokenising with Python?
I'm doing a little hacking on an open source project (hotwire) and wanted to make a few changes to the code that lexes, parses and tokenises the commands entered into it. Since it is real working code, it is fairly complex and a bit hard to work out.
I haven't worked on code to lex/parse/tokenise before, so I was thinking one approach would be to work through a tutorial or two on the subject. I would hope to learn enough to navigate around the code I actually want to alter. Is there anything suitable out there? (Ideally it could be done in an afternoon without having to buy and read the Dragon Book first ...)
Edit: (7 Oct 2008) None of the answers below quite gives what I want. With them I could generate parsers, but I want to learn how to write my own basic parser from scratch, without using lex and yacc or similar tools. Having done that, I can then understand the existing code better.
So could someone point me to a tutorial where I can build a basic parser from scratch, using just Python?
PLY's lex.py module is used to break input text into a collection of tokens specified by a set of regular expression rules, and yacc.py is used to recognize language syntax that has been specified in the form of a context-free grammar. The two tools are meant to work together.
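To make that concrete, here is a minimal lex.py sketch; the token names and rules are illustrative, not taken from hotwire:

```python
# Minimal PLY lexer sketch; requires PLY (pip install ply).
import ply.lex as lex

# Every PLY lexer declares its token names up front.
tokens = ('NUMBER', 'PLUS', 'TIMES')

# Simple tokens are plain regular-expression strings.
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'  # characters to skip silently

# Tokens that need an action are functions; the docstring is the regex.
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("3 + 4 * 10")
for tok in lexer:
    print(tok.type, tok.value)  # NUMBER 3, PLUS +, NUMBER 4, ...
```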
A lexer is a software program that performs lexical analysis. ... A parser goes one level further than the lexer: it takes the tokens produced by the lexer and tries to determine whether proper sentences have been formed. Parsers work at the grammatical level; lexers work at the word level.
In this article, parsing is defined as processing a piece of Python source code and converting it into a structure the program can analyse. In general, we can say a parser divides the given program code into small pieces in order to check that the syntax is correct.
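As a quick illustration of that split using nothing but the standard library, Python's tokenize module works at the word level and ast works at the grammatical level (this just demonstrates the two stages; it is not how hotwire does it):

```python
# Lexing and parsing Python source with the standard library.
import ast
import io
import tokenize

src = "x = 1 + 2 * 3"

# Lexing: break the source into tokens (the word level).
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Parsing: assemble those tokens into a syntax tree (the grammatical level).
tree = ast.parse(src)
print(ast.dump(tree))
```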
I'm a happy user of PLY. It is a pure-Python implementation of Lex & Yacc, with lots of small niceties that make it quite Pythonic and easy to use. Since Lex & Yacc are the most popular lexing and parsing tools, PLY has the advantage of standing on giants' shoulders: a lot of knowledge about Lex & Yacc exists online, and you can freely apply it to PLY.
PLY also has a good documentation page with some simple examples to get you started.
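To give a flavour of the yacc.py half as well, here is a minimal end-to-end sketch in the style of the calculator example from PLY's documentation; the grammar is illustrative only:

```python
# Minimal PLY lexer + parser sketch for arithmetic expressions.
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'TIMES')
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Entries later in the list bind tighter, so TIMES outranks PLUS.
precedence = (
    ('left', 'PLUS'),
    ('left', 'TIMES'),
)

# Each grammar rule is a function whose docstring is the production;
# p[0] is the rule's value, p[1]..p[n] are the right-hand-side values.
def p_expression_plus(p):
    'expression : expression PLUS expression'
    p[0] = p[1] + p[3]

def p_expression_times(p):
    'expression : expression TIMES expression'
    p[0] = p[1] * p[3]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error in input")

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse("3 + 4 * 10"))  # prints 43
```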
For a list of many Python parsing tools, see this.
This question is pretty old, but maybe my answer will help someone who wants to learn the basics. I find this resource very good: a simple interpreter written in Python without the use of any external libraries. It will help anyone who would like to understand the internal workings of parsing, lexing, and tokenising:
"A Simple Interpreter from Scratch in Python:" Part 1, Part 2, Part 3, and Part 4.