
Where can I learn the basics of writing a lexer?

I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it) but this was given to us with no instruction or feedback (beyond the mark) so I didn't really learn much from it.

After searching for this topic, I can only find fairly advanced write-ups that focus on areas a few steps ahead of where I am. I want a discussion of the basics of writing a lexer for a very simple language, which I can use as a basis for investigating how to tokenise more complex languages.

At this stage I'm not really interested in best practices or optimisation techniques but instead prefer a focus on the essentials. What are some good resources to get me started?

Rupert Madden-Abbott asked Jun 02 '11 15:06


People also ask

What is a lexer in code?

A lexer defines how the contents of a file are broken into tokens. It is typically implemented as a finite automaton. A lexer reads an input stream of characters or bytes and divides it into tokens using patterns (rules) specified in a grammar file or in the code.
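As a rough illustration of "patterns (rules) specified in the code", here is a minimal sketch of a rule-driven tokenizer in Python. The token names, the rule table and the toy arithmetic language are all assumptions made up for this example; they do not come from any particular tool.

```python
import re

# Hypothetical token rules for a toy arithmetic language: (name, pattern).
# Order matters: earlier rules win when more than one could match.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all rules into one regular expression with one named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = 42 + y")` yields an identifier, an operator, a number, an operator and another identifier, in that order. Adding a new kind of token only requires a new entry in the rule table.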

What is the difference between lexer and parser?

A lexer is a software program that performs lexical analysis. ... A parser goes one level further than the lexer and takes the tokens produced by the lexer and tries to determine if proper sentences have been formed. Parsers work at the grammatical level, lexers work at the word level.
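To make the "word level" versus "grammatical level" distinction concrete, here is a small sketch of the parser side in Python. It assumes tokens arrive as `(kind, text)` pairs and checks them against a hypothetical one-rule grammar, `sum := NUMBER ('+' NUMBER)*`. Both the token shape and the grammar are invented for illustration.

```python
def is_valid_sum(tokens):
    """Check tokens against the hypothetical grammar: sum := NUMBER ('+' NUMBER)*"""
    # A sum must start with a number.
    if not tokens or tokens[0][0] != "NUMBER":
        return False
    i = 1
    while i < len(tokens):
        # Every further step must be a '+' followed by another number.
        if tokens[i] != ("OP", "+"):
            return False
        if i + 1 >= len(tokens) or tokens[i + 1][0] != "NUMBER":
            return False
        i += 2
    return True
```

Every token in `[("NUMBER", "1"), ("OP", "+")]` is a perfectly good "word", so a lexer would accept it, but the sequence is not a proper "sentence" in this grammar, so `is_valid_sum` rejects it.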




2 Answers

Basically there are two main approaches to writing a lexer:

  1. Creating a hand-written one, in which case I recommend this small tutorial.
  2. Using a lexer generator tool such as lex. In this case, I recommend reading the tutorials for your particular tool of choice.
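To give a feel for the first approach, here is a minimal sketch of a hand-written lexer in Python for a made-up language with integers, identifiers and a handful of single-character operators. The token kinds (`NUMBER`, `IDENT`, `OP`, `EOF`) and the function name `lex` are assumptions for this example, not taken from the linked tutorial or from lex.

```python
def lex(source):
    """Scan `source` one character at a time and return (kind, text) tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace only separates tokens; skip it.
            i += 1
        elif ch.isdigit():
            # Consume a whole run of digits as one NUMBER token.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            # Identifiers: a letter or underscore, then letters, digits, underscores.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("IDENT", source[start:i]))
        elif ch in "+-*/=()":
            # Single-character operators and parentheses.
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    # An explicit end marker saves the parser from bounds checks.
    tokens.append(("EOF", ""))
    return tokens
```

For example, `lex("total = (a1 + 23) * b")` produces one token per identifier, number, operator and parenthesis, followed by the `EOF` marker. The whole lexer is a single loop that looks at the current character, decides which kind of token starts there, and consumes characters until that token ends. A generator such as lex builds an equivalent loop for you from a list of patterns.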

I would also recommend the Kaleidoscope tutorial from the LLVM documentation. It runs through the implementation of a simple language and, in particular, demonstrates how to write a small lexer. There is a C++ and an Objective Caml version of the tutorial.

The classical textbook on the subject is Compilers: Principles, Techniques, and Tools, also known as the Dragon Book. However, this probably falls under the category of "fairly advanced write ups".

vitaut answered Sep 30 '22 22:09


The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.

Brandon Moretz answered Sep 30 '22 22:09