
What is the purpose of a lexer?

I was reading the answer to this question, but I can't seem to find an answer to why someone would need a lexer separately.

Is it one of the steps a program goes through during compilation? Can someone please explain in simple terms why I would need a lexer, and what purpose it would serve?

asked Jul 07 '12 by Anirudh Ramanathan


People also ask

What is the purpose of the lexer component of a compiler?

The Lexer is responsible for actually processing the component source code and finding the Mason directives within it. It interacts quite closely with the Compiler, which takes the Lexer's output and generates a Mason component object suitable for interpretation at runtime.

What is the benefit of using a lexer before a parser?

Parser rules allow your parser to match context-free structures on the token stream produced by the lexer, as opposed to the much weaker regular structures the lexer itself matches (using a DFA, a deterministic finite automaton).

What does a lexer output?

The function of a lexical analyzer (lexer) is to take an input stream of characters and break it into tokens. JLex is a Java-based lexical analyzer generator based on Lex (C-based). Its output is a Java source program that contains the necessary functions to perform a particular lexical analysis.

What does lexer and parser do?

A lexer and a parser work in sequence: the lexer scans the input and produces the matching tokens, the parser then scans the tokens and produces the parsing result.


3 Answers

A lexer will take an input character stream and convert it into tokens.

This can be used for a variety of purposes. You could apply transformations to the lexemes for simple text processing and manipulation.

Or the stream of lexemes can be fed to a parser, which will convert it into a parse tree.

If the goal is compilation, then lexical analysis is the first step. Think of it as the lower-level step which takes characters and converts them into tokens. The parser is a higher-level mechanism whose alphabet consists of tokens (created by the lexer), which it parses and from which it creates a parse tree.

If the goal is text manipulation, then manipulation rules can be applied to the lexemes themselves.
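The character-to-token step described above can be sketched as follows (a minimal, hypothetical lexer; the token names and patterns are illustrative, not from any particular compiler):

```python
import re

# Token patterns, tried in order; names and patterns are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]

def lex(text):
    """Convert an input character stream into (kind, lexeme) tokens."""
    pattern = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
    tokens = []
    for match in re.finditer(pattern, text):
        kind = match.lastgroup
        if kind != "SKIP":          # drop whitespace between lexemes
            tokens.append((kind, match.group()))
    return tokens

print(lex("x = 33 + 3"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '33'), ('OP', '+'), ('NUMBER', '3')]
```

A parser would then consume this `(kind, lexeme)` stream instead of raw characters, which is what makes its alphabet "tokens" rather than characters.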

answered Sep 21 '22 by Parag


A good example is in the Wikipedia article on lexical analysis: http://en.wikipedia.org/wiki/Lexical_analysis.

For example, if you want to evaluate the expression "(33+3)*2", the first step is to split the string into the tokens "(", "33", "+", "3", ")", "*", "2". As far as I remember from my course about compilers, this is done with a longest-match (maximal munch) rule over finite automata.
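That splitting step can be sketched with a longest-match rule (illustrative Python, not from the answer's course material): regex repetition is greedy, so "33" is consumed as one token rather than two separate "3" tokens.

```python
import re

def tokenize(expr):
    """Split an arithmetic expression into tokens using longest match:
    \\d+ greedily consumes a whole number literal before falling back
    to single-character operator/parenthesis tokens."""
    return re.findall(r"\d+|[()+\-*/]", expr)

print(tokenize("(33+3)*2"))
# ['(', '33', '+', '3', ')', '*', '2']
```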

answered Sep 22 '22 by Stasik


What is important to know is that you don't need a lexer for parsing.

A lexer is a traditional step used by many compilers to simplify parsing in some respects. But it doesn't always simplify parsing; in fact, it may slow parsing down by the very fact that it creates intermediate objects. Top-down recursive descent parsers and PEG (parsing expression grammar) parsers can work without a lexer and simply parse the whole text straight away.

A lexer can be used to conceptually simplify parsing, but it is not necessary.
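A minimal sketch of lexerless parsing (a hypothetical PEG-style recursive descent over raw characters; no token stream is ever built):

```python
def parse_sum(text):
    """Parse and evaluate 'number (+ number)*' straight from characters,
    with no separate tokenization pass."""
    pos = 0

    def number():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected digit at position {pos}")
        return int(text[start:pos])

    total = number()
    while pos < len(text) and text[pos] == "+":
        pos += 1            # consume '+' directly from the character stream
        total += number()
    return total

print(parse_sum("12+30+3"))  # 45
```

Here the parser's "alphabet" is individual characters: the number rule plays the role a NUMBER token would play in a lexer-based design, which is exactly the trade-off the answer describes.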

answered Sep 19 '22 by Lance