I was reading the answer to this question, but I can't seem to find an explanation of why someone would need a lexer as a separate step.
Is it one of the steps a program goes through during compilation? Can someone please explain, in simple terms, why I would need a lexer and what purpose it would serve?
The Lexer is responsible for actually processing the component source code and finding the Mason directives within it. It interacts quite closely with the Compiler, which takes the Lexer's output and generates a Mason component object suitable for interpretation at runtime.
Separating out lexer rules allows your parser to match context-free structures over a token stream rather than over raw characters; the lexer itself only has to handle the much weaker regular structures, which can be recognized efficiently by a DFA (deterministic finite automaton).
The function of a lexical analyzer (lexer) is to take an input stream of characters and break it into tokens. JLex is a lexical analyzer generator for Java, modeled on the C-based Lex. Its output is a Java source program containing the functions needed to perform a particular lexical analysis.
A lexer and a parser work in sequence: the lexer scans the input and produces tokens; the parser then scans those tokens and produces the parsing result.
A lexer will take an input character stream and convert it into tokens.
This can be used for a variety of purposes. You could apply transformations to the lexemes for simple text processing and manipulation.
Or the stream of lexemes can be fed to a parser, which will convert it into a parse tree.
If the goal is compilation, then lexical analysis is the first step. Think of it as the lower-level step that takes characters and converts them into tokens. The parser is a higher-level mechanism whose alphabet consists of the tokens created by the lexer, and from them it builds a parse tree (see the sketch after this answer).
If the goal is text manipulation, then the manipulation rules can be applied to the lexemes themselves.
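To make the token-to-parse-tree step concrete, here is a minimal sketch in Java. The `Token` and `Node` types are invented purely for illustration; a real compiler would have its own representations. A tiny recursive-descent parser consumes the lexer's token stream and builds a parse tree:

```java
import java.util.List;

// A minimal sketch of the parser side: its input alphabet is tokens
// produced by a lexer, not raw characters. Token and Node are invented
// here purely for illustration.
record Token(String kind, String text) {}

record Node(String label, List<Node> children) {
    static Node leaf(String text) { return new Node(text, List.of()); }
}

class TokenParser {
    private final List<Token> tokens;
    private int pos = 0;

    TokenParser(List<Token> tokens) { this.tokens = tokens; }

    private boolean atOp(String op) {
        return pos < tokens.size()
                && tokens.get(pos).kind().equals("OP")
                && tokens.get(pos).text().equals(op);
    }

    // expr := NUMBER ('+' NUMBER)*  (enough to show tree-building)
    Node expr() {
        Node left = Node.leaf(tokens.get(pos++).text());
        while (atOp("+")) {
            pos++; // consume '+'
            Node right = Node.leaf(tokens.get(pos++).text());
            left = new Node("+", List.of(left, right));
        }
        return left;
    }

    public static void main(String[] args) {
        // What a lexer would emit for "33+3": NUMBER, OP, NUMBER.
        List<Token> input = List.of(
                new Token("NUMBER", "33"),
                new Token("OP", "+"),
                new Token("NUMBER", "3"));
        System.out.println(new TokenParser(input).expr());
        // prints: Node[label=+, children=[Node[label=33, children=[]], Node[label=3, children=[]]]]
    }
}
```

Note that the parser never looks at individual characters; its whole alphabet is the token kinds the lexer emits.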
A good example is in the Wikipedia article on lexical analysis: http://en.wikipedia.org/wiki/Lexical_analysis
For example, if you want to evaluate the expression "(33+3)*2", the first step is to split the string into the tokens "(", "33", "+", "3", ")", "*", "2". As far as I remember from my compilers course, this is done with longest-match word automata.
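Here is a minimal sketch of that longest-match rule (also called maximal munch) in Java, assuming a toy grammar of integers, "+", "*", and parentheses. At each position the lexer consumes the longest possible run of digits, which is why "33" becomes one token rather than two:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal longest-match tokenizer sketch for expressions like "(33+3)*2".
// Grammar assumed here: integers, '+', '*', '(' and ')'.
class ExprLexer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // Longest match: keep consuming digits so "33" is one token.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == '+' || c == '*' || c == '(' || c == ')') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (Character.isWhitespace(c)) {
                i++; // skip whitespace between tokens
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(33+3)*2"));
        // prints: [(, 33, +, 3, ), *, 2]
    }
}
```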
What is important to know is that you don't need a lexer for parsing.
A lexer is a traditional step used by many compilers to simplify parsing in some respects. But it doesn't always simplify parsing, and it may even slow parsing down by the very fact that it creates intermediate objects. Top-down recursive-descent parsers and PEG (parsing expression grammar) parsers often skip the lexer and simply parse the whole text straight away.
A lexer can be used to conceptually simplify parsing, but it is not necessary.
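As an illustration, here is a minimal sketch in Java of a scannerless recursive-descent parser: it reads characters directly and evaluates the expression without ever creating token objects. The grammar (sums and products of integers with parentheses, no whitespace handling) is assumed here for brevity:

```java
// Minimal sketch of lexer-less (scannerless) parsing: a recursive-descent
// evaluator that reads characters directly, with no token objects in between.
// Grammar assumed: expr := term ('+' term)*, term := factor ('*' factor)*,
// factor := NUMBER | '(' expr ')'. No whitespace handling, for brevity.
class ScannerlessParser {
    private final String src;
    private int pos = 0;

    ScannerlessParser(String src) { this.src = src; }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    int expr() {
        int value = term();
        while (peek() == '+') { pos++; value += term(); }
        return value;
    }

    int term() {
        int value = factor();
        while (peek() == '*') { pos++; value *= factor(); }
        return value;
    }

    int factor() {
        if (peek() == '(') {
            pos++;                   // consume '('
            int value = expr();
            pos++;                   // consume ')'
            return value;
        }
        // Number recognition is inlined here: this is the work
        // a separate lexer would otherwise do.
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(new ScannerlessParser("(33+3)*2").expr()); // 72
    }
}
```

Whether this is simpler is a matter of taste; the lexing work hasn't disappeared, it has just been inlined into the parser.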