I want to write a compiler for a language that denotes program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by generating INDENT and DEDENT tokens properly, the way the Python lexer does? A corresponding parser generator would be a plus.
LEPL is pure Python and supports offside parsing.
If you're using something like lex, you can do it this way:
^[ \t]+ {
    int new_indent = count_indent(yytext);
    if (new_indent > current_indent) {
        current_indent = new_indent;
        return INDENT;
    } else if (new_indent < current_indent) {
        current_indent = new_indent;
        return DEDENT;
    }
    /* Else do nothing, and this way
       you can essentially treat INDENT and DEDENT
       as opening and closing braces. */
}
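You may need a little additional logic, for example to ignore blank lines, and to automatically add a DEDENT at the end of the file if needed. Note also that this rule only fires on lines that begin with whitespace, so a line that returns to column zero needs separate handling. And because it tracks a single current_indent, dropping several levels at once yields only one DEDENT; keeping a stack of indentation widths, as Python's own tokenizer does, handles that case.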
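Since the question prefers Python, here is a minimal sketch of that additional logic, written as a plain generator rather than with any particular lexer library. The names indent_tokens and tab_stop and the (kind, text) tuple shape are made up for illustration; a real lexer would also split each line into tokens instead of passing the text through as one LINE.

```python
def indent_tokens(lines, tab_stop=8):
    """Yield (kind, text) pairs, where kind is 'INDENT', 'DEDENT' or 'LINE'."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in lines:
        # Expand tabs first so the indentation width is measured in columns.
        text = raw.rstrip("\r\n").expandtabs(tab_stop)
        if not text.strip():
            continue  # blank lines never open or close a block
        width = len(text) - len(text.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        else:
            # Close every block that is indented deeper than this line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise ValueError(f"inconsistent dedent: {text!r}")
        yield ("LINE", text.strip())
    # At end of input, close whatever blocks are still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


if __name__ == "__main__":
    source = ["if x:\n", "    if y:\n", "        z\n", "\n", "done\n"]
    for kind, text in indent_tokens(source):
        print(kind, text)
```

Because the widths live on a stack, the last line of the demo input closes both open blocks and produces two DEDENTs in a row, which is what a brace-style grammar expects.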
Presumably count_indent would take into account converting tabs to spaces according to a tab-stop value.
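The answer does not show count_indent, so the following is only one guess at what it could look like, written in Python rather than the C that a lex action would actually call. The tab_stop parameter and its default of 8 are assumptions.

```python
def count_indent(line, tab_stop=8):
    """Return the column reached by the leading spaces and tabs of line."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab advances to the next multiple of tab_stop.
            column += tab_stop - column % tab_stop
        else:
            break  # first non-blank character ends the indentation
    return column


if __name__ == "__main__":
    for sample in ["    x", "\tx", "  \tx"]:
        print(repr(sample), count_indent(sample))
```

Note that a tab is not a fixed number of spaces here: with a tab stop of 8, two spaces followed by a tab reach column 8, the same as a single tab.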
I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.