
Lexer that recognizes indented blocks [duplicate]

I want to write a compiler for a language that denotes program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by generating INDENT and DEDENT tokens properly, the way the Python lexer does? A corresponding parser generator would be a plus.

Elektito asked Aug 01 '11 19:08


2 Answers

LEPL is pure Python and supports offside parsing.

Cat Plus Plus answered Sep 28 '22 06:09


If you're using something like lex, you can do it this way:

^[ \t]+              { int new_indent = count_indent(yytext);
                       if (new_indent > current_indent) {
                          current_indent = new_indent;
                          return INDENT;
                       } else if (new_indent < current_indent) {
                          current_indent = new_indent;
                          return DEDENT;
                       }
                       /* Else do nothing, and this way
                          you can essentially treat INDENT and DEDENT
                          as opening and closing braces. */
                     }

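You will need some additional logic, for example to ignore blank lines and to emit any outstanding DEDENT tokens at the end of the file. Note also that this rule returns at most one DEDENT per line, so a line that closes several nested blocks at once needs a stack of indentation levels, and that ^[ \t]+ never matches a line with no leading whitespace, so a return to column zero has to be handled separately.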
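
As a rough illustration of that extra logic, here is a minimal Python sketch (Python being the asker's preferred language) that keeps a stack of indentation widths, skips blank lines, and closes any blocks still open at the end of the input. The function name indent_tokens, the (kind, value) tuple shape, and the default tab size of 8 are assumptions made for this example; they are not part of lex or of any library.

```python
def indent_tokens(source, tab_size=8):
    """Yield (kind, value) pairs where kind is INDENT, DEDENT, or LINE.

    Hypothetical helper for illustration only.
    """
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines never open or close a block
        # Measure only the leading run of spaces and tabs.
        leading = raw[: len(raw) - len(raw.lstrip(" \t"))]
        width = len(leading.expandtabs(tab_size))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            # One line may close several nested blocks at once.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", width)
            if width != stack[-1]:
                raise IndentationError(
                    f"dedent to column {width} matches no enclosing block"
                )
        yield ("LINE", raw.strip())
    # Close every block still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)
```

A parser can then treat INDENT and DEDENT exactly like opening and closing braces, for example by iterating over indent_tokens(open(path).read()).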

count_indent should convert tabs to spaces according to a tab-stop value before counting.
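
One possible reading of count_indent, sketched in Python rather than C: each tab advances the column to the next multiple of the tab stop, and each space advances it by one. The tab_stop parameter and its default of 8 are assumptions for this example; the answer above does not specify them.

```python
def count_indent(text, tab_stop=8):
    """Return the column reached by the leading spaces and tabs of text.

    Illustrative sketch; tab_stop and its default are assumed, not given.
    """
    column = 0
    for ch in text:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Jump to the next tab stop instead of adding a fixed width.
            column += tab_stop - (column % tab_stop)
        else:
            break  # first non-whitespace character ends the indentation
    return column
```

With this definition, two spaces followed by a tab count the same as a single tab, which is the usual way mixed indentation is compared.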

I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.

parkovski answered Sep 28 '22 07:09