Does anyone have a simple example of how to define a grammar in Jison that parses Python-like indentation for blocks?
I created a language using Jison which uses Python-style indentation. It's an automated white-box algorithm testing language called Bianca.
Bianca has only two dependencies: Jison and Lexer. Jison supports custom scanners, and Lexer is one such scanner.
In C-style programming languages, blocks of code are delimited by curly braces. In Python-style indentation, however, you have INDENT and DEDENT tokens instead.
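To make the link to the grammar concrete, here is a rough sketch of how a block rule might consume those tokens, written in Jison's JSON grammar format. This is not Bianca's actual grammar: only INDENT and DEDENT come from the discussion here, and every other token and rule name (IDENTIFIER, NEWLINE, IF, COLON, EOF, program, statements, statement, block) is a hypothetical placeholder that assumes the scanner emits tokens with those names.

```javascript
// Hypothetical grammar fragment in Jison's JSON format.
// Only INDENT and DEDENT come from the answer; all other names are
// illustrative assumptions about what the scanner emits.
var grammar = {
    bnf: {
        program: [["statements EOF", "return $1;"]],
        statements: [
            ["statement", "$$ = [$1];"],
            ["statements statement", "$$ = $1.concat([$2]);"]
        ],
        statement: [
            ["IDENTIFIER NEWLINE", "$$ = { type: 'call', name: $1 };"],
            ["IF IDENTIFIER COLON NEWLINE block",
             "$$ = { type: 'if', test: $2, body: $5 };"]
        ],
        // INDENT ... DEDENT plays the role that `{ ... }` plays in a
        // C-style grammar.
        block: [["INDENT statements DEDENT", "$$ = $2;"]]
    }
};
```

An object like this would normally be handed to `new Parser(grammar)` from the `jison` package; the fragment above has not been run through Jison, so treat it as a starting point only.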
Writing a rule to generate INDENT and DEDENT tokens in Lexer is simple. In fact, the Lexer documentation shows precisely how to do it.
This snippet of code is taken directly from the source code of Bianca (lexer.js):
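    // Stack of open indentation levels; the top of the stack is indent[0].
    var indent = [0];

    // Match the leading spaces at the start of every line.
    lexer.addRule(/^ */gm, function (lexeme) {
        var indentation = lexeme.length;

        // `col` is declared elsewhere in lexer.js (not shown in this excerpt).
        col += indentation;

        // Deeper than the current block: open a new block.
        if (indentation > indent[0]) {
            indent.unshift(indentation);
            return "INDENT";
        }

        // Shallower: close every block that is deeper than this line.
        var tokens = [];

        while (indentation < indent[0]) {
            tokens.push("DEDENT");
            indent.shift();
        }

        // Same level as before: emit nothing.
        if (tokens.length) return tokens;
    });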
A brief explanation of how this code works can be found in the Python documentation:
Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
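If you want to try the algorithm quoted above without setting up Lexer or Jison, here is a minimal standalone sketch in plain JavaScript. The function name `layoutTokens` is made up for illustration and is not part of Lexer or Bianca. It assumes indentation uses spaces only, skips blank lines, and also covers two steps the Bianca rule does not show: rejecting a dedent to a level that is not on the stack, and emitting the remaining DEDENT tokens at the end of the input.

```javascript
// Hypothetical helper (not part of Lexer or Bianca): returns only the
// layout tokens (INDENT / DEDENT) for a space-indented source string.
function layoutTokens(source) {
    var stack = [0];   // indentation stack; top is stack[0], the 0 is never popped
    var tokens = [];

    source.split("\n").forEach(function (line) {
        // Assumption: blank lines do not affect layout.
        if (line.trim() === "") return;

        var indentation = line.match(/^ */)[0].length;

        // Larger than the top of the stack: push it and emit one INDENT.
        if (indentation > stack[0]) {
            stack.unshift(indentation);
            tokens.push("INDENT");
            return;
        }

        // Smaller: pop every larger level, one DEDENT per level popped.
        while (indentation < stack[0]) {
            stack.shift();
            tokens.push("DEDENT");
        }

        // After popping, the level must be one that is on the stack.
        if (indentation !== stack[0]) {
            throw new Error("inconsistent dedent to " + indentation + " spaces");
        }
    });

    // End of input: one DEDENT for each level still larger than zero.
    while (stack[0] > 0) {
        stack.shift();
        tokens.push("DEDENT");
    }

    return tokens;
}

console.log(layoutTokens("a\n  b\n    c\nd\n  e").join(" "));
// prints "INDENT INDENT DEDENT DEDENT INDENT DEDENT"
```

The stack keeps its top at index 0 (`unshift`/`shift`) to match the convention of the Bianca snippet.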