I've written an interpreter for a C-like language, using Flex and Bison for the scanner/parser. It's working fine when executing full program files.
Now I'm trying implement a REPL in the interpreter for interactive use. I want it to work like the command line interpreters in Ruby or ML:
My grammar starts with a top_level
production, which represents a single statement in the language. The lexer is configured for interactive mode on stdin. I am using the same scanner and grammar in both full-file and REPL modes, because there's no semantic difference in the two interfaces.
My main evaluation loop is structured like this.
while (!interpreter.done) {
if (interpreter.repl)
printf(prompt);
int status = yyparse(interpreter);
if (status) {
if (interpreter.error)
report_error(interpreter);
}
else {
if (interpreter.repl)
puts(interpreter.result);
}
}
This works fine except for the prompt and echo logic. If the user enters multiple statements on a line, this loop prints out superfluous prompts and expressions. And if the expression continues on multiple lines, this code doesn't print out continuation prompts. These problems occur because the granularity of the prompt/echo logic is a top_level
statement in the grammar, but the line-reading logic is deep in the lexer.
What's the best way to restructure the evaluation loop to handle the REPL prompting and echoing? That is:
(I'd rather not change the scanner language to pass newline tokens, since that will severely alter the grammar. Modifying YY_INPUT
and adding a few actions to the Bison grammar would be fine. Also, I'm using the stock Flex 2.5.35 and Bison 2.3 that ship with Xcode.)
Flex and Bison are tools for building programs that handle structured input. They were originally tools for building compilers, but they have proven to be useful in many other areas.
Bison is the GNU implementation/extension of Yacc, Flex is the successor of Lex. In either case, it's fine (and recommended) to use bison / flex. Additionally, byacc, the Berkeley implementation of yacc, is widely available (I see it in my Debian repository list).
Lex and Yacc were the first popular and efficient lexers and parsers generators, flex and Bison were the first widespread open-source versions compatible with the original software. Each of these software has more than 30 years of history, which is an achievement in itself.
Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR (1) parser tables. As an experimental feature, Bison can also generate IELR (1) or canonical LR(1) parser tables.
After looking at how languages like Python and SML/NJ handle their REPLs, I got a nice one working in my interpreter. Instead of having the prompt/echo logic in the outermost parser driver loop, I put it in the innermost lexer input routine. Actions in the parser and lexer set flags that control the prompting by input routine.
I'm using a reentrant scanner, so yyextra
contains the state passed between the layers of the interpreter. It looks roughly like this:
typedef struct Interpreter {
char* ps1; // prompt to start statement
char* ps2; // prompt to continue statement
char* echo; // result of last statement to display
BOOL eof; // set by the EOF action in the parser
char* error; // set by the error action in the parser
BOOL completeLine // managed by yyread
BOOL atStart; // true before scanner sees printable chars on line
// ... and various other fields needed by the interpreter
} Interpreter;
The lexer input routine:
size_t yyread(FILE* file, char* buf, size_t max, Interpreter* interpreter)
{
// Interactive input is signaled by yyin==NULL.
if (file == NULL) {
if (interpreter->completeLine) {
if (interpreter->atStart && interpreter->echo != NULL) {
fputs(interpreter->echo, stdout);
fputs("\n", stdout);
free(interpreter->echo);
interpreter->echo = NULL;
}
fputs(interpreter->atStart ? interpreter->ps1 : interpreter->ps2, stdout);
fflush(stdout);
}
char ibuf[max+1]; // fgets needs an extra byte for \0
size_t len = 0;
if (fgets(ibuf, max+1, stdin)) {
len = strlen(ibuf);
memcpy(buf, ibuf, len);
// Show the prompt next time if we've read a full line.
interpreter->completeLine = (ibuf[len-1] == '\n');
}
else if (ferror(stdin)) {
// TODO: propagate error value
}
return len;
}
else { // not interactive
size_t len = fread(buf, 1, max, file);
if (len == 0 && ferror(file)) {
// TODO: propagate error value
}
return len;
}
}
The top level interpreter loop becomes:
while (!interpreter->eof) {
interpreter->atStart = YES;
int status = yyparse(interpreter);
if (status) {
if (interpreter->error)
report_error(interpreter);
}
else {
exec_statement(interpreter);
if (interactive)
interpreter->echo = result_string(interpreter);
}
}
The Flex file gets these new definitions:
%option extra-type="Interpreter*"
#define YY_INPUT(buf, result, max_size) result = yyread(yyin, buf, max_size, yyextra)
#define YY_USER_ACTION if (!isspace(*yytext)) { yyextra->atStart = NO; }
The YY_USER_ACTION
handles the tricky interplay between tokens in the language grammar and lines of input. My language is like C and ML in that a special character (';') is required to end a statement. In the input stream, that character can either be followed by a newline character to signal end-of-line, or it can be followed by characters that are part of a new statement. The input routine needs to show the main prompt if the only characters scanned since the last end-of-statement are newlines or other whitespace; otherwise it should show the continuation prompt.
I too am working on such an interpreter, I haven't gotten to the point of making a REPL yet, so my discussion might be somewhat vague.
Is it acceptable if given a sequence of statements on a single line, only the result of the last expression is printed? Because you can re-factor your top level grammar rule like so:
top_level = top_level statement | statement ;
The output of your top_level then could be a linked list of statements, and interpreter.result would be the evaluation of the tail of this list.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With