LALR grammar, trailing comma and multiline list assignment

Question

I'm trying to write an LALR grammar for a very simple language composed of assignments. For example:

foo = "bar"
bar = 42

The language should also handle lists of values, for example:

foo = 1, 2, 3

But I also want to handle lists that span multiple lines:

foo = 1, 2
      3, 4

It should also allow a trailing comma (for singletons and language flexibility):

foo = 1,
foo = 1, 2,

And obviously, both at the same time:

foo = 1,
      2,
      3,

I'm able to write a grammar that handles either trailing commas or multi-line lists, but not both at the same time.

My grammar looks like this:

content : content '\n'
        | content assignment
        | <empty>

assignment : NAME '=' value
           | NAME '=' list

value : TEXT
      | NUMBER

list : ???

Note: I need the '\n' in the grammar to forbid this kind of code:

foo
=
"bar"

Thanks in advance,

Antoine.

Kaz · Accepted Answer

It looks like your configuration language is essentially free form. I would forget about making newline a token in the grammar. If you want the newline restrictions, you can hack it as some lexical tie-in rules, whereby the parser calls a little API added to the lexer to inform the lexer about where it is in the grammar, and the lexer can decide whether to accept newlines or reject them with an error.

Try this grammar.

%token NAME NUMBER TEXT

%%

config_file : assignments
            | /* empty */
            ;

assignments : assignment
            | assignments assignment
            ;

assignment : NAME '=' values comma_opt
           ;

comma_opt : ','
          | /* empty */
          ;

values : value
       | values ',' value
       ;

value : NUMBER | TEXT ;

It builds for me with no conflicts. I didn't run it, but a casual reading of y.output looks like the transitions are sane.
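To make the shape of the grammar easier to see, here is a minimal Python sketch that follows the same productions by hand. It is not a generated LALR parser and it does not use PLY or yacc; the token patterns and the names `tokenize` and `parse` are assumptions made for illustration, since neither the question nor the answer defines NAME, NUMBER or TEXT. Like the grammar, it treats newlines as plain whitespace.

```python
import re

# Hypothetical token patterns; the original post does not define them.
TOKEN_RE = re.compile(r'''
    (?P<NUMBER>\d+)
  | (?P<TEXT>"[^"\n]*")
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<PUNCT>[=,])
  | (?P<WS>\s+)
''', re.VERBOSE)

VALUE_KINDS = ("NUMBER", "TEXT")


def tokenize(source):
    """Return (kind, text) pairs; all whitespace, newlines included, is dropped."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = m.end()
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
    return tokens


def parse(source):
    """Return [(name, [values...]), ...] following the grammar's productions."""
    tokens = tokenize(source)
    pos, result = 0, []
    while pos < len(tokens):              # config_file : assignments | empty
        kind, name = tokens[pos]
        if kind != "NAME":
            raise SyntaxError(f"expected NAME, got {name!r}")
        if pos + 1 >= len(tokens) or tokens[pos + 1] != ("PUNCT", "="):
            raise SyntaxError(f"expected '=' after {name!r}")
        pos += 2
        values = []
        while True:                       # values : value | values ',' value
            if pos >= len(tokens) or tokens[pos][0] not in VALUE_KINDS:
                raise SyntaxError(f"expected a value in assignment to {name!r}")
            kind, text = tokens[pos]
            values.append(int(text) if kind == "NUMBER" else text[1:-1])
            pos += 1
            if pos < len(tokens) and tokens[pos] == ("PUNCT", ","):
                pos += 1
                # comma_opt: a comma not followed by a value is the trailing one.
                if pos >= len(tokens) or tokens[pos][0] not in VALUE_KINDS:
                    break
            else:
                break
        result.append((name, values))
    return result
```

For example, `parse('foo = 1,\n      2,\n      3,\nbar = "x"')` yields one three-element list for `foo` and a single value for `bar`. The one-token lookahead after a comma plays the same role here as the lookahead an LALR parser uses to choose between shifting another value and reducing `comma_opt`.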

This grammar, of course, allows

foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42

without additional communication with the lexer.

Your restrictions mean that newlines are only allowed in the values. Two NAME tokens must never appear on the same line, and the = must appear on the same line as the preceding NAME (and probably the first value must also).

Basically when the parser scans the first value, it can tell the lexer "values are being scanned now, turn on the admission of newlines". And then when the comma_opt is reduced, this can be turned off again. When comma_opt is reduced, the lexer may have already read the NAME token of the next assignment, but it can check that this occurs on a different line from the previous NAME. You will want your lexer to keep track of an accurate line count anyway.
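The line-based checks can be sketched in Python as follows. This is a simplification, not the tie-in mechanism itself: instead of the parser calling back into the lexer, it tokenizes with line numbers and then validates the layout in a separate pass. The token patterns and the names `tokenize_with_lines` and `check_layout` are hypothetical.

```python
import re

# Hypothetical token patterns; the original post does not define them.
TOKEN_RE = re.compile(
    r'(?P<NUMBER>\d+)|(?P<TEXT>"[^"\n]*")|(?P<NAME>[A-Za-z_]\w*)'
    r'|(?P<PUNCT>[=,])|(?P<NEWLINE>\n)|(?P<WS>[ \t\r]+)'
)


def tokenize_with_lines(source):
    """Return (kind, text, line) triples; newlines only advance the line count."""
    tokens, line, pos = [], 1, 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {source[pos]!r}")
        pos = m.end()
        if m.lastgroup == "NEWLINE":
            line += 1
        elif m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(), line))
    return tokens


def check_layout(tokens):
    """Raise SyntaxError when newline placement breaks the rules described above."""
    last_name_line = None
    for i, (kind, text, line) in enumerate(tokens):
        if kind == "NAME":
            # Two NAME tokens must never appear on the same line.
            if line == last_name_line:
                raise SyntaxError(f"line {line}: two assignments on one line")
            last_name_line = line
        elif kind == "PUNCT" and text == "=":
            # '=' must sit on the same line as the token before it (the NAME).
            if i > 0 and tokens[i - 1][2] != line:
                raise SyntaxError(f"line {line}: '=' must follow NAME on the same line")
            # The first value must start on the same line as '='.
            if i + 1 < len(tokens) and tokens[i + 1][2] != line:
                raise SyntaxError(f"line {line}: first value must follow '=' on the same line")
```

With these rules, `foo = 1,` followed by indented continuation lines passes, while `foo`, `=` and `"bar"` on three separate lines is rejected. Note that comparing only against the previous NAME's line still lets a new assignment start in the middle of a continuation line; requiring each NAME to be the first token on its line would be a stricter variant.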

Tags: python, grammar, yacc, ply
