I'm trying to use reserved words in my grammar:
reserved = {
'if' : 'IF',
'then' : 'THEN',
'else' : 'ELSE',
'while' : 'WHILE',
}
tokens = [
'DEPT_CODE',
'COURSE_NUMBER',
'OR_CONJ',
'ID',
] + list(reserved.values())
t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ignore = ' \t'
def t_ID(t):
r'[a-zA-Z_][a-zA-Z_0-9]*'
if t.value in reserved.values():
t.type = reserved[t.value]
return t
return None
However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.
Ok, i ran into this issue on my own today and looked for solution - did not find it on S/O - but found it in the manual: http://www.dabeaz.com/ply/ply.html#ply_nn6
When building the master regular expression, rules are added in the following order:
- All tokens defined by functions are added in the same order as they appear in the lexer file.
- Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
That is why t_ID "beats" the string definitions. A trivial (although brutal) fix will be to
simply def t_DEPT_CODE(token): r'[A-Z]{2,}'; return token
before def t_ID
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With