Python/YACC Lexer: Token priority?

Question

I'm trying to use reserved words in my grammar:

reserved = {
   'if' : 'IF',
   'then' : 'THEN',
   'else' : 'ELSE',
   'while' : 'WHILE',
}

tokens = [
 'DEPT_CODE',
 'COURSE_NUMBER',
 'OR_CONJ',
 'ID',
] + list(reserved.values())

t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER  = r'[0-9]{4}'
t_OR_CONJ = r'or'

t_ignore = ' 	'

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    # Look the lexeme up among the reserved-word keys; fall back to a plain ID
    t.type = reserved.get(t.value, 'ID')
    return t

However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.

Nas Banov · Accepted Answer

Mystery Solved!

OK, I ran into this issue myself today and looked for a solution. I did not find one on S/O, but I did find it in the manual: http://www.dabeaz.com/ply/ply.html#ply_nn6

When building the master regular expression, rules are added in the following order:

1. All tokens defined by functions are added in the same order as they appear in the lexer file.

2. Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

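That is why t_ID "beats" the string definitions: it is a function rule, so it is tried before every string rule. A trivial (although brutal) fix is to turn DEPT_CODE into a function rule as well, def t_DEPT_CODE(token): r'[A-Z]{2,}'; return token, and to place it before def t_ID. The same applies to OR_CONJ.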
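To see why the ordering matters, here is a minimal sketch that imitates the documented rule order with nothing but Python's re module. It is a simplified model, not PLY's own code: build_master and tokenize are hypothetical helpers, and the sample input is made up. The patterns are the ones from the question.

```python
import re

def build_master(function_rules, string_rules):
    """Model of the documented ordering (not PLY's implementation):
    function rules in definition order, then string rules sorted by
    decreasing regular expression length."""
    ordered = list(function_rules) + sorted(
        string_rules, key=lambda rule: len(rule[1]), reverse=True)
    return re.compile(
        '|'.join('(?P<%s>%s)' % (name, pattern) for name, pattern in ordered))

def tokenize(master, text):
    """Return (token type, lexeme) pairs, skipping spaces and tabs."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos] in ' \t':
            pos += 1
            continue
        match = master.match(text, pos)
        if match is None:
            raise ValueError('illegal character %r at %d' % (text[pos], pos))
        # With one named group per alternative, lastgroup names the rule that won
        out.append((match.lastgroup, match.group()))
        pos = match.end()
    return out

ID = ('ID', r'[a-zA-Z_][a-zA-Z_0-9]*')
DEPT_CODE = ('DEPT_CODE', r'[A-Z]{2,}')
OR_CONJ = ('OR_CONJ', r'or')
COURSE_NUMBER = ('COURSE_NUMBER', r'[0-9]{4}')

# Layout from the question: ID is the only function rule, so it is tried first
before = build_master([ID], [DEPT_CODE, COURSE_NUMBER, OR_CONJ])

# Layout after the fix: DEPT_CODE and OR_CONJ are function rules placed before ID
after = build_master([DEPT_CODE, OR_CONJ, ID], [COURSE_NUMBER])

sample = 'CS 1234 or MATH 2000'  # made-up input for illustration
before_types = [kind for kind, _ in tokenize(before, sample)]
after_types = [kind for kind, _ in tokenize(after, sample)]

print(before_types)  # ['ID', 'COURSE_NUMBER', 'ID', 'ID', 'COURSE_NUMBER']
print(after_types)   # ['DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'DEPT_CODE', 'COURSE_NUMBER']
```

One caveat of this approach: a bare r'or' tried before ID also matches the first two letters of a word such as order, leaving der to be lexed separately. Adding a word boundary (r'or\b') is one way to guard against that.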

Tags: python, parsing, nlp, yacc

Nick Heiner
