I am using PLY and have noticed a strange discrepancy between the token match stored in t.lexer.lexmatch and a match object obtained from a pattern compiled in the usual way with the re module. The group numbers seem to be off by one.
I have defined a simple lexer to illustrate the behavior I am seeing:
import ply.lex as lex

tokens = ('CHAR',)

def t_CHAR(t):
    r'.'
    t.value = t.lexer.lexmatch
    return t

l = lex.lex()
(I get a warning about t_error but ignore it for now.) Now I feed some input into the lexer and get a token:
l.input('hello')
l.token()
I get LexToken(CHAR,<_sre.SRE_Match object at 0x100fb1eb8>,1,0). I want to look at the match object:
m = _.value
So now I look at the groups:
m.group()
=> 'h'
as I expect.
m.group(0)
=> 'h'
as I expect.
m.group(1)
=> 'h'
yet I would expect it not to have such a group.
Compare this to creating such a regular expression manually:
import re
p = re.compile(r'.')
m2 = p.match('hello')
This gives different groups:
m2.group()
=> 'h'
as I expect.
m2.group(0)
=> 'h'
as I expect.
m2.group(1)
raises IndexError: no such group
as I expect.
Does anyone know why this discrepancy exists?
In version 3.4 of PLY, the reason this occurs is related to how the expressions are converted from docstrings to patterns.
Looking at the source really does help - line 746 of lex.py:
c = re.compile("(?P<%s>%s)" % (fname,f.__doc__), re.VERBOSE | self.reflags)
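Each rule's docstring is wrapped in a named group, so the rule's whole match also becomes group 1, and every submatch inside your rule is shifted accordingly. You can reproduce the wrapping by hand to see the effect; this is just a sketch of the single-rule case, using the t_CHAR rule from the question:

import re

# Reproduce PLY's wrapping for one rule: the rule name and docstring
# are substituted into the named-group template quoted above.
fname, doc = 't_CHAR', r'.'
c = re.compile("(?P<%s>%s)" % (fname, doc), re.VERBOSE)
m = c.match('hello')
print(m.group(0))         # 'h'
print(m.group(1))         # 'h' -- the enclosing group PLY added
print(m.group('t_CHAR'))  # 'h' -- the same group, looked up by name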
I wouldn't recommend relying on behavior like this across versions - it's just part of the magic of how PLY works.
It seems to me that the matching group index depends on the position of the token function in the file, as if the groups were accumulated across all of the declared token regexes:
def t_MYTOKEN1(t):
    r'matchit(\w+)'
    t.value = t.lexer.lexmatch.group(1)
    return t

def t_MYTOKEN2(t):
    r'matchit(\w+)'
    t.value = t.lexer.lexmatch.group(2)
    return t
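This is consistent with the wrapping shown in the other answer: PLY joins every wrapped rule into one master regex with '|', so group numbers accumulate left to right across all rules. Here is a sketch of what such a master pattern looks like; the rule bodies here are illustrative, not PLY's actual output:

import re

# Illustrative master pattern in the style PLY assembles: each rule is a
# named group, so indices accumulate across rules.
# group 1 = all of t_MYTOKEN1, group 2 = its (\w+),
# group 3 = all of t_MYTOKEN2, group 4 = its (\w+).
master = re.compile(r"(?P<t_MYTOKEN1>foo(\w+))|(?P<t_MYTOKEN2>bar(\w+))")
m = master.match('barbaz')
print(m.lastgroup)  # 't_MYTOKEN2'
print(m.group(3))   # 'barbaz'
print(m.group(4))   # 'baz'

One way to sidestep the counting (my suggestion, not something from the PLY docs) is to put a named group inside the rule, e.g. r'matchit(?P<payload>\w+)', and read it with t.lexer.lexmatch.group('payload'). Note that group names must be unique across all of your rules, since they all end up in the same master regex.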