
Regex Problem Group Name Redefinition?

Tags: python, regex

I have this regex:

(^(\s+)?(?P<NAME>(\w)(\d{7}))((01f\.foo)|(\.bar|\.goo\.moo\.roo))$|(^(\s+)?(?P<NAME2>R1_\d{6}_\d{6}_)((01f\.foo)|(\.bar|\.goo\.moo\.roo))$))

When I try to match it against this string:

B048661501f.foo

I get this error:

  File "C:\Python25\lib\re.py", line 188, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: redefinition of group name 'NAME' as group 9; was group 3

If I can't define the same group name twice in one regex for two different cases, what should I do?
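
The error can be reproduced with a much smaller pattern. The following is a minimal sketch, assuming Python 3 (where the exception is exposed as re.error) and a made-up two-alternative pattern that is not the one from the question:

```python
import re

# Hypothetical minimal pattern: the name NAME appears in both alternatives.
DUPLICATE_NAME_PATTERN = r"(?P<NAME>a\d)|(?P<NAME>b\d)"


def try_compile(pattern):
    """Return the compiled pattern, or the re.error raised while compiling."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return exc


result = try_compile(DUPLICATE_NAME_PATTERN)
print(type(result).__name__, result)
```

The standard re module rejects the pattern at compile time, before any string is matched, so the input string plays no part in the failure.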

UberJumper asked Dec 12 '08 15:12


3 Answers

This answer covers how to make the above regex work in Python 3.

The re2 module mentioned by Max does not work in Python 3: it fails with NameError: basestring. An alternative is the regex module.

The regex module is an enhanced version of re with extra features. It also allows the same group name to appear more than once in a pattern.

You can install it via:

sudo pip install regex

If your program already uses re or re2, import the regex module under the same name:

import regex as re
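
A minimal sketch of the duplicate-name approach, assuming the third-party regex module is installed. The pattern below is a hypothetical simplification of the one in the question, with NAME used in both alternatives, and the helper name extract_name is made up for illustration:

```python
# Third-party module: install with `pip install regex`.
try:
    import regex
except ImportError:  # keep the sketch importable where regex is absent
    regex = None

# Hypothetical pattern: the same name NAME is used in both alternatives.
PATTERN = (
    r"^\s*(?:(?P<NAME>\w\d{7})|(?P<NAME>R1_\d{6}_\d{6}_))"
    r"(?:01f\.foo|\.bar|\.goo\.moo\.roo)$"
)


def extract_name(text):
    """Return the NAME capture, or None if nothing matched or regex is missing."""
    if regex is None:
        return None
    match = regex.match(PATTERN, text)
    return match.group("NAME") if match else None


print(extract_name("B048661501f.foo"))
print(extract_name("R1_123456_654321_.bar"))
```

The same pattern passed to the standard re module would be rejected with the redefinition error from the question.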
Naveen answered Oct 27 '22 00:10


No, you can't have two groups with the same name; that would defeat the purpose, wouldn't it?

What you probably really want is this:

^\s*(?P<NAME>\w\d{7}|R1_(?:\d{6}_){2})(01f\.foo|\.(?:bar|goo|moo|roo))$

I refactored your regex as far as possible, based on the following assumptions.

You want to (correct me if I'm wrong):

  • ignore white space at the start of the string
  • match either of the following into a group named "NAME":
    • a letter followed by 7 digits, or
    • "R1_", and two times (6 digits + "_")
  • followed by either:
    • "01f.foo" or
    • "." and ("bar" or "goo" or "moo" or "roo")
  • followed by the end of the string
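
As a quick check of this first reading, here is a small sketch that runs the refactored pattern with the standard re module. The helper name parse and every sample string except the one from the question are made up for illustration:

```python
import re

# First refactored pattern from this answer, unchanged.
PATTERN_A = r"^\s*(?P<NAME>\w\d{7}|R1_(?:\d{6}_){2})(01f\.foo|\.(?:bar|goo|moo|roo))$"


def parse(text):
    """Return (NAME, suffix) for a matching string, or None."""
    match = re.match(PATTERN_A, text)
    return (match.group("NAME"), match.group(2)) if match else None


# Only the first sample comes from the question; the others are illustrative.
for sample in ["B048661501f.foo", "  R1_123456_654321_.bar", "B0486615.goo.moo.roo"]:
    print(repr(sample), "->", parse(sample))
```

Under these assumptions the suffix ".goo.moo.roo" from the original pattern no longer matches, because goo, moo and roo are treated as separate one-word extensions.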

You could also have meant:

^\s*(?P<NAME>\w\d{7}01f|R1_(?:\d{6}_){2})\.(?:foo|bar|goo|moo|roo)$

Which is:

  • ignore white space at the start of the string
  • match either of the following into a group named "NAME":
    • a letter followed by 7 digits and "01f"
    • "R1_", and two times (6 digits + "_")
  • a dot
  • "foo", "bar", "goo", "moo" or "roo"
  • the end of the string
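
The second reading can be checked the same way. In this sketch the helper name name_of and the sample strings other than the one from the question are illustrative assumptions:

```python
import re

# Second refactored pattern from this answer, unchanged.
PATTERN_B = r"^\s*(?P<NAME>\w\d{7}01f|R1_(?:\d{6}_){2})\.(?:foo|bar|goo|moo|roo)$"


def name_of(text):
    """Return the NAME capture for a matching string, or None."""
    match = re.match(PATTERN_B, text)
    return match.group("NAME") if match else None


# Only the first sample comes from the question; the others are illustrative.
for sample in ["B048661501f.foo", "R1_123456_654321_.roo", "B0486615.bar"]:
    print(repr(sample), "->", name_of(sample))
```

Here "01f" becomes part of NAME, so a letter-and-digits name without "01f" is rejected regardless of the extension.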
Tomalak answered Nov 04 '22 05:11


Reusing the same name makes sense in your case, contrary to Tomalak's reply.

Your regex compiles with Python 2.7 and also with re2, so this problem may have been resolved.
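
Note that the pattern as posted uses two distinct names, NAME and NAME2, which is presumably why it compiles. A minimal sketch, assuming Python 3's standard re module, of how a single value could be read from whichever group matched (the helper name extract_name and the second sample string are made up):

```python
import re

# The pattern exactly as posted in the question (names NAME and NAME2).
POSTED = (
    r"(^(\s+)?(?P<NAME>(\w)(\d{7}))((01f\.foo)|(\.bar|\.goo\.moo\.roo))$"
    r"|(^(\s+)?(?P<NAME2>R1_\d{6}_\d{6}_)((01f\.foo)|(\.bar|\.goo\.moo\.roo))$))"
)


def extract_name(text):
    """Return whichever of NAME / NAME2 took part in the match, else None."""
    match = re.match(POSTED, text)
    if match is None:
        return None
    # Only one alternative can match, so at most one of the groups is set.
    return match.group("NAME") or match.group("NAME2")


print(extract_name("B048661501f.foo"))
print(extract_name("R1_123456_654321_.bar"))
```

Falling back from one group to the other gives the effect of a shared name without relying on duplicate-name support.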

Max answered Nov 04 '22 04:11