 

Refer to group inside group with Regex

Tags:

python

regex

I am trying to find a regex that matches a word ending in two identical characters followed by 'er' and splits it between those two characters. Example: the word 'Letter' should be split into 'Let' and 'ter'. I'm using Python, and this is what I've got so far:

import re
word = 'Letter'
match = re.search(r'(\w*)((\w)\1(er$))', word)
print(match.group(1))  # should print 'Let'
print(match.group(2))  # should print 'ter'

The problem is that the \1 after (\w) doesn't refer to the group I want, because (\w) is a group inside a group. How can this be solved?
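One way to see which group \1 actually refers to is to inspect the numbered groups directly. In Python's re module, capturing groups are numbered by the position of their opening parenthesis, counting from the left, whether or not they are nested. In the question's pattern, (\w) is therefore group 3, and \1 refers to (\w*). A minimal sketch follows (the names original and fixed are illustrative only). It also shows one possible numbered-group rewrite: even with a backreference to the right group, the question's layout would put both identical characters into group 2, so the rewrite moves the first of them into group 1.

```python
import re

word = 'Letter'

# Capturing groups are numbered by the position of their opening parenthesis,
# counted from the left, regardless of nesting:
#   1 = (\w*)   2 = ((\w)\1(er$))   3 = (\w)   4 = (er$)
original = re.compile(r'(\w*)((\w)\1(er$))')
print(original.groups)  # 4

# \1 therefore repeats whatever (\w*) matched, not the single (\w) character.
# The search only succeeds with group 1 empty, starting at the second 't'.
m = original.search(word)
print(m.start(), m.groups())  # 3 ('', 'ter', 't', 'er')

# Hypothetical rewrite: make the single character the last part of group 1,
# then refer back to it as \2 at the start of group 3.
#   1 = (\w*(\w))   2 = (\w)   3 = (\2er)
fixed = re.compile(r'(\w*(\w))(\2er)$')
m = fixed.search(word)
print(m.group(1), m.group(3))  # Let ter
```

The named-group answer uses the same structure, with (?P=splitter) playing the role of \2.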

Thanks in advance.

asked May 11 '13 at 11:05 by Swen Mulderij


1 Answer

I'm using named groups, as they make referencing easier:

import re

pattern = r"""
          \b(?P<first_part>\w*(?P<splitter>\w))   # from a word boundary, any word chars;
                                                  # 'splitter' captures the last of them
          (?P<last_part>(?P=splitter)er\b)        # the same character again, plus 'er'
                                                  # if followed by a word boundary
          """
matcher = re.compile(pattern, re.X)
print(matcher.search('letter').groupdict())
# out: {'first_part': 'let', 'splitter': 't', 'last_part': 'ter'}
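Because search() returns None when the word does not match, calling .groupdict() on its result directly would raise AttributeError for a word such as 'water'. A minimal sketch that wraps the same named-group pattern in a small helper and reads the parts by name (split_double_er and DOUBLE_ER are hypothetical names, not from the answer):

```python
import re

# Same named-group pattern as the answer, compiled in verbose mode (re.X).
DOUBLE_ER = re.compile(r"""
    \b(?P<first_part>\w*(?P<splitter>\w))   # word chars; 'splitter' is the last one
    (?P<last_part>(?P=splitter)er\b)        # the same char again, then 'er'
    """, re.X)


def split_double_er(word):
    """Return (first_part, last_part), or None if the word does not match."""
    m = DOUBLE_ER.search(word)
    if m is None:
        return None
    return m.group('first_part'), m.group('last_part')


for w in ['Letter', 'summer', 'water', 'letters']:
    print(w, split_double_er(w))
# Letter ('Let', 'ter')
# summer ('sum', 'mer')
# water None
# letters None
```

'letters' does not match because the trailing \b requires the word to end right after 'er'.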
answered Oct 07 '22 at 00:10 by Thomas Fenzl