
python: how to find consecutive pairs of letters by regex?

Tags: python, regex

I want to find words that contain consecutive double-letter pairs using a regex. I know that a single pair, as in zoo (oo), puzzle (zz), or arrange (rr), can be matched with '(\w){2}'. But how about

  • two consecutive pairs: committee (ttee)
  • three consecutive pairs: bookkeeper (ookkee)

edit:

  • '(\w){2}' is actually wrong: it matches any two word characters, not a doubled letter.
  • My intention is to find the words that contain letter pairs, not the pairs themselves.
  • By 'consecutive', I mean that there is no other letter between the letter pairs.
Skiptomylu asked Jul 10 '13 00:07



1 Answer

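Use re.finditer with a backreference. In ((\w)\2)+, the inner group (\w) captures one character, \2 requires that same character again, and the outer group with + repeats the pair for as long as pairs follow each other directly: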

>>> import re
>>> [m.group() for m in re.finditer(r'((\w)\2)+', 'zoo')]
['oo']
>>> [m.group() for m in re.finditer(r'((\w)\2)+', 'arrange')]
['rr']
>>> [m.group() for m in re.finditer(r'((\w)\2)+', 'committee')]
['mm', 'ttee']
>>> [m.group() for m in re.finditer(r'((\w)\2)+', 'bookkeeper')]
['ookkee']

Check whether the string contains at least two consecutive pairs:

>>> bool(re.search(r'((\w)\2){2}', 'zoo'))
False
>>> bool(re.search(r'((\w)\2){2}', 'arrange'))
False
>>> bool(re.search(r'((\w)\2){2}', 'committee'))
True
>>> bool(re.search(r'((\w)\2){2}', 'bookkeeper'))
True
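
The check above can be generalized to any number of pairs by changing the repetition count. The following is a minimal sketch, not part of the original answer; the helper name has_consecutive_pairs and its parameter n are hypothetical, chosen here for illustration.

```python
import re

def has_consecutive_pairs(word, n=2):
    """Return True if word contains at least n doubled letters in a row.

    Hypothetical helper: builds the same pattern as above, with the
    repetition count {n} filled in from the argument.
    """
    pattern = r'((\w)\2){%d}' % n
    return re.search(pattern, word) is not None

words = ['zoo', 'arrange', 'committee', 'bookkeeper']

# Keep the words themselves, not the matched pairs.
two = [w for w in words if has_consecutive_pairs(w, 2)]
three = [w for w in words if has_consecutive_pairs(w, 3)]

print(two)    # ['committee', 'bookkeeper']
print(three)  # ['bookkeeper']
```

Since re.search only needs one match to succeed, this returns the words that qualify, which is what the question asks for.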

You can also use the following version, which makes the outer group non-capturing with (?:...), so the backreference becomes \1:

(?:(\w)\1){2}
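
One point worth keeping in mind: \w also matches digits and the underscore, so strings such as '1122' count as having pairs. The sketch below compares the pattern above with a variant restricted to ASCII letters; the [A-Za-z] class and the names ANY_WORD_CHAR and LETTERS_ONLY are assumptions made for illustration, not part of the original answer.

```python
import re

# Non-capturing outer group: the only capture is the character itself (group 1).
ANY_WORD_CHAR = re.compile(r'(?:(\w)\1){2}')

# Assumption: "letter" means ASCII a-z or A-Z only.
LETTERS_ONLY = re.compile(r'(?:([A-Za-z])\1){2}')

samples = ['committee', 'bookkeeper', 'zoo', '1122', 'a__bb']

word_char_hits = [s for s in samples if ANY_WORD_CHAR.search(s)]
letter_hits = [s for s in samples if LETTERS_ONLY.search(s)]

print(word_char_hits)  # ['committee', 'bookkeeper', '1122', 'a__bb']
print(letter_hits)     # ['committee', 'bookkeeper']
```

If the input is known to contain only alphabetic words, the two patterns behave the same and \w is the shorter choice.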
falsetru answered Sep 27 '22 18:09
