Find "one letter that appears twice" in a string

Question

I'm trying to find a letter that appears twice in a row in a string using a regex (or maybe there is a better way?). For example, my string is:

ugknbfddgicrmopn

The output would be:

dd

However, I've tried something like:

re.findall('[a-z]{2}', 'ugknbfddgicrmopn')

but in this case, it returns:

['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']   # the expected output is `['dd']`

I also have a way to get the expected output:

>>> l = []
>>> tmp = None
>>> for i in 'ugknbfddgicrmopn':
...     if tmp != i:
...         tmp = i
...         continue
...     l.append(i*2)
... 
>>> l
['dd']

But that's too complex...

If it's 'abbbcppq', then only catch:

abbbcppq
 ^^  ^^

So the output is:

['bb', 'pp']

Then, if it's 'abbbbcppq', catch bb twice:

abbbbcppq
 ^^^^ ^^

So the output is:

['bb', 'bb', 'pp']

Avinash Raj · Accepted Answer

You need to use a regex based on a capturing group, and define the pattern as a raw string.

>>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
'dd'
>>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
['bb', 'bb', 'pp']

or

>>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
['bb', 'bb', 'pp']

Note that re.findall here returns a list of tuples, where the first element of each tuple is the text matched by the first group and the second element is the text matched by the second group. In our case the first group is all we need, hence i[0].
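To make the tuple structure concrete, here is a minimal sketch (Python 3 syntax; the names text, pairs and doubled are illustrative, not from the answer) that prints what re.findall returns for the two-group pattern before the first element is extracted:

```python
import re

text = 'abbbbcppq'

# With two capturing groups, findall returns one tuple per match:
# (outer group = the doubled pair, inner group = the single letter).
pairs = re.findall(r'(([a-z])\2)', text)
print(pairs)  # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]

# Keep only the outer group to get the doubled letters.
doubled = [outer for outer, inner in pairs]
print(doubled)  # ['bb', 'bb', 'pp']
```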

Mazdak · Answer

A Pythonic way is to use the zip function within a list comprehension:

>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']

If you are dealing with a large string, you can use iter() to convert the string to an iterator and itertools.tee() to create two independent iterators. Calling next() on the second iterator consumes its first item, and you can then pass both iterators to zip (in Python 2.x use itertools.izip(), which returns an iterator).

>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
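One thing worth checking is how the zip approach treats a run of three letters. The sketch below (Python 3; the variable names overlapping and non_overlapping are illustrative) runs both techniques on the same input and suggests that zip reports overlapping pairs, while a backreference regex reports non-overlapping ones:

```python
import re

s = 'abbbcppq'

# zip compares every neighbouring pair, so the run 'bbb' contributes
# two overlapping pairs.
overlapping = [i + j for i, j in zip(s, s[1:]) if i == j]
print(overlapping)  # ['bb', 'bb', 'pp']

# The regex consumes both letters of each match, so pairs never overlap.
non_overlapping = [m.group(0) for m in re.finditer(r'([a-z])\1', s)]
print(non_overlapping)  # ['bb', 'pp']
```

Which behaviour is wanted depends on how a run of three or more identical letters should be counted.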

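Benchmark against the regex recipe (note that the regex command searches the literal 'abbbbcppq' rather than s, so the two inputs differ by one character):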

# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop

# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop

After your last edit, as mentioned in the comments: if you want to match only one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract each result with the group() method:

>>> import re
>>> 
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']

Note that re.I is the IGNORECASE flag, which makes the regex match uppercase letters too.
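As a quick illustration of the flag's effect, here is a small sketch on a mixed-case string (Python 3; the input 'aBbcDD' is a made-up example, not from the question):

```python
import re

mixed = 'aBbcDD'  # hypothetical input

# Without the flag, [a-z] cannot match 'B' or 'D', so nothing is found.
case_sensitive = [m.group(0) for m in re.finditer(r'([a-z])\1', mixed)]
print(case_sensitive)  # []

# With re.I, both the character class and the backreference ignore case.
case_insensitive = [m.group(0) for m in re.finditer(r'([a-z])\1', mixed, re.I)]
print(case_insensitive)  # ['Bb', 'DD']
```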

Gurupad Hegde · Answer

Using a backreference, it is very easy:

import re
p = re.compile(ur'([a-z])\1{1,}')  # the ur'' prefix is Python 2 only; use r'' in Python 3
re.findall(p, u"ugknbfddgicrmopn")
#output: [u'd']
re.findall(p,"abbbcppq")
#output: ['b', 'p']

For more details, you can refer to a similar question about Perl: Regular expression to match any character being repeated more than 10 times
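Because the pattern has one capturing group, findall returns only the captured letter. If the whole run is wanted instead, a sketch along these lines (Python 3, with a plain r'' string in place of ur''; the names pattern and runs are illustrative) uses finditer and group(0):

```python
import re

pattern = re.compile(r'([a-z])\1{1,}')  # same pattern, Python 3 raw string

# findall returns only the captured letter of each run.
print(pattern.findall('abbbcppq'))  # ['b', 'p']

# group(0) on each match object gives the full run instead.
runs = [m.group(0) for m in pattern.finditer('abbbcppq')]
print(runs)  # ['bbb', 'pp']
```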

Dima Tisnek · Answer

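It is pretty easy without regular expressions, if you want the letters that occur exactly twice anywhere in the string (not necessarily next to each other):

In [3]: import collections

In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]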
Out[4]: ['b', 'r']
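Counter counts occurrences over the whole string. For adjacent pairs, as in the question's examples, a different regex-free technique is itertools.groupby. This is a sketch of that swapped-in approach, not the answer's method; the helper name adjacent_pairs is hypothetical, and unlike the [a-z] patterns it accepts any character:

```python
from itertools import groupby

def adjacent_pairs(text):
    """Return one doubled-letter string per non-overlapping adjacent pair."""
    result = []
    for letter, run in groupby(text):
        run_length = sum(1 for _ in run)
        # A run of n equal letters holds n // 2 non-overlapping pairs.
        result.extend([letter * 2] * (run_length // 2))
    return result

print(adjacent_pairs('ugknbfddgicrmopn'))  # ['dd']
print(adjacent_pairs('abbbcppq'))          # ['bb', 'pp']
print(adjacent_pairs('abbbbcppq'))         # ['bb', 'bb', 'pp']
```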
