
Find "one letter that appears twice" in a string

I'm trying to detect whether a letter appears twice in a row in a string, using RegEx (or maybe there's a better way?). For example, my string is:

ugknbfddgicrmopn

The output would be:

dd

However, I've tried something like:

re.findall('[a-z]{2}', 'ugknbfddgicrmopn')

but in this case, it returns:

['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']   # the expected output is `['dd']`

I also have a way to get the expected output:

>>> l = []
>>> tmp = None
>>> for i in 'ugknbfddgicrmopn':
...     if tmp != i:
...         tmp = i
...         continue
...     l.append(i*2)
... 
>>> l
['dd']

But that's too complex...


If it's 'abbbcppq', then only catch:

abbbcppq
 ^^  ^^

So the output is:

['bb', 'pp']

Then, if it's 'abbbbcppq', catch bb twice:

abbbbcppq
 ^^^^ ^^

So the output is:

['bb', 'bb', 'pp']
Remi Crystal asked Dec 14 '15 07:12


4 Answers

You need to use a regex with a capturing group and a backreference, and define the pattern as a raw string.

>>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
'dd'
>>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
['bb', 'bb', 'pp']

or

>>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
['bb', 'bb', 'pp']

Note that re.findall here returns a list of tuples, with the text matched by the first group as the first element and the text matched by the second group as the second element. In our case the text matched by the first group is enough, so I used i[0].
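To make the tuple behaviour concrete, here is a small sketch that prints what re.findall returns for each of the two patterns above. The variable names (text, single, nested, pairs) are only illustrative.

```python
import re

text = 'abbbbcppq'

# One capturing group: findall returns only the captured letter.
single = re.findall(r'([a-z])\1', text)

# Two capturing groups: findall returns (outer, inner) tuples.
nested = re.findall(r'(([a-z])\2)', text)

# The first element of each tuple is the full doubled pair.
pairs = [outer for outer, inner in nested]

print(single)  # ['b', 'b', 'p']
print(nested)  # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]
print(pairs)   # ['bb', 'bb', 'pp']
```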

Avinash Raj answered Nov 13 '22 09:11


A Pythonic way is to use the zip function within a list comprehension:

>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']

If you are dealing with a large string, you can use the iter() function to convert the string to an iterator and itertools.tee() to create two independent iterators from it. Then consume the first item of the second iterator by calling next() on it, and pass both iterators to zip (in Python 2.X use itertools.izip(), which returns an iterator):

>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
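The tee-based steps above can be wrapped in a small helper; this is only a sketch, assuming Python 3 (where zip is lazy), and the name adjacent_pairs is made up for illustration. Note that it counts overlapping pairs, so a run of four b's yields three 'bb' entries rather than the two the question asks for.

```python
from itertools import tee

def adjacent_pairs(chars):
    """Return every pair of equal neighbouring items, overlapping pairs included."""
    first, second = tee(chars)
    # Advance the second iterator by one item; the default avoids
    # StopIteration when the input is empty.
    next(second, None)
    return [a + b for a, b in zip(first, second) if a == b]

print(adjacent_pairs('abbbcppq'))   # ['bb', 'bb', 'pp']
print(adjacent_pairs('abbbbcppq'))  # ['bb', 'bb', 'bb', 'pp']
```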

Benchmark against the RegEx recipe (note that the regex run searches the literal 'abbbbcppq' rather than s):

# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop

# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop

After your last edit, as mentioned in the comments: if you only want to match one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract each result with the group() method:

>>> import re
>>> 
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']

Note that re.I is the IGNORECASE flag, which makes the RegEx match uppercase letters too.
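As a quick check of the finditer approach against both strings from the question, here is a sketch; the helper name doubled_letters is hypothetical. Because regex matches do not overlap, a run of three b's gives one pair and a run of four gives two, which is what the question expects. With re.I the backreference also compares case-insensitively.

```python
import re

def doubled_letters(text):
    """Return non-overlapping doubled letters, ignoring case."""
    return [m.group(0) for m in re.finditer(r'([a-z])\1', text, re.I)]

print(doubled_letters('abbbcppq'))   # ['bb', 'pp']
print(doubled_letters('abbbbcppq'))  # ['bb', 'bb', 'pp']
print(doubled_letters('AAbBcc'))     # ['AA', 'bB', 'cc']
```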

Mazdak answered Nov 13 '22 11:11


Using a backreference, it is very easy:

import re
p = re.compile(r'([a-z])\1{1,}')  # a letter followed by one or more copies of itself
re.findall(p, "ugknbfddgicrmopn")
#output: ['d']
re.findall(p, "abbbcppq")
#output: ['b', 'p']

For more details, you can refer to a similar question about Perl: Regular expression to match any character being repeated more than 10 times
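One thing to be aware of: with \1{1,} the pattern consumes the whole run of repeated letters, and findall reports only the captured letter. A small sketch (variable names are illustrative) that uses finditer and group(0) to see the full runs instead:

```python
import re

run_pattern = re.compile(r'([a-z])\1{1,}')

# findall reports only the captured group, one letter per run.
letters = run_pattern.findall('abbbbcppq')

# group(0) is the entire match, i.e. the whole run of repeated letters.
runs = [m.group(0) for m in run_pattern.finditer('abbbbcppq')]

print(letters)  # ['b', 'p']
print(runs)     # ['bbbb', 'pp']
```

So this variant answers "which letters repeat?" rather than producing the list of 'bb' pairs the question asks for.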

Gurupad Hegde answered Nov 13 '22 10:11


It is pretty easy without regular expressions:

In [3]: import collections

In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
Out[4]: ['b', 'r']
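Keep in mind that Counter tallies every occurrence in the string, whether or not the two copies are next to each other. The sketch below contrasts it with itertools.groupby, which is a different technique swapped in here purely for comparison: it groups consecutive equal letters, so the run length shows whether the repeats are adjacent. Variable names are illustrative.

```python
from collections import Counter
from itertools import groupby

text = 'ugknbfddgicrmopn'

# Counter tallies every occurrence, adjacent or not.
twice_anywhere = sorted(k for k, v in Counter(text).items() if v == 2)

# groupby collapses consecutive equal letters into runs, so the run
# length tells us whether the two occurrences are neighbours.
twice_in_a_row = [k for k, run in groupby(text) if len(list(run)) == 2]

print(twice_anywhere)  # ['d', 'g', 'n']
print(twice_in_a_row)  # ['d']
```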
Dima Tisnek answered Nov 13 '22 09:11