I am using Python 2.6 and trying to find runs of a repeated character in a string, say runs of n's, e.g. nnnnnnnABCnnnnnnnnnDEF. The number of n's at any position in the string can vary.
If I construct a regex like this:
re.findall(r'^(((?i)n)\2{2,})', s)
I can find occurrences of case-insensitive n's only at the beginning of the string, which is fine. If I do it like this:
re.findall(r'(((?i)n)\2{2,}$)', s)
I can detect only the ones at the end of the string. But what about just the ones in the middle?
At first, I thought of using re.findall(r'(((?i)n)\2{2,})', s) together with the two previous regexes, checking the length of the returned list and the presence of n's at the beginning or end of the string with logical tests, but it became an ugly if-else mess very quickly.
Then I tried re.findall(r'(?!^)(((?i)n)\2{2,})', s), which seems to exclude the beginning just fine, but adding (?!$) or (?!\z) at the end of the regex only excludes the last n in ABCnnnn. Finally, I tried re.findall(r'(?!^)(((?i)n)\2{2,})\w+', s), which seems to work sometimes, but I get weird results at other times. It feels like I need a lookahead or lookbehind, but I can't wrap my head around them.
Instead of using a complicated regex to avoid matching the leading and trailing n characters, a more Pythonic approach is to strip() them from your string first and then find all the remaining sequences of n's using re.findall() and a simple regex:
>>> s = "nnnABCnnnnDEFnnnnnGHInnnnnn"
>>> import re
>>>
>>> re.findall(r'n{2,}', s.strip('nN'), re.I)
['nnnn', 'nnnnn']
Note: re.I is the ignore-case flag, which makes the regex engine match both upper-case and lower-case characters.
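The strip-then-search idea can be wrapped in a small helper. This is only a sketch: the function name middle_runs and its ch and min_len parameters are made up for illustration. It assumes the runs should be matched case-insensitively, so both cases of the character are stripped from the ends before searching.

```python
import re

def middle_runs(s, ch="n", min_len=2):
    """Return runs of `ch` (any case) that touch neither end of `s`."""
    # Strip both cases so leading/trailing runs such as "NNn" are removed too.
    trimmed = s.strip(ch.lower() + ch.upper())
    # Build e.g. "n{2,}"; re.escape guards against characters like "." or "*".
    pattern = re.escape(ch) + "{%d,}" % min_len
    return re.findall(pattern, trimmed, re.I)

print(middle_runs("nnnABCnnnnDEFnnnnnGHInnnnnn"))  # ['nnnn', 'nnnnn']
print(middle_runs("NNnABCnNnnDEF"))                # ['nNnn']
```

Pass min_len=3 if only runs of three or more should count, which is what the \2{2,} in the question's patterns requires.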
Since "n" is a character (and not a subpattern), you can simply use:
re.findall(r'(?<=[^n])nn+(?=[^n])(?i)', s)
or better:
re.findall(r'n(?<=[^n]n)n+(?=[^n])(?i)', s)
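For reference, here is a sketch of the same lookaround idea with the ignore-case flag passed as an argument rather than as a trailing (?i). A trailing inline flag works on Python 2.6, but it was deprecated in Python 3.6 and recent Python 3 releases reject it. The names MIDDLE_N and middle_runs_lookaround are made up for illustration.

```python
import re

# (?<=[^n]) requires a non-n character just before the run,
# (?=[^n])  requires a non-n character just after it,
# so runs touching the start or end of the string cannot match.
# With re.I, [^n] also excludes an upper-case "N".
MIDDLE_N = re.compile(r'(?<=[^n])n{2,}(?=[^n])', re.I)

def middle_runs_lookaround(s):
    """Return runs of two or more n/N that are surrounded by other characters."""
    return MIDDLE_N.findall(s)

print(middle_runs_lookaround("nnnABCnnnnDEFnnnnnGHInnnnnn"))  # ['nnnn', 'nnnnn']
print(middle_runs_lookaround("ABCnNnnDEF"))                   # ['nNnn']
```

Both lookarounds are zero-width, so the neighbouring characters are checked but not included in the returned matches.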