
Regex, find pattern only in middle of string

Tags: python, regex

I am using Python 2.6 and trying to find runs of a repeating character in a string, let's say runs of n's, e.g. nnnnnnnABCnnnnnnnnnDEF. The number of n's in each run can vary, wherever the run appears in the string.

If I construct a regex like this:

re.findall(r'^(((?i)n)\2{2,})', s),

I can find occurrences of case-insensitive n's only at the beginning of the string, which is fine. If I do it like this:

re.findall(r'(((?i)n)\2{2,}$)', s),

I can detect only the ones at the end of the string. But what about just the ones in the middle?

At first, I thought of using re.findall(r'(((?i)n)\2{2,})', s) and the two previous regex(-ices?) to check the length of the returned list and the presence of n's either in the beginning or end of the string and make logical tests, but it became an ugly if-else mess very quickly.

Then, I tried re.findall(r'(?!^)(((?i)n)\2{2,})', s), which seems to exclude the beginning just fine, but (?!$) or (?!\z) at the end of the regex only excludes the last n in ABCnnnn. Finally, I tried re.findall(r'(?!^)(((?i)n)\2{2,})\w+', s), which seems to work sometimes, but I get weird results at other times. It feels like I need a lookahead or lookbehind, but I can't wrap my head around them.

Dima1982 asked Feb 25 '16 09:02


2 Answers

Instead of using a complicated regex to avoid matching the leading and trailing n characters, a more Pythonic approach is to strip() them from the string first, then find all the remaining runs of n's using re.findall() and a simple regex:

>>> import re
>>> s = "nnnABCnnnnDEFnnnnnGHInnnnnn"
>>> re.findall(r'n{2,}', s.strip('nN'), re.I)
['nnnn', 'nnnnn']

Note: re.I is the ignore-case flag, which makes the regex engine match both upper-case and lower-case characters; strip('nN') likewise removes leading and trailing n's of either case.
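
The strip-then-search idea can also be wrapped in a small helper. This is only a sketch under stated assumptions: the function name middle_runs and its char and min_len parameters are hypothetical and not part of the answer above. It strips both the lower-case and upper-case form of the character so that stripping stays consistent with the ignore-case flag.

```python
import re

def middle_runs(s, char='n', min_len=2):
    """Return runs of `char` (either case) that touch neither end of `s`.

    Hypothetical helper for illustration only.
    """
    # Remove leading and trailing occurrences in both cases, so the
    # stripping agrees with the case-insensitive search below.
    core = s.strip(char.lower() + char.upper())
    # Build e.g. 'n{2,}': at least `min_len` repetitions of the character.
    pattern = r'%s{%d,}' % (re.escape(char), min_len)
    return re.findall(pattern, core, re.I)

print(middle_runs("nnnABCnnnnDEFnnnnnGHInnnnnn"))  # ['nnnn', 'nnnnn']
print(middle_runs("NNNabcNNnnDEFnnn"))             # ['NNnn']
```

Passing min_len=3 would mirror the question's n followed by \2{2,}, which requires at least three n's in a row.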

Mazdak answered Oct 05 '22 22:10


Since "n" is a character (and not a subpattern), you can simply use:

re.findall(r'(?i)(?<=[^n])nn+(?=[^n])', s)

or better:

re.findall(r'(?i)n(?<=[^n]n)n+(?=[^n])', s)
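
To see the lookaround approach in action, here is a minimal sketch. The sample string and the variable names are illustrative assumptions. It passes re.IGNORECASE as an argument instead of using an inline (?i), because Python 3.11 and later accept a global inline flag only at the very start of the pattern.

```python
import re

# Illustrative sample: runs of n/N at the start, in the middle, and at the end.
s = "nnnABCnnnnDEFNNnnnGHInnnnnn"

# A run of two or more n's with a non-n character on each side.
# Under IGNORECASE, [^n] rejects both 'n' and 'N'.
simple = re.compile(r'(?<=[^n])nn+(?=[^n])', re.IGNORECASE)

# Variant that starts with a literal n before checking the lookbehind.
variant = re.compile(r'n(?<=[^n]n)n+(?=[^n])', re.IGNORECASE)

print(simple.findall(s))   # ['nnnn', 'NNnnn']
print(variant.findall(s))  # ['nnnn', 'NNnnn']
```

Because each lookaround needs an actual non-n character next to the run, runs that touch the start or end of the string are never matched.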

Casimir et Hippolyte answered Oct 05 '22 22:10