
regex string and substring

I have a character string 'aabaacaba'. Starting from the left, I am trying to find all substrings of length >= 2 that appear again later in the string. For instance, aa appears again later in the string, and so does ab.

I wrote the following regex:

re.findall(r'([a-z]{2,})(?:[a-z]*)(?:\1)', 'aabaacaba')

and I get ['aa'] as the answer. The regular expression misses the ab pattern. I think this is because of overlapping characters. Please suggest how the expression could be fixed. Thank you.

Sumit asked May 14 '17 02:05



1 Answer

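re.findall only returns non-overlapping matches, and the first match of your pattern consumes aabaa, so the ab starting at index 1 is never tried. You can use a lookahead assertion, which does not consume the matched text: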

>>> import re
>>> re.findall(r'(?=([a-z]{2,})(?=.*\1))', 'aabaacaba')
['aa', 'aba', 'ba']

NOTE: aba is matched instead of ab, because {2,} is greedy and tries to match as long a substring as possible.
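
If every repeated substring is wanted, including ab, one option is to run the same lookahead pattern once per fixed length instead of using the greedy {2,}. The following is a minimal sketch under that assumption; the helper name repeated_substrings and its min_len parameter are hypothetical and not part of the original answer.

```python
import re

def repeated_substrings(text, min_len=2):
    """Return substrings of length >= min_len that occur again later in text.

    Hypothetical helper for illustration; results keep first-seen order.
    """
    found = []
    # A substring plus a later, non-overlapping repeat needs 2 * n characters,
    # so lengths above len(text) // 2 cannot match.
    for n in range(min_len, len(text) // 2 + 1):
        # Fixed-length capture inside a lookahead: nothing is consumed,
        # so candidates starting at adjacent positions are all tried.
        pattern = r'(?=([a-z]{%d})(?=.*\1))' % n
        for sub in re.findall(pattern, text):
            if sub not in found:
                found.append(sub)
    return found

print(repeated_substrings('aabaacaba'))
```

For 'aabaacaba' this prints ['aa', 'ab', 'ba', 'aba']. The inner lookahead only searches after the captured substring, so a repeat that overlaps its first occurrence is not counted.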

falsetru answered Sep 17 '22 13:09