
Overlapping regex

Tags: python, regex

I want a regex that matches any run of digits, optionally followed by a single dot and more digits. If another dot and more digits follow, the next match should overlap the previous one: it starts at the digits after the previous dot and includes the next dot and the digits after it.
example string = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
desired results = [323, 232.02, 02.03, 03.23, 23.99, 87, 0.111111]

currently using:

import re
i = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
matches = re.findall(r'(?=(\d+\.{0,1}\d+))', i)
print(matches)

output:

['323', '23', '232.02', '32.02', '2.02', '02.03', '2.03', '03.23', '3.23', '23.99', '3.99', '99', '87', '0.111111', '111111', '11111', '1111', '111', '11']
user193661 asked Jun 20 '14 21:06





1 Answer

This uses a lookahead assertion to capture each number, followed by a consuming expression that decides, according to your rules, how many characters to skip before the next match can start:

>>> import re
>>> i = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
>>> re.findall(r'(?=(\d+(?:\.\d+)?))\d+(?:\.\d+(?!\.?\d))?', i)

Output

['323', '232.02', '02.03', '03.23', '23.99', '87', '0.111111']
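To make the two halves of the pattern easier to see, here is the same regex written with re.VERBOSE and run through re.finditer, so that the captured text (group 1) can be compared with the text actually consumed (group 0). This is only an illustrative sketch; the names text, pattern, rows and captured_values are not from the question.

```python
import re

text = 'aa323aa232.02.03.23.99aa87..0.111111.mm'

# Same pattern as above, split into its parts with re.VERBOSE.
pattern = re.compile(r'''
    (?=(\d+(?:\.\d+)?))   # lookahead: capture digits plus an optional ".digits", consuming nothing
    \d+                   # consume the leading digits
    (?:\.\d+(?!\.?\d))?   # also consume ".digits", but only if no digit or ".digit" follows
''', re.VERBOSE)

# group(1) is what findall returns; group(0) is what the match consumed.
rows = [(m.group(1), m.group(0)) for m in pattern.finditer(text)]
for captured, consumed in rows:
    print('captured=%r consumed=%r' % (captured, consumed))

captured_values = [captured for captured, _ in rows]
```

For '232.02.03' the match captures '232.02' but consumes only '232', so the next search can start again at '02' and produce the overlapping '02.03'. The lookahead-only pattern in the question consumes nothing at all, which is why it is retried at every character and returns suffixes such as '32.02' and '2.02'.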
Miller answered Oct 14 '22 12:10
