I want a regex that matches any run of digits, optionally containing a single dot. If another dot and more digits follow, the next match should overlap the previous one: it should reuse the digits after the previous dot, then take the new dot and the digits after it.
example string = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
desired results = [323, 232.02, 02.03, 03.23, 23.99, 87, 0.111111]
currently using:
import re
i = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
matches = re.findall(r'(?=(\d+\.{0,1}\d+))', i)
print(matches)
output:
['323', '23', '232.02', '32.02', '2.02', '02.03', '2.03', '03.23', '3.23', '23.99', '3.99', '99', '87', '0.111111', '111111', '11111', '1111', '111', '11']
(?=...) is a positive lookahead, a type of zero-width assertion. It requires the text at the current position to match whatever is within the parentheses, but it does not consume those characters. In the pattern above, the lookahead wraps a capture group, so findall() reports the captured text while the scan advances only one character at a time; that is why suffixes such as '23' and '32.02' also show up in the output.
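A minimal sketch of that behaviour, using a made-up input string (`text` and the two-digit pattern are only for illustration, not from the question):

```python
import re

text = "12345"  # hypothetical input, chosen to keep the output short

# Plain pattern: each match consumes its characters, so matches cannot overlap.
consumed = re.findall(r"\d\d", text)

# Lookahead wrapping a capture group: the match itself is zero-width,
# so the scan moves on by one character and the captured texts overlap.
overlapping = re.findall(r"(?=(\d\d))", text)

print(consumed)     # ['12', '34']
print(overlapping)  # ['12', '23', '34', '45']
```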
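You can use re.escape(): re.escape(string) returns the string with non-alphanumeric characters backslash-escaped (since Python 3.7, only characters that can have special meaning in a regular expression are escaped). This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.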
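As a rough illustration of why escaping matters for dotted numbers (the literal `"1.5"` and the variable names are assumptions made for this example):

```python
import re

literal = "1.5"  # hypothetical literal we want to match exactly

# Unescaped, '.' is a wildcard, so the pattern also matches "1x5".
wildcard_hit = re.fullmatch(literal, "1x5") is not None

# Escaped, the dot only matches a literal dot.
escaped = re.escape(literal)
escaped_hit_wrong = re.fullmatch(escaped, "1x5") is not None
escaped_hit_right = re.fullmatch(escaped, "1.5") is not None

print(wildcard_hit, escaped_hit_wrong, escaped_hit_right)  # True False True
```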
The re.findall() function returns all occurrences that match a given pattern. In contrast, re.search() returns only the first occurrence that matches the pattern. findall() scans the string from left to right and returns all non-overlapping matches of the pattern in a single call; if the pattern contains one capture group, it returns the text of that group rather than the whole match.
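The difference can be sketched on a shortened version of the example string (`text` here is an assumption, trimmed for brevity):

```python
import re

text = "aa323aa232bb87"  # shortened, hypothetical variant of the example

first = re.search(r"\d+", text)      # match object for the first hit, or None
all_hits = re.findall(r"\d+", text)  # list of every non-overlapping hit

print(first.group())  # 323
print(all_hits)       # ['323', '232', '87']
```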
This uses a lookahead assertion for capturing, and then another expression for gobbling characters following your rules:
>>> import re
>>> i = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
>>> re.findall(r'(?=(\d+(?:\.\d+)?))\d+(?:\.\d+(?!\.?\d))?', i)
Output
['323', '232.02', '02.03', '03.23', '23.99', '87', '0.111111']
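To see how the two halves of that pattern cooperate, here is the same expression spelled out with re.VERBOSE. The layout, comments, and variable names are my own reading of the pattern, not part of the original answer:

```python
import re

pattern = re.compile(r"""
    (?=                     # lookahead: capture without consuming
        ( \d+ (?:\.\d+)? )  # digits, optionally one dot and more digits
    )
    \d+                     # consume the leading digits
    (?:
        \.\d+               # also consume a dot and the digits after it ...
        (?!\.?\d)           # ... but only if no digit or dot-plus-digit follows
    )?
""", re.VERBOSE)

i = 'aa323aa232.02.03.23.99aa87..0.111111.mm'
matches = pattern.findall(i)
print(matches)
# ['323', '232.02', '02.03', '03.23', '23.99', '87', '0.111111']
```

When another dotted group follows, the consuming part stops before the dot, so the digits after it are still available as the start of the next match. When nothing follows, the whole number is consumed, which prevents leftover suffixes such as '99' or '111111' from being reported.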