I want to use a regex to extract all numbers longer than 2 digits from a string, but exclude "2016". Here is what I have:
import re
string = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 "
print re.findall(r'\d{3,}', string)
output:
['856', '2016', '112']
I tried changing it to the patterns below to exclude "2016", but all of them failed.
print re.findall(r'\d{3,}/^(!2016)/', string)
print re.findall(r"\d{3,}/?!2016/", string)
print re.findall(r"\d{3,}!'2016'", string)
What is the right way to do it? Thank you.
The question was extended; please see the final comment made by Wiktor Stribiżew for the update.
You may use
import re
s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print(re.findall(r'(?<!\d)(?!2016(?!\d))\d{3,}', s))
See the Python demo and a regex demo.
Details
(?<!\d) - no digit is allowed immediately to the left of the current location
(?!2016(?!\d)) - the current location must not be immediately followed by 2016, unless that 2016 is itself followed by another digit
\d{3,} - 3 or more digits.
An alternative solution with some code:
import re
s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print([x for x in re.findall(r'\d{3,}', s) if x != "2016"])
Here, we extract any chunks of 3 or more digits (re.findall(r'\d{3,}', s)) and then filter out those equal to 2016.
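If more than one number needs to be excluded, the same extract-then-filter idea can be wrapped in a small helper. This is only a sketch: the function name extract_numbers and its min_len and exclude parameters are made up for illustration and do not come from the answer above.

```python
import re

def extract_numbers(text, min_len=3, exclude=("2016",)):
    """Return digit runs of at least min_len digits, skipping exact matches in exclude.

    Hypothetical helper for illustration only.
    """
    # Build the quantifier from min_len, e.g. \d{3,} for min_len=3
    pattern = r"\d{%d,}" % min_len
    excluded = set(exclude)
    # Keep only the digit runs that are not exactly equal to an excluded value
    return [m for m in re.findall(pattern, text) if m not in excluded]

s = "Employee ID DF856, Year 2016, Department Finance, Team 2, Location 112 20161 12016 120162"
print(extract_numbers(s))  # ['856', '112', '20161', '12016', '120162']
```

Because the comparison is on the whole extracted chunk, longer numbers that merely contain 2016 (such as 20161 or 12016) are kept.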
You want to use a negative lookahead. For the sample string in the question, this pattern works:
\D(?!2016)(\d{3,})\b
Results in:
In [24]: re.findall(r'\D(?!2016)(\d{3,})\b', string)
Out[24]: ['856', '112']
Or using a negative lookbehind:
In [26]: re.findall(r'\D(\d{3,})(?<!2016)\b', string)
Out[26]: ['856', '112']