Say I have a list of sentences, many (but not all) of which contain numbers:
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
    ...
]
I want to know which elements in the list contain a valid year (say, between 1000 and 3000). I know this is a regex issue, and I have found a few posts (e.g., this one) that address detecting digits in strings, but nothing on full years. Any regex wizards out there?
Iterate over the set and use the count function (i.e., string.count(newstring[iteration])) to find the frequency of the word at each iteration.
It sounds like you are looking for a regex that finds 4-digit numbers where the first digit is between 1 and 3 and the next three digits are between 0 and 9, so I think you want something like this:
[1-3][0-9]{3}
If you want to accept whole strings that contain such a number, you could use:
.*([1-3][0-9]{3})
Here's a simple solution:
import re

mylist = []  # init the list
for l in mylist:
    # re.match anchors at the start of the string, so the leading .* skips ahead to the number
    match = re.match(r'.*([1-3][0-9]{3})', l)
    if match is not None:
        # Then it found a match!
        print(match.group(1))
This checks whether the string contains a 4-digit number between 1000 and 3999, which is slightly wider than the 1000 to 3000 range asked for.
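As a minimal sketch of the same idea, the loop could also be written with `re.search`, which scans the whole string and so does not need the leading `.*`. The helper name `contains_year` and the sample sentences are illustrative assumptions, not part of the original answer.

```python
import re

# Same character-class pattern as in the answer: a 1, 2 or 3 followed by three digits.
YEAR_PATTERN = re.compile(r'[1-3][0-9]{3}')

def contains_year(sentence):
    """Return True if the sentence contains four consecutive digits starting with 1-3."""
    return YEAR_PATTERN.search(sentence) is not None

# Sample data taken from the question.
mylist = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
]

# Keep only the sentences in which the pattern is found.
with_years = [s for s in mylist if contains_year(s)]
print(with_years)
```

Because the pattern is compiled once and reused, this version also avoids re-parsing the regex on every iteration.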
Well, a year can be a lot of things. Most commonly it is 4 digits long, yes, but it is just a number. If you want all years from 1000 to 9999, you can use this regex: ([1-9][0-9]{3})
But to match the requested range (1000 to 3000), you need: ([1-2][0-9]{3}|3000)
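To see how the range-limited pattern behaves, here is a short sketch that applies it to a few sample strings. The `\b` word boundaries are an addition not present in the answer; they are assumed here so that a longer number such as 12345 is not reported as containing a year. The name `find_years` and the last two sample sentences are also made up for illustration.

```python
import re

# Range-limited pattern from the answer, wrapped in word boundaries (an assumption).
STRICT_YEAR = re.compile(r'\b([1-2][0-9]{3}|3000)\b')

def find_years(sentence):
    """Return every standalone number from 1000 to 3000 found in the sentence."""
    return STRICT_YEAR.findall(sentence)

samples = [
    "The current year is 2015 AD.",
    "I have 2 dogs.",
    "Order 12345 shipped.",
    "Is the deadline 3000 or 3001?",
]

for sentence in samples:
    print(sentence, "->", find_years(sentence))
```

Note that `\b` treats letters, digits and underscores as word characters, so a year glued to a word (for example `year2015`) would not be matched by this variant.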