I would like to know how to match a date such as "Oct 21, 2014" or "October 21, 2014".
What I have done so far is: \b(?:Jan?|?:Feb?|?:Mar?|?:Apr?|?:May?|?:Jun?|?:Jul?|?:Aug?|?:Sep?|?:Oct?|?:Nov?|?:Dec?) [0-9]{1,2}[,] (?:19[7-9]\d|2\d{3})(?=\D|$)
but that doesn't get me anywhere.
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "."; \+ matches "+"; and \( matches "(". To match a backslash itself, use \\.
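As a small illustration in Python (the sample strings are made up for this sketch), an escaped metacharacter matches only itself, and the standard re.escape function adds the backslashes for you:

```python
import re

# An unescaped dot matches any character; an escaped dot matches only ".".
dot_any = bool(re.fullmatch(r"a.c", "abc"))
dot_literal_vs_b = bool(re.fullmatch(r"a\.c", "abc"))
dot_literal_vs_dot = bool(re.fullmatch(r"a\.c", "a.c"))

# \\ in the pattern matches one literal backslash in the text.
has_backslash = bool(re.search(r"\\", "C:\\temp"))

# re.escape builds an escaped pattern from arbitrary literal text.
literal = "1+1=(2)"
escaped_matches = bool(re.fullmatch(re.escape(literal), literal))

print(dot_any, dot_literal_vs_b, dot_literal_vs_dot)  # True False True
print(has_backslash, escaped_matches)                 # True True
```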
To match a date in mm/dd/yyyy format, rearrange the regular expression to ^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$. For dd-mm-yyyy format, use ^(0[1-9]|[12][0-9]|3[01])[- /.]
Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
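A quick way to see the difference, sketched in Python with a made-up sample string (three spaces after "Oct" and a tab before the year):

```python
import re

text = "Oct   21,\t2014"

# \s matches exactly one whitespace character, so every space is its own match.
single = re.findall(r"\s", text)
# \s+ matches each run of whitespace as a whole.
runs = re.findall(r"\s+", text)

print(len(single))  # 4
print(runs)         # ['   ', '\t']
```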
In the regex flavors discussed in this tutorial, there are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the ...
This may suffice for your needs.
Keep in mind, however, that you will need more sophisticated validation, such as checking the number of days in a specific month (February, for example, has at most 28 days, or 29 in leap years).
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2}),?\s+(\d{4})
Again, this is definitely a very simple regex and there are surely better solutions out there, but it may be enough for your needs.
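One possible way to add the calendar check mentioned above is to let Python's datetime.date reject impossible days. This is only a sketch: the names DATE_RE, MONTHS and find_valid_dates are made up here, and the pattern allows an optional comma after the day (an assumption of this sketch) so that "Oct 21, 2014" matches.

```python
import re
from datetime import date

# Month name (short or long), day, optional comma, four-digit year.
DATE_RE = re.compile(
    r"\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
    r"Dec(?:ember)?)\s+(\d{1,2}),?\s+(\d{4})\b"
)

# Map the three-letter month prefix to its month number.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

def find_valid_dates(text):
    """Return a date object for every calendar-valid date found in text."""
    dates = []
    for month, day, year in DATE_RE.findall(text):
        try:
            # date() raises ValueError for impossible days such as Feb 30.
            dates.append(date(int(year), MONTHS[month[:3]], int(day)))
        except ValueError:
            continue
    return dates

sample = "Oct 21, 2014, October 21, 2014, Feb 30, 2014 and Feb 29, 2016"
print(find_valid_dates(sample))
```

Here "Feb 30, 2014" is matched by the regex but dropped by the date check, while "Feb 29, 2016" is kept because 2016 is a leap year.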
The following can be used in Python for dates with a misspelled month name:
import re
"".join((re.compile(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\.)?(\w*)?(\.)?(\s*\d{0,2}\s*),(\s*\d{4})', re.S | re.I).findall('Some wrong date is Septeme 28, 2002date') + ['n/a'])[0])
Output is:
'Septeme 28 2002'
Group 1 is the start of the month name:
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
Groups 2-4 are optional suffixes of the month name, which may include a dot or word characters:
(\.)?(\w*)?(\.)?
It matches ".", "t." and "tem" in "Sep.", "Sept." and "Septem".
Group 5 is the day number, which may be absent; the lower bound 0 in {0,2} allows dates without a day number:
(\s*\d{0,2}\s*)
Group 6 is the year:
(\s*\d{4})
\s*
matches zero or more whitespace characters (spaces, tabs and so on)
[0]
takes the first match when the list contains several date tuples
+ ['n/a']
appends a fallback element to the list, so that the list has at least one element even when no date matches and taking element [0] does not raise a 'list index out of range' error
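For readability, the same idea can be written as a small function. This is a sketch, not part of the original answer: the names LOOSE_DATE_RE and first_loose_date are made up, and it uses search() with an explicit default instead of the findall(...) + ['n/a'] trick.

```python
import re

# Same pattern as the one-liner, split up so each group can carry a comment.
LOOSE_DATE_RE = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"  # group 1: start of month
    r"(\.)?(\w*)?(\.)?"                                    # groups 2-4: optional suffix
    r"(\s*\d{0,2}\s*),"                                    # group 5: optional day, then comma
    r"(\s*\d{4})",                                         # group 6: year
    re.S | re.I,
)

def first_loose_date(text, default="n/a"):
    """Return the first loosely matched date in text, or default if none is found."""
    match = LOOSE_DATE_RE.search(text)
    if match is None:
        return default
    # Unmatched optional groups are None on a match object, so map them to "".
    return "".join(group or "" for group in match.groups())

print(first_loose_date("Some wrong date is Septeme 28, 2002date"))  # Septeme 28 2002
print(first_loose_date("no date here"))                             # n/a
```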