Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex to match date like month name day comma and year

Tags:

regex

I would like to know how to match a date like this one "Oct 21, 2014" or "October 21, 2014"

What I have done so far is \b(?:Jan?|?:Feb?|?:Mar?|?:Apr?|?:May?|?:Jun?|?:Jul?|?:Aug?|?:Sep?|?:Oct?|?:Nov?|?:Dec?) [0-9]{1,2}[,] (?:19[7-9]\d|2\d{3})(?=\D|$) but that doesn't get me anywhere

  • On short I need my matching string to be: "Month[space]Day[comma][space]Year" and I don't care about leap years and days of month should be anything between 1 and 31 with no leading 0
  • I need this regex to work on python
like image 296
faceoff Avatar asked Feb 15 '16 15:02

faceoff


People also ask

How do I match a regex pattern?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

How do I format a date in regex?

To match a date in mm/dd/yyyy format, rearrange the regular expression to ^(0[1-9]|1[012])[- /.] (0[1-9]|[12][0-9]|3[01])[- /.] (19|20)\d\d$. For dd-mm-yyyy format, use ^(0[1-9]|[12][0-9]|3[01])[- /.]

What does regex (? S match?

Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.

Is period a special character in regex?

In the regex flavors discussed in this tutorial, there are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the ...


2 Answers

This may suffice your needs.

Keep in mind however that you will need more sophisticated validations such as validating the number of days for a specific month (say, February can have up to 28 days only (29 in bissext years), and so on)

(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2})\s+(\d{4})

Play with it here.

Again, this is definitely a very simple regex and you must have many better solutions out there, but perhaps this may be enough to your needs, I do not know.

like image 104
Veverke Avatar answered Sep 28 '22 04:09

Veverke


The next could be used for dates with mistakes in month string with python:

"".join((re.compile('(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\.)?(\w*)?(\.)?(\s*\d{0,2}\s*),(\s*\d{4})', re.S + re.I).findall('Some wrong date is Septeme 28, 2002date') + ['n/a'])[0])

Output is:

'Septeme 28 2002'

1 group is a month star:

(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)

2-4 groups are optional suffixes of a month which could include a dot or alphanumeric characters:

(\.)?(\w*)?(\.)?

It matches ., t. tem in Sep., Sept., Septem

5 group is date number which could be or could not be, so 0 in the expression stands for dates without date number:

(\s*\d{0,2}\s*)

6 group is a year:

(\s*\d{4})

\s* stands for possible 'empty' characters (spaces, tabs and so on) from 0 to many

[0] takes the first matching if a few dates tuples in the list

+ ['n/a'] could be added as an additional list element in case if no date matched, so at least 1 element in the list would exist and no 'list index out of range' error appear when [0] element is being taken

like image 44
Gryu Avatar answered Sep 28 '22 05:09

Gryu