
Regular expression for a date range

Tags: regex, grep, range

I have a directory structure like this:

yyyy/mm/dd/<files>

Is there a way to grep for a string in all files within a given time frame using a regex? For example, for the time frame 2010/12/25 - 2011/01/01, I need to grep all files in the directories corresponding to the dates from December 25th to January 1st.

If I am doing this programmatically, is it better to iterate over the date range and grep the files in each yyyy/mm/dd directory than to use a regex? Or would it not make a difference?

harithski asked Jul 27 '11 06:07



1 Answer

In your case, it's simple enough:

\b(?:2010/12/(?:3[01]|2[5-9])|2011/01/01)\b

will match any string that contains a date in the range you specified. But generally, regexes are not a good fit for matching date ranges. It can always be done, but it is rarely a good idea.

For example, for the range 2003/04/25-2011/04/04, you get (written in free-spacing mode, so the line breaks are not part of the pattern):
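As a rough illustration, here is how the pattern above could be applied to directory paths in Python. The sample paths and the `in_range` helper are made up for this sketch; only the pattern comes from the answer.

```python
import re

# Pattern from the answer: 2010/12/25 through 2011/01/01 (yyyy/mm/dd).
DATE_RANGE = re.compile(r"\b(?:2010/12/(?:3[01]|2[5-9])|2011/01/01)\b")


def in_range(path):
    """Return True if the path contains a date inside the range."""
    return DATE_RANGE.search(path) is not None


# Hypothetical sample paths, for illustration only.
paths = [
    "2010/12/24/app.log",
    "2010/12/25/app.log",
    "2010/12/31/app.log",
    "2011/01/01/app.log",
    "2011/01/02/app.log",
]

# Keep only the paths whose date component falls inside the range.
matching = [p for p in paths if in_range(p)]
print(matching)
```

Only the three paths dated 2010/12/25, 2010/12/31 and 2011/01/01 are kept.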

\b(?:
2003/04/(?:30|2[5-9])|
2003/(?:(?:0[69]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))|
2011/04/0[1-4]|2011/(?:02/(?:[12][0-9]|0[1-9])|0[13]/(?:3[01]|[12][0-9]|0[1-9]))|
(?:2010|200[4-9])/(?:02/(?:[12][0-9]|0[1-9])|(?:0[469]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[13578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))
)\b
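A pattern like this is hard to check by eye. One way to gain some confidence in it is to compare it against a plain date comparison for every calendar day in a window around the range. The sketch below does that in Python; the `mismatches` helper and the 2002-2012 test window are assumptions of this example, not part of the answer.

```python
import re
from datetime import date, timedelta

# The multi-line pattern from the answer. re.VERBOSE makes the regex engine
# ignore the line breaks that are only there for layout.
PATTERN = re.compile(r"""
\b(?:
2003/04/(?:30|2[5-9])|
2003/(?:(?:0[69]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))|
2011/04/0[1-4]|2011/(?:02/(?:[12][0-9]|0[1-9])|0[13]/(?:3[01]|[12][0-9]|0[1-9]))|
(?:2010|200[4-9])/(?:02/(?:[12][0-9]|0[1-9])|(?:0[469]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[13578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))
)\b
""", re.VERBOSE)

START, END = date(2003, 4, 25), date(2011, 4, 4)


def mismatches(first, last):
    """Return the calendar dates where the regex and a plain comparison disagree."""
    bad = []
    day = first
    while day <= last:
        matched = PATTERN.fullmatch(day.strftime("%Y/%m/%d")) is not None
        if matched != (START <= day <= END):
            bad.append(day)
        day += timedelta(days=1)
    return bad


# An empty list means the regex agrees with the comparison on every real date.
print(mismatches(date(2002, 1, 1), date(2012, 12, 31)))
```

This only tests real calendar dates, so it does not flag that the pattern would also accept a nonexistent date such as 2005/02/29. For directory names that is harmless as long as no such directory exists.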

If I had to do something like this (and couldn't use the creation dates in the file attributes), I would use RegexMagic (to create the date range regex) and PowerGREP (to do the grepping) for a one-time job, although both are available only on Windows. If I had to do this more often, I'd write a small Python script that walks through my directory tree, parses the date from each directory path, checks whether it is in range, and then searches the files in that directory.
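The Python approach described above might look roughly like the following. This is a minimal sketch, not a finished tool: the function name `grep_date_range`, the plain substring search, and the assumption that files sit directly inside yyyy/mm/dd directories are all choices made for this example.

```python
import os
from datetime import date


def grep_date_range(root, start, end, needle):
    """Return (path, line number, line) for each line containing needle in
    files under root/yyyy/mm/dd directories dated within [start, end]."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 3:
            continue  # only directories exactly three levels below root
        try:
            day = date(*(int(p) for p in parts))
        except ValueError:
            continue  # not a valid yyyy/mm/dd directory
        if not (start <= day <= end):
            continue
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on undecodable bytes.
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, 1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

For the range in the question, a call such as `grep_date_range("logs", date(2010, 12, 25), date(2011, 1, 1), "ERROR")` would return the matching lines, where `"logs"` and `"ERROR"` are placeholder values. Because the dates are compared as `date` objects, changing the range means changing two arguments, not rewriting a pattern.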

Tim Pietzcker answered Oct 18 '22 10:10