If I have a directory structure like this:
yyyy/mm/dd/<files>
is there a way to grep for a string in all files in a given time frame using a regex? For example, given the time frame 2010/12/25 - 2011/01/01, I need to grep all files in the directories corresponding to dates from December 25th to January 1st.
If I am doing this programmatically, is it better to iterate over the date range and grep the files in each yyyy/mm/dd directory than to use a regex? Or would it not make a difference?
To show a range of characters, use square brackets and separate the starting character from the ending character with a hyphen. For example, [0-9] matches any digit. Several ranges can be put inside square brackets. For example, [A-CX-Z] matches 'A' or 'B' or 'C' or 'X' or 'Y' or 'Z'.
To match a date in mm/dd/yyyy format, rearrange the regular expression to ^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$. For dd-mm-yyyy format, use ^(0[1-9]|[12][0-9]|3[01])[- /.]
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- Explicit capture of a-z0-9 .
\\ is technically one backslash, but you have to type two because it's in a string. It's escaping the '.'. \\' matches the end of a string, but $ can also match the end of a line. The difference might be relevant if you have newlines in a string.
In your case, it's simple enough:
\b(?:2010/12/(?:3[01]|2[5-9])|2011/01/01)\b
will match a string that contains a date in the range you specified. But generally, regexes are not a good fit for matching date ranges. It's always a possibility, but rarely a good one.
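As a rough illustration, the pattern above can be applied to directory paths in Python. This is only a sketch: the helper name in_range and the example paths are made up for the demonstration, and it assumes the directories are laid out as yyyy/mm/dd.

```python
import re

# Pattern from the answer: matches 2010/12/25 through 2011/01/01 (yyyy/mm/dd).
DATE_RANGE = re.compile(r"\b(?:2010/12/(?:3[01]|2[5-9])|2011/01/01)\b")


def in_range(path: str) -> bool:
    """Return True if the path contains a date inside the range."""
    return DATE_RANGE.search(path) is not None


# Hypothetical paths, for illustration only.
paths = [
    "2010/12/24/app.log",
    "2010/12/25/app.log",
    "2010/12/31/app.log",
    "2011/01/01/app.log",
    "2011/01/02/app.log",
]

# Keep only the paths whose date component falls inside the range.
matching = [p for p in paths if in_range(p)]
```

The filtered list could then be handed to grep, or each file could be opened and searched directly.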
For example, for the range 2003/04/25-2011/04/04, you get
\b(?:
2003/04/(?:30|2[5-9])|
2003/(?:(?:0[69]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))|
2011/04/0[1-4]|2011/(?:02/(?:[12][0-9]|0[1-9])|0[13]/(?:3[01]|[12][0-9]|0[1-9]))|
(?:2010|200[4-9])/(?:02/(?:[12][0-9]|0[1-9])|(?:0[469]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[13578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))
)\b
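Note that the pattern above is spread over several lines, so it only works in a free-spacing (verbose) mode where the regex engine ignores whitespace. A minimal sketch of using it from Python, assuming re.VERBOSE is an acceptable stand-in for whatever flavor you use (the name in_range is hypothetical):

```python
import re
from datetime import date

# The multi-line pattern from above, compiled in verbose mode so that
# the line breaks and indentation inside the pattern are ignored.
DATE_RANGE = re.compile(
    r"""
    \b(?:
    2003/04/(?:30|2[5-9])|
    2003/(?:(?:0[69]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))|
    2011/04/0[1-4]|2011/(?:02/(?:[12][0-9]|0[1-9])|0[13]/(?:3[01]|[12][0-9]|0[1-9]))|
    (?:2010|200[4-9])/(?:02/(?:[12][0-9]|0[1-9])|(?:0[469]|11)/(?:30|[12][0-9]|0[1-9])|(?:0[13578]|1[02])/(?:3[01]|[12][0-9]|0[1-9]))
    )\b
    """,
    re.VERBOSE,
)


def in_range(d: date) -> bool:
    """Format the date as yyyy/mm/dd and test it against the regex."""
    return DATE_RANGE.search(d.strftime("%Y/%m/%d")) is not None
```

Comparing in_range against a plain start <= d <= end check for every day of the surrounding years is a cheap way to catch mistakes in a hand-written pattern like this. The regex also accepts strings such as 2005/02/29 that are not real dates, which is harmless here because no such directory should exist.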
If I had to do something like this (and couldn't use the creation dates in the file attributes), then for a one-time job I would use RegexMagic (to create the date range regex) and PowerGREP (to do the grepping), but these are only available on Windows. If I had to do this more often, I'd write a small Python script that walks through my directory tree, parses the date for each directory, checks if it's in range, and then looks at the files in that directory.
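A minimal sketch of such a script might look like the following. It assumes the tree is laid out as root/yyyy/mm/dd/<files> and that the files are text; the function name grep_date_range and its parameters are made up for this example.

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import Iterator, Tuple


def grep_date_range(
    root: Path, start: date, end: date, pattern: str
) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for matches in directories dated start..end."""
    regex = re.compile(pattern)
    # Every directory exactly three levels below root is a candidate yyyy/mm/dd.
    for day_dir in sorted(root.glob("*/*/*")):
        if not day_dir.is_dir():
            continue
        try:
            rel = day_dir.relative_to(root).as_posix()
            day = datetime.strptime(rel, "%Y/%m/%d").date()
        except ValueError:
            continue  # directory name is not a valid date, skip it
        if not (start <= day <= end):
            continue
        for file in sorted(day_dir.iterdir()):
            if not file.is_file():
                continue
            # errors="replace" keeps the scan going on undecodable bytes.
            with file.open(errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if regex.search(line):
                        yield file, lineno, line.rstrip("\n")
```

Calling list(grep_date_range(Path("."), date(2010, 12, 25), date(2011, 1, 1), "some string")) would collect the hits for the range in the question. Because the dates are compared as date objects, changing the range means changing two arguments rather than rebuilding a regex. Looping over each day in the range and checking whether its directory exists should work just as well; in either case the time is likely dominated by reading the files rather than by choosing the directories.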