I am using dateutil to parse picture filenames and sort them according to date. Since not all my pictures have metadata, dateutil is trying to guess where to put them.
Most of my pictures are in this format: 2007-09-10_0001.jpg 2007-09-10_0002.jpg etc...
    fileName = os.path.splitext(file)[0]
    print("Guesssing date from ", fileName)
    try:
        dateString = dateParser.parse(file, fuzzy=True)
        print("Guessed date", dateString)
        year = dateString.year
        month = dateString.month
        day = dateString.day
    except ValueError:
        print("Unable to determine date of ", file)
The return I am getting is this:
    ('Guesssing date from ', '2007-09-10_00005')
    ('Unable to determine date of ', '2007-09-10_00005.jpg')
Now, I could simply strip everything after the underscore, but I wanted a more robust solution if possible, in case I have pictures in another format. I thought fuzzy would try to find any date in the string and match on that, but apparently it does not work that way.
Is there an easy way to get the parser to find anything that looks like a date and stop after that? If not, what is the easiest way to force the parser to ignore everything after the underscore, or to define multiple date formats with sections to ignore?
Thanks!
You can try to "reduce" the string as long as you can't decode it:
    from dateutil import parser

    def reduce_string(string):
        # Drop the trailing run of digits, then the run of non-digits before it.
        i = len(string) - 1
        while i >= 0 and string[i].isdigit():
            i -= 1
        while i >= 0 and not string[i].isdigit():
            i -= 1
        return string[:i + 1]

    def find_date(string):
        # Keep shortening the string until the parser accepts it.
        while string:
            try:
                dateString = parser.parse(string, fuzzy=True)
                year = dateString.year
                month = dateString.month
                day = dateString.day
                return (year, month, day)
            except ValueError:
                pass
            string = reduce_string(string)
        return None

    date = find_date('2007-09-10_00005')
    if date:
        print(date)
    else:
        print("can't decode")
The idea is to remove the end of the string (a trailing run of digits, then the non-digits before it) until the parser can decode what is left as a valid date.
Commenting from the future here to add some more insight into this problem.
While dateutil's fuzzy search is pretty good at picking up dates in normal natural language, it fails on strings like the one above, which carry a lot of numeric and symbol noise. With more recent versions of dateutil, however, when running:
    >>> from dateutil.parser import parse
    >>> parse('2007-09-10_00005.jpg', fuzzy=True)
parse fails with TypeError: 'NoneType' object is not iterable, which isn't very idiomatic.
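Since the failure shows up as a ValueError in the question and as a TypeError here, one possible workaround is a small wrapper that turns any of these into None. This is only a sketch: try_parse and its parser argument are hypothetical names, not part of dateutil, and the set of exceptions to catch is an assumption based on the errors reported in this thread (plus OverflowError, which dateutil documents for out-of-range values).

```python
def try_parse(text, parser=None):
    """Return the parsed datetime, or None if the parser rejects the text."""
    if parser is None:
        # Imported lazily so that another parse function can be swapped in.
        from dateutil.parser import parse as parser
    try:
        return parser(text, fuzzy=True)
    except (ValueError, OverflowError, TypeError):
        # The exception type has differed between dateutil versions,
        # so all of them are treated as "no date found".
        return None
```

Calling try_parse('2007-09-10_00005.jpg') would then give None instead of raising, whichever dateutil version is installed.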
Another alternative is simply seeking out the known date format using regex. Of course, this varies by use case, but OP mentioned that most of the filenames carry the date in the format YYYY-MM-DD, which makes it ideal for a regex search:
    from dateutil.parser import parse
    import re

    # Raw string so the \d escapes reach the regex engine unchanged.
    date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

    def extract_date(filename):
        # search() finds the date anywhere in the name, not only at the start.
        matches = date_pattern.search(filename)
        if matches:
            return parse(matches.group(0))
        else:
            return None

    extract_date('2007-09-10_00005.jpg') # datetime.datetime(2007, 9, 10, 0, 0)
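The question also asked about handling several date formats. The regex idea can probably be stretched to cover that with a list of pattern/format pairs, using only the standard library. This is a rough sketch rather than a tested solution for every naming scheme: DATE_FORMATS and extract_date_multi are made-up names, and the second entry (an IMG_YYYYMMDD style name) is an assumed example that does not come from the question.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format) pairs, tried in order.
# Only the first entry comes from the question; the second is an assumed example.
DATE_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),  # 2007-09-10_00005.jpg
    (re.compile(r"\d{8}"), "%Y%m%d"),                # IMG_20070910_0001.jpg
]

def extract_date_multi(filename):
    """Return the first valid date found in filename, or None."""
    for pattern, fmt in DATE_FORMATS:
        for match in pattern.finditer(filename):
            try:
                return datetime.strptime(match.group(0), fmt)
            except ValueError:
                # Looked like a date but is not one (e.g. month 13); keep looking.
                continue
    return None
```

Anything after the matched date (the underscore and counter) is ignored automatically, and a name with no recognizable date gives None.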