I want to parse dates like "December 12th, 2008" and "January 1st, 2009" into a datetime object.
The following will work for the first date:
datetime.strptime("December 12th, 2008", "%B %dth, %Y")
but will fail for the second because of the suffix to the day number ('st'). So, is there an undocumented wildcard character in strptime? Or a better approach altogether?
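To make the failure concrete, here is a minimal sketch (the variable names are illustrative only). The "th" in the format string is literal text, not a wildcard, so a day written with any other suffix raises ValueError:

```python
from datetime import datetime

fmt = "%B %dth, %Y"  # "th" is matched literally

# Works: the day number is followed by "th".
first = datetime.strptime("December 12th, 2008", fmt)

try:
    datetime.strptime("January 1st, 2009", fmt)
    second_parsed = True
except ValueError:
    # "1st" does not match the literal "th" in the format.
    second_parsed = False
```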
Python time strptime() function: The strptime() function in Python parses a string representation of a date and time and returns the corresponding time value. It takes the date, time, or both as input and parses it according to the directives given to it.
Description. Python time method strptime() parses a string representing a time according to a format. The return value is a struct_time as returned by gmtime() or localtime().
strptime is short for "parse time" where strftime is for "formatting time". That is, strptime is the opposite of strftime though they use, conveniently, the same formatting specification.
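As a small illustration of that symmetry, the sketch below parses a string with datetime.strptime and then formats the result back with strftime using the same format string. The sample date has no ordinal suffix, so it round-trips cleanly:

```python
from datetime import datetime

fmt = "%B %d, %Y"

# strptime: string -> datetime (parse)
parsed = datetime.strptime("December 12, 2008", fmt)

# strftime: datetime -> string (format), using the same directives
formatted = parsed.strftime(fmt)
```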
Types of wildcards: An asterisk (*) is used to specify any number of characters. It is typically used at the end of a root word. This is great when you want to search for variable endings of a root word. For example, searching for work* would tell the database to look for all possible word endings of the root "work".
Try using the dateutil.parser module.
import dateutil.parser
date1 = dateutil.parser.parse("December 12th, 2008")
date2 = dateutil.parser.parse("January 1st, 2009")
Additional documentation can be found here: http://labix.org/python-dateutil
You need Gustavo Niemeyer's python-dateutil -- once it's installed,
>>> from dateutil import parser
>>> parser.parse('December 12th, 2008')
datetime.datetime(2008, 12, 12, 0, 0)
>>> parser.parse('January 1st, 2009')
datetime.datetime(2009, 1, 1, 0, 0)
>>>
strptime is tricky because it relies on the underlying C library for its implementation, so some details differ between platforms. There doesn't seem to be a way to match the characters you need to. But you could clean the data first:
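import re
from datetime import datetime

date_in = "January 1st, 2009"
# Remove the ordinal suffix that directly follows the day number.
date_in = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", date_in)
# Parse the pure date.
date = datetime.strptime(date_in, "%B %d, %Y")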
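The clean-then-parse idea can be wrapped in a small helper. This is only a sketch: the name parse_ordinal_date and the default format are assumptions for illustration, and the pattern handles English suffixes only. The lookbehind makes sure a suffix is stripped only when it directly follows a digit, so month names such as "August" are left alone.

```python
import re
from datetime import datetime

# Matches st/nd/rd/th only when it directly follows a digit.
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_ordinal_date(text, fmt="%B %d, %Y"):
    """Strip an English ordinal suffix from the day number, then parse.

    Hypothetical helper, not part of the standard library.
    """
    cleaned = _ORDINAL_SUFFIX.sub("", text)
    return datetime.strptime(cleaned, fmt)


dates = [parse_ordinal_date(s)
         for s in ("December 12th, 2008", "January 1st, 2009")]
```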
If you want to use arbitrary wildcards, you can use datetime-glob, a module we developed to parse dates and times from file paths that follow a consistent date/time format. From the module's documentation:
>>> import datetime_glob
>>> matcher = datetime_glob.Matcher(
...     pattern='/some/path/*%Y-%m-%dT%H-%M-%SZ.jpg')
>>> match = matcher.match(path='/some/path/some-text2016-07-03T21-22-23Z.jpg')
>>> match
datetime_glob.Match(year = 2016, month = 7, day = 3,
hour = 21, minute = 22, second = 23, microsecond = None)
>>> match.as_datetime()
datetime.datetime(2016, 7, 3, 21, 22, 23)
For anyone who, like me, just wants something that "works" without an additional module, this is a quick and dirty solution.
from datetime import datetime

suffixes = ["th", "rd", "nd", "st"]
parsed = None
for suffix in suffixes:
    try:
        # Treat the suffix as literal text in the format string.
        parsed = datetime.strptime("December 12th, 2008", "%B %d" + suffix + ", %Y")
        break
    except ValueError:
        # Wrong suffix for this date; try the next one.
        pass
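For more than one input string, the same try-each-suffix loop can be turned into a function. The sketch below is one possible shape: the name parse_with_suffix is hypothetical, and the function raises ValueError when no suffix fits instead of silently leaving the result as None.

```python
from datetime import datetime


def parse_with_suffix(text, suffixes=("th", "st", "nd", "rd")):
    """Try each suffix as literal text in the format until one parses.

    Hypothetical helper that uses only the standard library.
    """
    for suffix in suffixes:
        try:
            return datetime.strptime(text, "%B %d" + suffix + ", %Y")
        except ValueError:
            continue  # wrong suffix for this input; try the next one
    raise ValueError("no ordinal suffix matched: %r" % (text,))


result = parse_with_suffix("January 1st, 2009")
```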