I am currently analyzing a dataset that contains many different date formats, such as:
12/31/1991
December 10, 1980
September 25, 1970
2005-11-14
December 1990
October 12, 2005
1993-06-26
Is there a way to normalize all of these dates into a single format, 'YYYY-MM-DD'? I am familiar with the datetime package in Python, but what is the best way to approach this so that it handles all of the different date formats?
If you are okay with using a library, you can use the parse function from the dateutil library to parse all the dates into datetime objects, and then use datetime.datetime.strftime() to format them back into strings in the format you want.
Install dateutil:
pip3 install python-dateutil
Example:
from dateutil import parser

s = ["12/31/1991",
     "December 10, 1980",
     "September 25, 1970",
     "2005-11-14",
     "December 1990",
     "October 12, 2005",
     "1993-06-26",
     "11/20/1967 6:08:15 PM"]

# Parse each string into a datetime, then print it in one uniform format.
for i in s:
    d = parser.parse(i)
    print(d.strftime("%Y-%m-%d %H:%M:%S"))
Output:
1991-12-31 00:00:00
1980-12-10 00:00:00
1970-09-25 00:00:00
2005-11-14 00:00:00
1990-12-05 00:00:00
2005-10-12 00:00:00
1993-06-26 00:00:00
1967-11-20 18:08:15
One thing to note: dateutil.parser.parse fills in any components that are missing from the string using the current date. This can be seen above with 'December 1990', which was parsed as 1990-12-05 because the example was run on the 5th of the month.
If a time is not supplied then 00:00:00 is used. See the documentation for how to handle time zones.
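If you would rather not have the result depend on the day the script runs, parse accepts a default argument that supplies the missing components instead. The following is a minimal sketch of that idea; the names FALLBACK and to_iso_date are made up for illustration, and the choice of 1900-01-01 as the fallback is an assumption you should adapt to your data.

```python
from datetime import datetime
from dateutil import parser

# Hypothetical fallback: any field missing from the input string
# (day, month, year) is copied from this datetime instead of "today".
FALLBACK = datetime(1900, 1, 1)

def to_iso_date(text, default=FALLBACK):
    """Parse a free-form date string and return it as 'YYYY-MM-DD'."""
    return parser.parse(text, default=default).strftime("%Y-%m-%d")

print(to_iso_date("December 1990"))  # day is missing, so it comes from FALLBACK
print(to_iso_date("12/31/1991"))     # complete date, FALLBACK is not used
```

With this fallback, 'December 1990' should always come out as the first of the month, no matter when the code is run.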
I solved this issue with dateutil, where row is one raw date value from my dataset:
from dateutil.parser import parse
dt = parse(str(row))
print(dt.strftime('%Y-%m-%d'))
It handles all of the different date formats.
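For comparison, if adding a dependency is not an option, or if you prefer that unexpected input be rejected rather than guessed at, roughly the same result can be had with only the standard datetime module by trying a fixed list of formats. This is a sketch, not a drop-in replacement for dateutil: the KNOWN_FORMATS list and the normalize_date name are assumptions based on the samples in the question, and %B relies on English month names in the active locale.

```python
from datetime import datetime

# Assumed formats, taken from the samples in the question; extend as needed.
KNOWN_FORMATS = (
    "%m/%d/%Y",   # 12/31/1991
    "%B %d, %Y",  # December 10, 1980
    "%Y-%m-%d",   # 2005-11-14
    "%B %Y",      # December 1990 (strptime fills in day 1)
)

def normalize_date(text):
    """Return text as 'YYYY-MM-DD', or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this format did not match, try the next one
    return None

for raw in ["12/31/1991", "December 1990", "not a date"]:
    print(raw, "->", normalize_date(raw))
```

Returning None for anything unrecognized makes it easy to collect the leftover values and decide whether a new format belongs in the list.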