Is there any nice way to validate that all items in a dataframe's column have a valid date format?
My date format is 11-Aug-2010.
I saw this generic answer, where:
import datetime

try:
    datetime.datetime.strptime(date_text, '%Y-%m-%d')
except ValueError:
    raise ValueError("Incorrect data format, should be YYYY-MM-DD")
source: https://stackoverflow.com/a/16870699/1374488
But I assume that's not good (efficient) in my case.
I assume I have to modify the strings to be pandas dates first as mentioned here: Convert string date time to pandas datetime
I am new to the Python world, any ideas appreciated.
(format borrowed from piRSquared's answer)
if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    pass  # do something
This is the LBYL ("Look Before You Leap") approach. The condition is True only if all your date strings are valid, meaning each one is converted into an actual pd.Timestamp object. Invalid date strings are coerced to NaT, which is the datetime equivalent of NaN.
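If you also want to know which entries failed, the same coerced result can be reused as a mask. This is only a minimal sketch: the sample DataFrame and the names `parsed`, `invalid_mask`, `all_valid`, and `invalid_rows` are made up for illustration. Note that values that were already missing (NaN/None) also come out as NaT, so they are reported as invalid here.

```python
import pandas as pd

# Hypothetical sample data: two valid dates, one impossible date, one junk string.
df = pd.DataFrame(
    {"date": ["11-Aug-2010", "31-Feb-2010", "not a date", "01-Jan-2011"]}
)

# Anything that does not match the format (or is not a real date) becomes NaT.
parsed = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# True where parsing failed.
invalid_mask = parsed.isna()

# Single yes/no answer for the whole column.
all_valid = bool(parsed.notna().all())

# The original strings that could not be parsed.
invalid_rows = df.loc[invalid_mask, "date"].tolist()

print(all_valid)
print(invalid_rows)
```

Keeping `parsed` around also means you do not have to convert the column a second time once validation passes.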
Alternatively,
try:
    pd.to_datetime(df['date'], format='%d-%b-%Y', errors='raise')
    # do something
except ValueError:
    pass
This is the EAFP ("Easier to Ask Forgiveness than Permission") approach: a ValueError is raised when an invalid date string is encountered.
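If you need this check in more than one place, the try/except can be wrapped in a small helper. The function name `column_has_valid_dates` and the two sample Series below are hypothetical, not part of pandas; this is just one way it might look.

```python
import pandas as pd


def column_has_valid_dates(series, fmt="%d-%b-%Y"):
    """Return True if every value in `series` parses with `fmt` (hypothetical helper)."""
    try:
        # errors="raise" makes pandas raise on the first unparseable value.
        pd.to_datetime(series, format=fmt, errors="raise")
    except ValueError:
        return False
    return True


# Illustrative data: the second Series contains a date in a different format.
good = pd.Series(["11-Aug-2010", "01-Jan-2011"])
bad = pd.Series(["11-Aug-2010", "2010-08-11"])

print(column_has_valid_dates(good))
print(column_has_valid_dates(bad))
```

Compared with the `errors='coerce'` version, this tells you only that something is wrong, not which rows are affected.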