
Pandas validate date format

Is there any nice way to validate that all items in a dataframe's column have a valid date format?

My date format is 11-Aug-2010.

I saw this generic answer, which suggests:

try:
    datetime.datetime.strptime(date_text, '%Y-%m-%d')
except ValueError:
    raise ValueError("Incorrect data format, should be YYYY-MM-DD")

source: https://stackoverflow.com/a/16870699/1374488

But I assume that's not efficient in my case.

I assume I have to modify the strings to be pandas dates first as mentioned here: Convert string date time to pandas datetime

I am new to the Python world, any ideas appreciated.

lukas_o asked Mar 22 '18 17:03



1 Answer

(format borrowed from piRSquared's answer)

if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    pass  # every value parsed as a valid date; do something

This is the LYBL ("Look Before You Leap") approach. The condition is True only if every date string is valid, meaning each one is converted into an actual pd.Timestamp object. Invalid date strings are coerced to NaT, the datetime equivalent of NaN, so notnull() reports them as False.
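To see which rows fail rather than just whether any do, the coerced result can be used as a mask. This is a minimal sketch, assuming a column named 'date' and some made-up sample values; the names parsed, invalid_rows and all_valid are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: the last two entries are deliberately invalid.
df = pd.DataFrame(
    {"date": ["11-Aug-2010", "02-Jan-2011", "31-Feb-2012", "not a date"]}
)

# Anything that does not match the format (or is not a real date) becomes NaT.
parsed = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# NaT entries mark the invalid rows, so the mask pulls out the offending strings.
invalid_rows = df.loc[parsed.isna(), "date"]
all_valid = parsed.notna().all()

print(all_valid)
print(invalid_rows.tolist())
```

Keep in mind that values that were already missing in the column also come out as NaT, so this check counts them as invalid too.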

Alternatively,

try:
    pd.to_datetime(df['date'], format='%d-%b-%Y', errors='raise')
    # do something
except ValueError:
    pass

This is the EAFP ("Easier to Ask Forgiveness than Permission") approach: a ValueError is raised when an invalid date string is encountered.
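If swallowing the error with pass is not what you want, the same idea can be wrapped in a small helper that re-raises with a clearer message. This is only a sketch: validate_dates is a hypothetical name, not a pandas function, and the sample values are made up.

```python
import pandas as pd


def validate_dates(series, fmt="%d-%b-%Y"):
    """Return the parsed Series, or raise ValueError if any value is invalid.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        return pd.to_datetime(series, format=fmt, errors="raise")
    except ValueError as exc:
        # Chain the original pandas error so its details are not lost.
        raise ValueError(f"Column contains values not matching {fmt!r}") from exc


good = validate_dates(pd.Series(["11-Aug-2010", "02-Jan-2011"]))
print(good.dt.year.tolist())
```

The parsed Series is returned on success, so the validation and the conversion happen in a single pass over the column.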

cs95 answered Oct 20 '22 17:10
