Apologies if this question has been asked already. I thought it would have been, but I have not been able to find an answer. I want to convert a column in a pandas DataFrame to datetime format:
import pandas as pd
df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d')
but apparently some rows contain values that do not match the format:
ValueError: time data 0000-00-00 doesn't match format specified
The df is quite large, so visually inspecting all unique values is not practical (and I would also like to learn how to do this without looking through every value). I would like to find all the unique values that do not fit the specified format, so that I can then clean them. Any ideas?
Use to_datetime with errors='coerce', so that values in the wrong format become missing values (NaT). Then use that mask to filter the original values with DataFrame.loc and convert them to a list of unique values with Series.unique:
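# True where the value could not be parsed with the given format
m = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d', errors='coerce').isna()
# Unique original values that failed to parse
print(df.loc[m, 'DateOfBirth'].unique().tolist())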
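One caveat: the mask above is also True for rows that were already missing (NaN/None) before parsing. Here is a minimal, self-contained sketch of the same approach that excludes those rows. The sample data is made up for illustration, and the names parsed, bad and bad_values are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "DateOfBirth": [
        "1990-05-17", "0000-00-00", "1985-12-01",
        "n/a", None, "0000-00-00", "1999-02-30",
    ]
})

# Parse with errors='coerce': anything that does not match becomes NaT.
parsed = pd.to_datetime(df["DateOfBirth"], format="%Y-%m-%d", errors="coerce")

# Keep only rows that failed to parse but were not missing to begin with.
bad = parsed.isna() & df["DateOfBirth"].notna()

# Unique offending values, in order of first appearance.
bad_values = df.loc[bad, "DateOfBirth"].unique().tolist()
print(bad_values)

# Optional: how often each offending value occurs.
print(df.loc[bad, "DateOfBirth"].value_counts())
```

Adding notna() separates "could not be parsed" from "was empty anyway", which usually need different cleaning steps. Note that a correctly shaped but impossible date such as 1999-02-30 is flagged too.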