Find rows not fitting datetime format in Pandas

Apologies if this question has been asked already. I thought it would have been, but I have not been able to find an answer. I want to convert a column in a pandas DataFrame to datetime format:

import pandas as pd 
df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d')

but apparently some rows contain values that do not match the format:

ValueError: time data 0000-00-00 doesn't match format specified

The df is quite large, so visually inspecting all unique values is not feasible (and I would also like to learn how to do this without looking through every value). I would like to find all the unique values that do not fit the specified format so that I can clean them. Any ideas?

Papayapap asked Oct 19 '25 04:10



1 Answer

Use to_datetime with errors='coerce', so that values in the wrong format become missing values (NaT). Then filter the original values with DataFrame.loc and get the unique ones as a list with Series.unique and tolist:

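# True where the value cannot be parsed with the given format (coerced to NaT)
m = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d', errors='coerce').isna()

# Unique original values that failed to parse
print(df.loc[m, 'DateOfBirth'].unique().tolist())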
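
For anyone who wants to try this on a small example, here is a minimal self-contained sketch. The sample data is hypothetical and only stands in for the real DataFrame. The mask built from `isna()` alone also flags rows that were already missing before parsing, so this variant adds a `notna()` check on the original column to report only values that were present but unparseable. Whether that distinction matters depends on your data.

```python
import pandas as pd

# Hypothetical sample data; the real df would come from your own source
df = pd.DataFrame({
    "DateOfBirth": ["1990-05-17", "0000-00-00", "1985-12-01", "n/a", "0000-00-00", None]
})

# Values that do not match the format are coerced to NaT instead of raising
parsed = pd.to_datetime(df["DateOfBirth"], format="%Y-%m-%d", errors="coerce")

# Keep rows that failed to parse but were not missing in the original column
bad_mask = parsed.isna() & df["DateOfBirth"].notna()

# Unique offending values, in order of first appearance
bad_values = df.loc[bad_mask, "DateOfBirth"].unique().tolist()
print(bad_values)  # ['0000-00-00', 'n/a']
```

Once the offending values are known, they can be cleaned or replaced before assigning the parsed result back to the column.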
jezrael answered Oct 22 '25 07:10



