Handling pandas DataFrame columns with mixed date formats

I have imported a CSV file whose date column mixes formats. Some dates are in formats that read_csv recognizes, while others are Excel serial datetimes (e.g. 41,866.321).
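An Excel serial datetime counts days. The integer part is the date and the fraction is the time of day, so 41,866.321 is day 41866 plus 0.321 × 24 h, or about 7 h 42 min. The sketch below converts such strings with pandas. It assumes the workbook uses Excel's default 1900 date system (the 1904 system of some older Mac workbooks would need a different origin) and that the comma is a thousands separator. The helper name `excel_serial_to_datetime` is hypothetical.

```python
import pandas as pd

# Serial 0 would be 1899-12-31; one more day back absorbs Excel's phantom
# 1900-02-29, so serials from 61 (1900-03-01) onward come out right.
EXCEL_EPOCH = pd.Timestamp("1899-12-30")


def excel_serial_to_datetime(values: pd.Series) -> pd.Series:
    """Convert Excel serial-date strings such as '41,866.321' to datetimes.

    Anything that is not a number comes back as NaT.
    """
    # Strip the thousands separator, then parse as float (non-numbers -> NaN).
    serials = pd.to_numeric(
        values.astype(str).str.replace(",", "", regex=False), errors="coerce"
    ).astype("float64")
    # Integer part = days since the epoch, fractional part = time of day.
    stamps = pd.to_datetime(serials, unit="D", origin=EXCEL_EPOCH)
    # Float arithmetic leaves sub-microsecond noise; round to whole milliseconds.
    return stamps.dt.round("ms")


print(excel_serial_to_datetime(pd.Series(["41,866.321", "not a date"])))
```

Under these assumptions, 41,866.321 comes out as 2014-08-15 07:42:14.4, and the non-numeric string becomes NaT.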

Once the data is imported, the column's dtype is object (because of the mix of values), and every date, in either format, is stored as a string.
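A small reproduction of this symptom, assuming the file is read with `parse_dates` on the column (the sample rows are made up). In pandas 2.x, when even one entry cannot be parsed, `read_csv` leaves that column unconverted, so every value stays a plain string.

```python
import io

import pandas as pd

# Made-up sample: two ISO timestamps plus one Excel serial date.
CSV = io.StringIO(
    "Time_Date,Value\n"
    "2014-08-14 09:15:00,1\n"
    '"41,866.321",2\n'
    "2014-08-16 17:30:00,3\n"
)

df1 = pd.read_csv(CSV, parse_dates=["Time_Date"])

# The Excel value blocks the conversion, so the column is not datetime64...
print(df1["Time_Date"].dtype)
# ...and every entry, whichever format it is in, is still a Python str.
print(df1["Time_Date"].map(type).unique())
```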

I would like to use to_datetime to convert the recognized date strings in the column into datetimes, while leaving the unrecognized strings in Excel format so that I can isolate them and correct them offline. But unless I apply the method row by row (far too slow), the conversion fails for the entire column.
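One way to get that result without a row-by-row loop, as a minimal sketch: convert the whole column in one vectorized call with `errors='coerce'`, then put the original string back wherever the result is NaT. The helper `parse_or_keep` is a hypothetical name, and the result is necessarily an `object` column holding a mix of `Timestamp`s and strings.

```python
import pandas as pd


def parse_or_keep(raw: pd.Series) -> pd.Series:
    """Datetimes where parsing works, the original strings everywhere else."""
    # One vectorized call; anything unparseable becomes NaT instead of raising.
    parsed = pd.to_datetime(raw, errors="coerce")
    # Restore the untouched input wherever parsing failed. A column mixing
    # Timestamps and strings can only be stored as dtype object.
    return parsed.astype(object).where(parsed.notna(), raw)


raw = pd.Series(["2014-08-14 09:15:00", "41,866.321", "2014-08-16 17:30:00"])
mixed = parse_or_keep(raw)
print(mixed)
```

In pandas 2.x the format is inferred from the first non-null value and applied to the whole column. If the recognizable dates themselves use several layouts, `format='mixed'` (pandas 2.0+) parses each element separately; in that case, check that the Excel strings still come back as NaT.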

Does anyone have a cleverer way of solving this?

Update: after some more tinkering I found this workaround. Passing coerce=True (errors='coerce' in current pandas) forces the whole column to convert, and the resulting null values can then be cross-referenced with the original file. If there is a better way (e.g. fixing the unrecognized timestamps in place), please let me know.

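```python
import pandas as pd

df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')  # unparseable -> NaT (coerce=True in old pandas)
nulls = df1['Time_Date'][df1['DateTime'].isnull()]  # raw values that failed to convert
```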
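To cross-reference the failures with the original file, it helps to report each one together with its line number in the CSV. A minimal sketch, assuming `df1` came straight from `read_csv` with the default integer index, a single header row, and no skipped rows or quoted multi-line fields; `unparsed_rows` and the output file name are hypothetical. Values that were empty in the file are left out, since they are NaT for a different reason.

```python
import pandas as pd


def unparsed_rows(df: pd.DataFrame, raw_col: str, parsed_col: str) -> pd.DataFrame:
    """Return the non-empty raw values that failed to parse, with CSV line numbers.

    Assumes df came straight from read_csv: default RangeIndex, one header
    line, no skipped rows and no quoted fields spanning several lines.
    """
    failed = df[parsed_col].isna() & df[raw_col].notna()
    out = df.loc[failed, [raw_col]].copy()
    # Index 0 is line 2 of the file, because line 1 holds the header.
    out["csv_line"] = out.index.to_numpy() + 2
    return out


df1 = pd.DataFrame(
    {"Time_Date": ["2014-08-14 09:15:00", "41,866.321", None, "2014-08-16 17:30:00"]}
)
df1["DateTime"] = pd.to_datetime(df1["Time_Date"], errors="coerce")
report = unparsed_rows(df1, "Time_Date", "DateTime")
print(report)
# report.to_csv("unparsed_dates.csv")  # hypothetical output file for offline fixes
```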

Will H asked Nov 14 '14 00:11


1 Answer

After some more tinkering I found this solution. Passing errors='coerce' forces the whole column to convert to datetime, and the entries that come back null (NaT) can then be cross-referenced with the original file. If there is a better way (e.g. fixing the unrecognized timestamps in place), please let me know.

```python
import pandas as pd

df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')  # unparseable -> NaT
nulls = df1['Time_Date'][df1['DateTime'].isnull()]  # raw values that failed to convert
```
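To fix the unrecognized timestamps in place rather than isolating them, one option is a second pass: parse what `to_datetime` recognizes, then reinterpret only the leftovers as Excel serial numbers and fill the gaps. This is a minimal sketch, assuming Excel's default 1900 date system and a comma used as a thousands separator; `to_datetime_with_excel_fallback` is a hypothetical name.

```python
import pandas as pd

# Serial 0 would be 1899-12-31; one more day back absorbs Excel's phantom
# 1900-02-29, so serials from 61 (1900-03-01) onward come out right.
EXCEL_EPOCH = pd.Timestamp("1899-12-30")


def to_datetime_with_excel_fallback(raw: pd.Series) -> pd.Series:
    """Parse ordinary date strings, then retry the failures as Excel serials."""
    # Pass 1: vectorized parse; unrecognized strings become NaT.
    parsed = pd.to_datetime(raw, errors="coerce")
    leftover = parsed.isna() & raw.notna()

    # Pass 2: read the leftovers as numbers such as '41,866.321' (NaN if not).
    serials = pd.to_numeric(
        raw[leftover].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    ).astype("float64")
    excel = pd.to_datetime(serials, unit="D", origin=EXCEL_EPOCH).dt.round("ms")

    # fillna aligns on the index, so only the leftover rows are filled in.
    return parsed.fillna(excel)


raw = pd.Series(["2014-08-14 09:15:00", "41,866.321", "garbage", None])
print(to_datetime_with_excel_fallback(raw))
```

Anything still NaT after both passes matches neither format and is worth checking by hand.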
Will H answered Nov 14 '22 23:11