I have imported a CSV file with mixed date formats in one column: some date strings that are recognized on read, plus some Excel serial-datetime values (e.g. 41,866.321).
Once the data is imported, the column dtype is object (given the mix of formats), and both kinds of date values are stored as strings.
I would like to use the to_datetime method to convert the recognized date strings into datetimes in the dataframe column, leaving the unrecognized Excel-format values as strings so I can isolate and correct them offline. But unless I apply the method row by row (far too slow), it fails to do this.
Does anyone have a cleverer way of solving this?
Update: having tinkered around some more, I have found this solution: using errors='coerce' to force the column datatype conversion, then identifying the resulting null values, which I can cross-reference back to the original file. But if there is a better way to do this (e.g. fixing the unrecognized timestamps in place), please let me know.
df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')
nulls = df1['Time_Date'][df1['DateTime'].isnull()]
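One way to fix the unrecognized timestamps in place, rather than correcting them offline, is a second vectorized pass: treat the values that failed the first parse as Excel serial numbers (days since 1899-12-30, Excel's epoch on Windows) and convert them with to_datetime's unit and origin parameters. A minimal sketch, assuming a hypothetical Time_Date column with one value of each kind:

```python
import pandas as pd

# Hypothetical mixed column: one recognizable date string,
# one Excel serial datetime with a thousands separator.
df1 = pd.DataFrame({'Time_Date': ['2014-08-15 07:42', '41,866.321']})

# First pass: parse what pandas recognizes; failures become NaT.
parsed = pd.to_datetime(df1['Time_Date'], errors='coerce')

# Second pass: strip thousands separators, coerce to numbers, and
# interpret them as days since Excel's epoch (1899-12-30).
serials = pd.to_numeric(df1['Time_Date'].str.replace(',', ''), errors='coerce')
from_excel = pd.to_datetime(serials, unit='D', origin='1899-12-30')

# Combine: keep the first-pass result where it succeeded,
# fill the gaps with the Excel-serial conversion.
df1['DateTime'] = parsed.fillna(from_excel)
```

Both values land in the same datetime64 column, so no row-by-row apply is needed; anything that fails both passes remains NaT and can still be cross-referenced back to the source file.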