I have a Pandas dataframe with text dates that'd I like to convert to datetime. The problem is some of my text dates are bad data and thus can't be converted. In cases for which a date can't be converted, I want to update an Error column to a value of True as well as set the Date column to None so that it can later be added to a database column that is formatted as a datetime.
This is a simplified example. My dataframe may have 1 million rows and multiple date columns this needs to be done for, so I need a faster way of doing this. I know the typical convention is to avoid looping with Pandas, but I can't figure out a way around it.
import pandas as pd
import numpy as np
import datetime
data = 1000 *[['010115', None],
['320115', None]]
df = pd.DataFrame(data=data,
columns=['Date', 'Error'])
for index, row in df.iterrows():
try:
datetime.datetime.strptime(row['Date'], '%d%m%y')
except ValueError:
row['Date'] = None
row['Error'] = True
except TypeError:
pass
print df
You can use to_datetime with parameter errors='coerce' and isnull:
data = 10 *[['010115', None],
['320115', None]]
df = pd.DataFrame(data=data,
columns=['Date', 'Error'])
print (df)
Date Error
0 010115 None
1 320115 None
2 010115 None
3 320115 None
4 010115 None
5 320115 None
6 010115 None
7 320115 None
8 010115 None
9 320115 None
10 010115 None
11 320115 None
12 010115 None
13 320115 None
14 010115 None
15 320115 None
16 010115 None
17 320115 None
18 010115 None
19 320115 None
df['Date'] = pd.to_datetime(df['Date'], format='%d%m%y',errors='coerce')
df['Error'] = df['Date'].isnull()
print (df)
Date Error
0 2015-01-01 False
1 NaT True
2 2015-01-01 False
3 NaT True
4 2015-01-01 False
5 NaT True
6 2015-01-01 False
7 NaT True
8 2015-01-01 False
9 NaT True
10 2015-01-01 False
11 NaT True
12 2015-01-01 False
13 NaT True
14 2015-01-01 False
15 NaT True
16 2015-01-01 False
17 NaT True
18 2015-01-01 False
19 NaT True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With