Alternatives to looping in Pandas when you need to update a column based on another

Question

I have a Pandas dataframe with text dates that'd I like to convert to datetime. The problem is some of my text dates are bad data and thus can't be converted. In cases for which a date can't be converted, I want to update an Error column to a value of True as well as set the Date column to None so that it can later be added to a database column that is formatted as a datetime.

This is a simplified example. My dataframe may have 1 million rows and multiple date columns this needs to be done for, so I need a faster way of doing this. I know the typical convention is to avoid looping with Pandas, but I can't figure out a way around it.

import pandas as pd
import numpy as np
import datetime

data = 1000 *[['010115', None],
        ['320115', None]]


df = pd.DataFrame(data=data,
                  columns=['Date', 'Error'])

for index, row in df.iterrows():
    try:
        datetime.datetime.strptime(row['Date'], '%d%m%y')
    except ValueError:
        row['Date'] = None
        row['Error'] = True
    except TypeError:
        pass

print df

jezrael · Accepted Answer

You can use to_datetime with parameter errors='coerce' and isnull:

data = 10 *[['010115', None],
        ['320115', None]]


df = pd.DataFrame(data=data,
                  columns=['Date', 'Error'])

print (df)
      Date Error
0   010115  None
1   320115  None
2   010115  None
3   320115  None
4   010115  None
5   320115  None
6   010115  None
7   320115  None
8   010115  None
9   320115  None
10  010115  None
11  320115  None
12  010115  None
13  320115  None
14  010115  None
15  320115  None
16  010115  None
17  320115  None
18  010115  None
19  320115  None

df['Date'] = pd.to_datetime(df['Date'], format='%d%m%y',errors='coerce') 
df['Error'] = df['Date'].isnull()
print (df)
         Date  Error
0  2015-01-01  False
1         NaT   True
2  2015-01-01  False
3         NaT   True
4  2015-01-01  False
5         NaT   True
6  2015-01-01  False
7         NaT   True
8  2015-01-01  False
9         NaT   True
10 2015-01-01  False
11        NaT   True
12 2015-01-01  False
13        NaT   True
14 2015-01-01  False
15        NaT   True
16 2015-01-01  False
17        NaT   True
18 2015-01-01  False
19        NaT   True

Alternatives to looping in Pandas when you need to update a column based on another

Tags:

python

datetime

pandas

dataframe

python-2.7

user2242044

1 Answers

jezrael

Recent Activity

Donate For Us

Alternatives to looping in Pandas when you need to update a column based on another

Tags:

python

datetime

pandas

dataframe

python-2.7

user2242044

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us