Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace date with NaT in Pandas dataframe

I have got a dataframe with a column of datetime64 type. In this column there are several rows with dates as 1999-09-09 23:59:59 where as they should have actually been represented as missing dates NaT. Somebody just decided to use this particular date to represent the missing data. Now I want these dates to be replaced as NaT (the missing date type for Pandas).

Also if I perform operation on this column with NaTs, like

df['date'] - df['column with missing date']

Does Pandas ignore the missing dates and maintain NaT for those rows or will it throw an error some thing like Null pointer exception in Java.

like image 636
user3527975 Avatar asked Jul 17 '14 12:07

user3527975


2 Answers

In [6]:
import pandas as pd
df = pd.DataFrame({'date':[pd.datetime(1999,9,9,23,59,59), pd.datetime(2014,1,1)]* 10})
df
Out[6]:
                  date
0  1999-09-09 23:59:59
1  2014-01-01 00:00:00
2  1999-09-09 23:59:59
3  2014-01-01 00:00:00
4  1999-09-09 23:59:59
5  2014-01-01 00:00:00
6  1999-09-09 23:59:59
7  2014-01-01 00:00:00
8  1999-09-09 23:59:59
9  2014-01-01 00:00:00
10 1999-09-09 23:59:59
11 2014-01-01 00:00:00
12 1999-09-09 23:59:59
13 2014-01-01 00:00:00
14 1999-09-09 23:59:59
15 2014-01-01 00:00:00
16 1999-09-09 23:59:59
17 2014-01-01 00:00:00
18 1999-09-09 23:59:59
19 2014-01-01 00:00:00
In [9]:

import numpy as np
df.loc[df['date'] == '1999-09-09 23:59:59 ', 'date'] = pd.NaT
df
Out[9]:
         date
0         NaT
1  2014-01-01
2         NaT
3  2014-01-01
4         NaT
5  2014-01-01
6         NaT
7  2014-01-01
8         NaT
9  2014-01-01
10        NaT
11 2014-01-01
12        NaT
13 2014-01-01
14        NaT
15 2014-01-01
16        NaT
17 2014-01-01
18        NaT
19 2014-01-01

To answer your second question most pandas functions handle NaN's appropriately, you can always just drop them:

In [10]:

df.dropna()
Out[10]:
         date
1  2014-01-01
3  2014-01-01
5  2014-01-01
7  2014-01-01
9  2014-01-01
11 2014-01-01
13 2014-01-01
15 2014-01-01
17 2014-01-01
19 2014-01-01

and perform the operation just on these rows

like image 200
EdChum Avatar answered Sep 17 '22 12:09

EdChum


There are some operations, especially between columns, that do not disconsider NaNs or NaTs. That is why you are getting NaTs as a result. If you want to disconsider the 1999-09-09 23:59:59 and also have a subtractable column, try to convert to NaTs and then swap the NaTs with zeros (.fillna(0)), so that, when subtracted, it will keep the value from the other column.

like image 45
milcent Avatar answered Sep 19 '22 12:09

milcent