
NaT error when adding column in Pandas dataframe

I am trying to create a dataframe of dates in Python, using the dates as the index:

aDates.head(5)
Out[114]: 
0   2009-12-31
1   2010-01-01
2   2010-01-04
3   2010-01-05
4   2010-01-06
Name: Date, dtype: datetime64[ns]

I then create an empty dataframe:

dfAll_dates = pd.DataFrame(index = aDates)

I then have a function that creates a pandas Series of dates, which I am trying to add as a column. So that you can reproduce this easily, let's assume we add the same Series that we used for the index:

dfAll_dates['my_added_column'] = aDates

But this results in:

dfAll_dates.head(5)

Out[120]: 
           my_added_column
Date                      
2009-12-31             NaT
2010-01-01             NaT
2010-01-04             NaT
2010-01-05             NaT
2010-01-06             NaT

I tried to convert my dates to timestamps using .to_timestamp on aDates, but this did not solve the problem (I then get "bound method Series.to_timestamp of 0"), and since no types are specified in the definition, I do not see why I would have to convert anyway.

Could you please help with this?

Djiggy asked Jan 29 '23 21:01



2 Answers

The problem is that the Series and the DataFrame have different indexes, so the data do not align and you get missing values (NaT for dates).
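To make the misalignment concrete, here is a minimal sketch that rebuilds the situation from the question. The sample dates are copied from the question's output; building aDates with pd.to_datetime is an assumption, since the original construction is not shown.

```python
import pandas as pd

# Assumed stand-in for the asker's aDates: a Series of dates with the default 0..4 index.
aDates = pd.Series(
    pd.to_datetime(
        ["2009-12-31", "2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06"]
    ),
    name="Date",
)

dfAll_dates = pd.DataFrame(index=aDates)

# The frame is labelled by dates, while the Series is labelled by integers.
print(dfAll_dates.index[:2])
print(aDates.index)

# Assigning a Series aligns it on labels; no integer label matches a date label,
# so every row of the new column is left missing.
dfAll_dates["my_added_column"] = aDates
print(dfAll_dates["my_added_column"].isna().all())
```

The last line checks that the whole column is missing, which matches the NaT column shown in the question.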

One possible solution is to convert aDates to a NumPy array with .values:

dfAll_dates = pd.DataFrame(index = aDates)
dfAll_dates['my_added_column'] = aDates.values
print (dfAll_dates)
           my_added_column
Date                      
2009-12-31      2009-12-31
2010-01-01      2010-01-01
2010-01-04      2010-01-04
2010-01-05      2010-01-05
2010-01-06      2010-01-06

Or use to_frame + set_index; the column also needs to be renamed:

d = {'Date':'my_added_column'}
df = aDates.to_frame().set_index('Date', drop=False).rename(columns=d)
print (df)
           my_added_column
Date                      
2009-12-31      2009-12-31
2010-01-01      2010-01-01
2010-01-04      2010-01-04
2010-01-05      2010-01-05
2010-01-06      2010-01-06

Or use the DataFrame constructor with a dict for the new column:

dfAll_dates = pd.DataFrame({'my_added_column':aDates.values}, index = aDates)
print (dfAll_dates)
           my_added_column
Date                      
2009-12-31      2009-12-31
2010-01-01      2010-01-01
2010-01-04      2010-01-04
2010-01-05      2010-01-05
2010-01-06      2010-01-06
jezrael answered Feb 01 '23 12:02



Another approach is to use the pd.Index.to_series method, which creates a Series whose values are taken from the index and whose index is that same index.

dfAll_dates['my_added_column'] = dfAll_dates.index.to_series()

That takes care of the index alignment. However, you don't even need to do that. As @jezrael showed, if we pass only an array instead of a Series object, pandas won't attempt to align an index that isn't there. We can accomplish the same thing by referring directly to the index:

dfAll_dates['my_added_column'] = dfAll_dates.index

In either case:

dfAll_dates

           my_added_column
2009-12-31      2009-12-31
2010-01-01      2010-01-01
2010-01-04      2010-01-04
2010-01-05      2010-01-05
2010-01-06      2010-01-06

In both of these scenarios, we are no longer required to track aDates and only need to refer to objects already present in dfAll_dates.
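As a quick check that the two assignments behave the same, here is a small self-contained sketch. The three sample dates and the column names from_to_series and from_index are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates, used directly as the frame's index.
dates = pd.to_datetime(["2009-12-31", "2010-01-01", "2010-01-04"])
dfAll_dates = pd.DataFrame(index=dates)

# to_series() gives a Series whose index and values are both the frame's index,
# so label alignment matches every row.
dfAll_dates["from_to_series"] = dfAll_dates.index.to_series()

# An Index has no separate labels to align on, so it is assigned by position.
dfAll_dates["from_index"] = dfAll_dates.index

# Neither column contains NaT, and the two columns hold the same values.
print(dfAll_dates.isna().any().any())
print((dfAll_dates["from_to_series"] == dfAll_dates["from_index"]).all())
```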

piRSquared answered Feb 01 '23 10:02
