I am trying to create a DataFrame of dates in Python, using the dates as the index:
aDates.head(5)
Out[114]:
0 2009-12-31
1 2010-01-01
2 2010-01-04
3 2010-01-05
4 2010-01-06
Name: Date, dtype: datetime64[ns]
I then create an empty dataframe:
dfAll_dates = pd.DataFrame(index = aDates)
I then have a function that creates a pandas Series of dates that I am trying to add as a column. So that you can reproduce this easily, let's assume we add the same Series that we used for the index:
dfAll_dates['my_added_column'] = aDates
But this results in :
dfAll_dates.head(5)
Out[120]:
my_added_column
Date
2009-12-31 NaT
2010-01-01 NaT
2010-01-04 NaT
2010-01-05 NaT
2010-01-06 NaT
I tried to convert my dates to timestamps using .to_timestamp on aDates, but this did not solve the problem (I then get a "bound method Series.to_timestamp of 0"), and as there are no types in the definition I do not see why I would have to convert anyway.
Could you please help with this?
The problem is that the Series and the DataFrame have different indexes, so the data does not align and you get missing values (NaT for dates).
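To see the misalignment concretely, here is a minimal sketch. It rebuilds aDates by hand from the five dates shown in the question (an assumption, since the original function is not given) and checks that the two indexes differ and that the assigned column ends up entirely missing:

```python
import pandas as pd

# Hypothetical stand-in for aDates: a Series of dates with the default 0..4 index.
aDates = pd.Series(
    pd.to_datetime(
        ["2009-12-31", "2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06"]
    ),
    name="Date",
)

# The empty frame is indexed by the dates themselves.
dfAll_dates = pd.DataFrame(index=aDates)

# Column assignment aligns on index labels: the frame is labelled by dates,
# while the Series is labelled 0..4, so no label matches.
dfAll_dates["my_added_column"] = aDates

indexes_equal = dfAll_dates.index.equals(aDates.index)
all_missing = dfAll_dates["my_added_column"].isna().all()
print(indexes_equal, all_missing)
```

Because none of the labels 0..4 exist in the date index, every row of the new column is filled with a missing value.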
One possible solution is to convert the values of aDates to a NumPy array with .values:
dfAll_dates = pd.DataFrame(index = aDates)
dfAll_dates['my_added_column'] = aDates.values
print (dfAll_dates)
my_added_column
Date
2009-12-31 2009-12-31
2010-01-01 2010-01-01
2010-01-04 2010-01-04
2010-01-05 2010-01-05
2010-01-06 2010-01-06
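As a variation on the same idea, and assuming pandas 0.24 or later, Series.to_numpy() can be used in place of .values. The sketch below rebuilds aDates by hand from the dates in the question (an assumption) and checks that the column is filled by position:

```python
import pandas as pd

# Hypothetical stand-in for aDates, built from the dates in the question.
aDates = pd.Series(
    pd.to_datetime(
        ["2009-12-31", "2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06"]
    ),
    name="Date",
)

dfAll_dates = pd.DataFrame(index=aDates)

# to_numpy() returns a plain NumPy array, which carries no index,
# so the values are assigned by position rather than by label.
dfAll_dates["my_added_column"] = aDates.to_numpy()

print(dfAll_dates)
```

Either spelling avoids alignment for the same reason: an array has no labels for pandas to match against.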
Or use to_frame + set_index; it is also necessary to rename the column:
d = {'Date':'my_added_column'}
df = aDates.to_frame().set_index('Date', drop=False).rename(columns=d)
print (df)
my_added_column
Date
2009-12-31 2009-12-31
2010-01-01 2010-01-01
2010-01-04 2010-01-04
2010-01-05 2010-01-05
2010-01-06 2010-01-06
Or use the DataFrame constructor with a dict for the new column:
dfAll_dates = pd.DataFrame({'my_added_column':aDates.values}, index = aDates)
print (dfAll_dates)
my_added_column
Date
2009-12-31 2009-12-31
2010-01-01 2010-01-01
2010-01-04 2010-01-04
2010-01-05 2010-01-05
2010-01-06 2010-01-06
Another approach is to use the pd.Index.to_series method, which creates a Series whose values are taken from the index and whose index remains the same:
dfAll_dates['my_added_column'] = dfAll_dates.index.to_series()
That takes care of the index alignment. However, you don't even need to do that. As @jezrael showed, if we pass an array instead of a Series object, pandas won't attempt to align an index that isn't there. We can accomplish the same thing by referring directly to the index:
dfAll_dates['my_added_column'] = dfAll_dates.index
In either case
dfAll_dates
my_added_column
2009-12-31 2009-12-31
2010-01-01 2010-01-01
2010-01-04 2010-01-04
2010-01-05 2010-01-05
2010-01-06 2010-01-06
In both of these scenarios, we no longer need to track aDates and only need to refer to objects already present in dfAll_dates.
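A small sketch comparing the two index-based approaches side by side. The frame is rebuilt from the dates in the question, and the column names from_to_series and from_index are made up for illustration:

```python
import pandas as pd

# Hypothetical reconstruction of the empty frame from the question.
dates = pd.to_datetime(
    ["2009-12-31", "2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06"]
)
dfAll_dates = pd.DataFrame(index=pd.DatetimeIndex(dates, name="Date"))

# to_series() yields a Series whose index equals the frame's index,
# so label alignment succeeds.
dfAll_dates["from_to_series"] = dfAll_dates.index.to_series()

# Assigning the Index itself involves no alignment; values go in by position.
dfAll_dates["from_index"] = dfAll_dates.index

same = dfAll_dates["from_to_series"].equals(dfAll_dates["from_index"])
print(same)
```

Both columns hold the same dates, so the choice between them is mostly a matter of taste.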