Initializing pandas DataFrames with and without index, columns yields different results

Question

If I use the following methodology to construct a pandas.DataFrame, I get an output that (I think) is peculiar:

import pandas, numpy

df = pandas.DataFrame(
    numpy.random.rand(100,2), index = numpy.arange(100), columns = ['s1','s2'])
smoothed = pandas.DataFrame(
    pandas.ewma(df, span = 21), index = df.index, columns = ['smooth1','smooth2'])

When I go to look at the smoothed values, I get:

>>> smoothed.tail()
    smooth1  smooth2
95      NaN      NaN
96      NaN      NaN
97      NaN      NaN
98      NaN      NaN
99      NaN      NaN

This seems like it should be equivalent to the following separate calls, yet they yield different results:

smoothed2 = pandas.DataFrame(pandas.ewma(df, span = 21))
smoothed2.index = df.index
smoothed2.columns = ['smooth1','smooth2']

Again using the DataFrame.tail() invocation I get:

>>> smoothed2.tail()
     smooth1   smooth2
95  0.496021  0.501153
96  0.506118  0.507541
97  0.516655  0.544621
98  0.520212  0.543751
99  0.518170  0.572429

Can anyone provide a rationale as to why these two DataFrame construction methodologies should give different results?

Wes McKinney · Accepted Answer

The result of ewma(df, span=21) is already a DataFrame, so when you pass it to the DataFrame constructor along with a list of columns, it "selects" out the columns that you passed. Since 'smooth1' and 'smooth2' are not among its columns ('s1' and 's2'), every selected value comes back as NaN. It's difficult in this particular case to break the link between label and data. If you had done instead:

In [23]: smoothed = DataFrame(ewma(df, span = 21).values, index=df.index, columns = ['smooth1','smooth2'])
In [24]: smoothed.head()
Out[24]: 
    smooth1   smooth2
0  0.218350  0.877693
1  0.400214  0.813499
2  0.308564  0.739426
3  0.433341  0.641891
4  0.525260  0.620541

there would be no problem. Of course,

smoothed = ewma(df, span=21)
smoothed.columns = ['smooth1', 'smooth2']

is perfectly fine too.
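
For anyone reproducing this on a recent pandas release, here is a minimal sketch of the three variants discussed above. It assumes that df.ewm(span=21).mean() is an acceptable stand-in for the older pandas.ewma(df, span=21) call used in the question; the seeded random generator and the variable names (ewm_result, selected, relabeled, renamed) are illustrative only and not from the original post.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the question's: 100 rows, columns 's1' and 's2'.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), index=np.arange(100), columns=["s1", "s2"])

# Assumption: the ewm accessor replaces the older pandas.ewma function.
ewm_result = df.ewm(span=21).mean()

# Variant 1: pass a DataFrame plus new column labels.
# The constructor selects those labels from the input; none exist, so all NaN.
selected = pd.DataFrame(ewm_result, index=df.index, columns=["smooth1", "smooth2"])

# Variant 2: pass the raw array, which carries no labels,
# so the new labels are simply attached to the data.
relabeled = pd.DataFrame(
    ewm_result.to_numpy(), index=df.index, columns=["smooth1", "smooth2"]
)

# Variant 3: keep the result and overwrite its column labels.
renamed = ewm_result.copy()
renamed.columns = ["smooth1", "smooth2"]

print(selected.tail())
print(relabeled.tail())
print(renamed.tail())
```

The first frame should print only NaN values, while the second and third should hold the same smoothed numbers under the new column names.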

Tags: python, pandas, numpy

benjaminmgross
