If I use the following methodology to construct a pandas.DataFrame, I get an output that (I think) is peculiar:
import pandas, numpy
df = pandas.DataFrame(
    numpy.random.rand(100,2), index = numpy.arange(100), columns = ['s1','s2'])
smoothed = pandas.DataFrame(
    pandas.ewma(df, span = 21), index = df.index, columns = ['smooth1','smooth2'])
When I go to look at the smoothed values, I get:
>>> smoothed.tail()
    smooth1  smooth2
95      NaN      NaN
96      NaN      NaN
97      NaN      NaN
98      NaN      NaN
99      NaN      NaN
This seems like it should be equivalent to the following separate calls, which nevertheless yield different results:
smoothed2 = pandas.DataFrame(pandas.ewma(df, span = 21))
smoothed2.index = df.index
smoothed2.columns = ['smooth1','smooth2']
Again using the DataFrame.tail() invocation I get:
>>> smoothed2.tail()
     smooth1   smooth2
95  0.496021  0.501153
96  0.506118  0.507541
97  0.516655  0.544621
98  0.520212  0.543751
99  0.518170  0.572429
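Can anyone provide a rationale as to why these two DataFrame construction methodologies should behave differently?
The result of ewma(df, span=21) is already a DataFrame, so when you pass it to the DataFrame constructor along with a list of columns, it "selects" the columns that you passed by label. The result only has columns 's1' and 's2', so asking for 'smooth1' and 'smooth2' gives you columns that do not exist in the data, and they come back filled with NaN. In this particular case it is difficult for the constructor to break the link between label and data. If you had instead passed the underlying values: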
In [23]: smoothed = DataFrame(ewma(df, span = 21).values, index=df.index, columns = ['smooth1','smooth2'])
In [24]: smoothed.head()
Out[24]:
    smooth1   smooth2
0  0.218350  0.877693
1  0.400214  0.813499
2  0.308564  0.739426
3  0.433341  0.641891
4  0.525260  0.620541
there is no problem. Of course,
smoothed = ewma(df, span=21)
smoothed.columns = ['smooth1', 'smooth2']
is perfectly fine too.
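For anyone on a recent pandas, here is a minimal sketch of the same three cases. It assumes a pandas version where the top-level pandas.ewma function is no longer available and uses df.ewm(span=21).mean() in its place; the variable names (by_label, by_position, renamed) and the fixed random seed are just for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the random data in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), index=np.arange(100), columns=["s1", "s2"])

# Assumed modern replacement for pandas.ewma(df, span=21).
ewm = df.ewm(span=21).mean()

# Case 1: the input is already a DataFrame, so columns= selects by label.
# "smooth1" and "smooth2" do not exist in ewm, so every value is NaN.
by_label = pd.DataFrame(ewm, index=df.index, columns=["smooth1", "smooth2"])

# Case 2: a bare array carries no labels, so columns= just names the columns.
by_position = pd.DataFrame(ewm.to_numpy(), index=df.index, columns=["smooth1", "smooth2"])

# Case 3: keep the DataFrame and relabel its columns.
renamed = ewm.rename(columns={"s1": "smooth1", "s2": "smooth2"})

print(by_label.isna().all().all())     # every cell is NaN
print(by_position.isna().any().any())  # no cell is NaN
```

Cases 2 and 3 hold the same numbers; the only difference is whether the labels are discarded and reapplied or renamed in place.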