Pretty basic question, but I was wondering:
What is the 'proper' way to average every 2 rows together in a pandas DataFrame, and thus end up with only half the number of rows?
Note that this is different from rolling_mean (rolling(2).mean() in current pandas), since it reduces the number of rows.
A fast way to do it:
>>> import pandas as pd
>>> s = pd.Series(range(10))
>>> s
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
>>> ((s + s.shift(-1)) / 2)[::2]
0 0.5
2 2.5
4 4.5
6 6.5
8 8.5
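Since the question asks about a DataFrame, here is a minimal sketch of the same shift trick wrapped in a function. The name average_pairs_shift and the sample columns x and y are made up for illustration, and it assumes every column is numeric. Note that with an odd number of rows the last row has no partner, so its average comes out as NaN.

```python
import pandas as pd


def average_pairs_shift(obj):
    """Average rows (0, 1), (2, 3), ... of a numeric Series or DataFrame."""
    # Adding the object to a copy shifted up by one row pairs each row
    # with the row that follows it.
    paired = (obj + obj.shift(-1)) / 2
    # Keep only the rows that start a pair (positions 0, 2, 4, ...).
    return paired.iloc[::2]


# Hypothetical sample data with an even number of rows.
df = pd.DataFrame({"x": range(6), "y": range(10, 16)})
even = average_pairs_shift(df)

# With an odd number of rows, the last row is paired with nothing -> NaN.
odd = average_pairs_shift(pd.Series(range(5)))
```

If the unpaired last row should be dropped rather than kept as NaN, calling .dropna() on the result is one option.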
The "proper way" I guess would be something like:
>>> import numpy as np
>>> a = s.index.values
>>> idx = np.array([a, a]).T.flatten()[:len(a)]
>>> idx
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
>>> s.groupby(idx).mean()
0 0.5
1 2.5
2 4.5
3 6.5
4 8.5
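A possibly simpler way to build the same kind of grouping key is integer division of the row positions. This is a sketch, not the construction used above: the key comes from np.arange(len(df)) // 2, so it does not depend on the index values at all, and the sample DataFrame with its letter index is made up for illustration. The result is indexed by group number (0, 1, 2, ...).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the letter index shows the key is positional.
df = pd.DataFrame(
    {"x": range(6), "y": range(10, 16)},
    index=list("abcdef"),
)

# Rows 0 and 1 get key 0, rows 2 and 3 get key 1, and so on.
key = np.arange(len(df)) // 2

# One output row per pair of input rows.
averaged = df.groupby(key).mean()
```

With an odd number of rows, the last row ends up in a group of its own here, so its "average" is just the row itself rather than NaN.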
But the groupby version is ~2x slower than the shift version, and the gap grows as the Series gets larger.
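To check the speed difference on your own machine, something like the following rough timing sketch could be used. The function names pairs_shift, pairs_groupby and time_both are made up here, and the sizes and repeat count are arbitrary. Actual numbers will depend on the pandas version and hardware, so treat the ~2x figure as something to verify rather than a given.

```python
import timeit

import numpy as np
import pandas as pd


def pairs_shift(s):
    # The shift-based approach: pair each row with the next, keep every other row.
    return ((s + s.shift(-1)) / 2).iloc[::2]


def pairs_groupby(s):
    # The groupby-based approach, with the key built by repeating each index value.
    a = s.index.values
    idx = np.array([a, a]).T.flatten()[:len(a)]
    return s.groupby(idx).mean()


def time_both(n, repeats=20):
    """Return total seconds for `repeats` runs of each approach on n rows."""
    s = pd.Series(np.arange(n, dtype="float64"))
    t_shift = timeit.timeit(lambda: pairs_shift(s), number=repeats)
    t_groupby = timeit.timeit(lambda: pairs_groupby(s), number=repeats)
    return t_shift, t_groupby


if __name__ == "__main__":
    for n in (1_000, 100_000):
        t_shift, t_groupby = time_both(n)
        print(f"n={n}: shift {t_shift:.4f}s, groupby {t_groupby:.4f}s")
```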