Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

group by within group by in pandas

Consider the following dataset:

min    5-min     a
0       0        800
0       0        801
1       0        802
1       0        803
1       0        804
2       0        805
2       0        805
2       0        810
3       0        801
3       0        802
3       0        803
4       0        804
4       0        805
5       1        806
5       1        800
5       1        890
6       1        890
6       1        880
6       1        800
7       1        804
7       1        806
8       1        801
9       1        800
9       1        900
10      1        770
10      1        803
10      1        811

I need to calculate std of a on each group based on the minute and then calculate the mean of the results values in each group of 5 min. I do not know how to find the border of 5 min, after calculation of std. How should I save the data to know which std belong to each group of 5 min?

data.groupby('minute').a.std()

I would appreciate of any help.

like image 538
Elham Avatar asked Nov 20 '25 05:11

Elham


2 Answers

Taskos answer is great but I wasn't sure if you needed the data to be pushed back into the dataframe or not. Assuming what you want is to add the new columns in the parent after each groupby operation, Ive opted to do that for you as follows

import pandas as pd

df = your_df

# First we create the standard deviation column

def add_std(grp):
    grp['stdevs'] = grp['a'].std()
    return grp

df = df.groupby('min').apply(add_std)

# Next we create the 5 minute mean column

def add_meandev(grp):
    grp['meandev'] = grp['stdevs'].mean()
    return grp

print(df.groupby('5-min').apply(add_meandev))

This can be done more elegantly by chaining etc but I have opted to lay it out like this so that the underlying process is more visible to you.

The final output from this will look like the following:

    min  5-min    a     stdevs    meandev
0     0      0  800   0.707107   1.345283
1     0      0  801   0.707107   1.345283
2     1      0  802   1.000000   1.345283
3     1      0  803   1.000000   1.345283
4     1      0  804   1.000000   1.345283
5     2      0  805   2.886751   1.345283
6     2      0  805   2.886751   1.345283
7     2      0  810   2.886751   1.345283
8     3      0  801   1.000000   1.345283
9     3      0  802   1.000000   1.345283
10    3      0  803   1.000000   1.345283
11    4      0  804   0.707107   1.345283
12    4      0  805   0.707107   1.345283
13    5      1  806  50.318983  39.107147
14    5      1  800  50.318983  39.107147
15    5      1  890  50.318983  39.107147
16    6      1  890  49.328829  39.107147
17    6      1  880  49.328829  39.107147
18    6      1  800  49.328829  39.107147
19    7      1  804   1.414214  39.107147
20    7      1  806   1.414214  39.107147
21    8      1  801        NaN  39.107147
22    9      1  800  70.710678  39.107147
23    9      1  900  70.710678  39.107147
24   10      1  770  21.733231  39.107147
25   10      1  803  21.733231  39.107147
26   10      1  811  21.733231  39.107147
like image 80
Paula Livingstone Avatar answered Nov 22 '25 17:11

Paula Livingstone


Not 100% clear on what you are asking... but I think this is what you need:

data.groupby(['min','5-min']).std().groupby('5-min').mean()

This finds the standard deviation based on the 5-min column of the means calculated based on the 'min' column.

like image 31
Tasko Olevski Avatar answered Nov 22 '25 18:11

Tasko Olevski



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!