 

Calculate by grouping by each column one at a time

I have an example DataFrame, shown below. I am trying to calculate statistics for each column, grouped by the column 'Sample_ID'. That is, I would calculate the mean and standard deviation of the first column for each 'Sample_ID' group (1, 2 and 3). I can do this for one or even a few columns, but my new data has 100 columns.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
                   [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
                   [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
                  columns=["Sample_ID", "NaX", "NaU", "OC", "EC"])\
       .set_index('Sample_ID')

Is there a way to loop through each column and save the results? Here is the example calculation for one column of data; I need to do this calculation for 100 columns.

Thanks for reading this!

# Root-mean-square of the per-group coefficient of variation (std / mean) of 'OC', in percent
OC_UNC = 100*np.sqrt((((df.groupby(['Sample_ID'])['OC'].std()
                        / df.groupby(['Sample_ID'])['OC'].mean())**2).sum())
                     / len(df.groupby(['Sample_ID'])['OC'].count()))
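
A minimal sketch of the column-by-column loop being asked about, assuming the goal is to apply the `OC_UNC` formula above to every column and collect the results. The names `grouped`, `unc_by_col` and `unc` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Example data from the question, indexed by Sample_ID
df = pd.DataFrame([[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
                   [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
                   [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
                  columns=["Sample_ID", "NaX", "NaU", "OC", "EC"]).set_index("Sample_ID")

grouped = df.groupby("Sample_ID")  # group once, reuse for every column
unc_by_col = {}
for col in df.columns:
    # Per-group coefficient of variation (std / mean) for this column
    cv = grouped[col].std() / grouped[col].mean()
    # Same formula as OC_UNC: 100 * sqrt(sum of squared CVs / number of groups)
    unc_by_col[col] = 100 * np.sqrt((cv ** 2).sum() / len(cv))

unc = pd.Series(unc_by_col, name="UNC")  # one value per original column
print(unc)
```

Because the loop iterates over `df.columns`, it covers 100 columns without changes, and each entry should equal the one-column expression evaluated for that column.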
Suresh Raja asked Jan 03 '23 16:01


2 Answers

IIUC:

In [31]: df.groupby('Sample_ID').agg('std')
Out[31]:
                NaX       NaU        OC        EC
Sample_ID
1          1.697056  0.212132  0.035355  0.014142
2          0.035355  0.000000  0.035355  0.070711
3          0.000000  0.000000  0.000000  0.000000

Calculating both mean and std:

In [32]: df.groupby('Sample_ID').agg(['mean','std'])
Out[32]:
             NaX             NaU               OC              EC
            mean       std  mean       std   mean       std  mean       std
Sample_ID
1          2.200  1.697056  2.15  0.212132  0.225  0.035355  0.54  0.014142
2          3.375  0.035355  2.00  0.000000  0.225  0.035355  0.60  0.070711
3          3.400  0.000000  2.00  0.000000  0.250  0.000000  0.55  0.000000
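
An explicit loop is not strictly necessary. A vectorized sketch, assuming the aim is still the question's `OC_UNC` formula: take the `std` and `mean` sub-tables from the `agg(['mean','std'])` result with `DataFrame.xs` and divide them, which gives every column's per-group coefficient of variation at once. `stats`, `cv` and `unc` are illustrative names.

```python
import numpy as np
import pandas as pd

# Example data from the question, indexed by Sample_ID
df = pd.DataFrame([[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
                   [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
                   [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
                  columns=["Sample_ID", "NaX", "NaU", "OC", "EC"]).set_index("Sample_ID")

# Two-level columns: (original column, 'mean' or 'std')
stats = df.groupby("Sample_ID").agg(["mean", "std"])

# Pick each statistic across all columns; the two frames align on index and columns
cv = stats.xs("std", axis=1, level=1) / stats.xs("mean", axis=1, level=1)

# Root-mean-square CV over the groups (rows), in percent, for every column
unc = 100 * np.sqrt((cv ** 2).sum() / len(cv))
print(unc)
```

Dividing two aligned DataFrames keeps the work inside pandas, which is usually faster than a Python-level loop when there are many columns.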
MaxU - stop WAR against UA answered Jan 06 '23 04:01


Way more than you asked for

df.groupby('Sample_ID').describe()

            NaX                                                      NaU        ...       OC          EC                                                
          count   mean       std   min     25%    50%     75%  max count  mean  ...      75%   max count  mean       std   min    25%   50%    75%   max
Sample_ID                                                                       ...                                                                     
1           2.0  2.200  1.697056  1.00  1.6000  2.200  2.8000  3.4   2.0  2.15  ...   0.2375  0.25   2.0  0.54  0.014142  0.53  0.535  0.54  0.545  0.55
2           2.0  3.375  0.035355  3.35  3.3625  3.375  3.3875  3.4   2.0  2.00  ...   0.2375  0.25   2.0  0.60  0.070711  0.55  0.575  0.60  0.625  0.65
3           2.0  3.400  0.000000  3.40  3.4000  3.400  3.4000  3.4   2.0  2.00  ...   0.2500  0.25   2.0  0.55  0.000000  0.55  0.550  0.55  0.550  0.55
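
To keep only some of these statistics, the `describe()` result can be filtered on its second column level. A minimal sketch that keeps just `mean` and `std`; `desc`, `keep` and `mean_std` are illustrative names.

```python
import pandas as pd

# Example data from the question, indexed by Sample_ID
df = pd.DataFrame([[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
                   [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
                   [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
                  columns=["Sample_ID", "NaX", "NaU", "OC", "EC"]).set_index("Sample_ID")

desc = df.groupby("Sample_ID").describe()

# Columns are (original column, statistic) pairs; build a boolean mask on the statistic level
keep = desc.columns.get_level_values(1).isin(["mean", "std"])
mean_std = desc.loc[:, keep]
print(mean_std)
```

The trimmed table should hold the same values as `df.groupby('Sample_ID').agg(['mean','std'])`.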
piRSquared answered Jan 06 '23 05:01