 

Subtracting group specific values from each group

I have a dataframe in pandas containing information that I would like to sort into groups. Within each group, I then want to subtract the first value of a certain column from every value of that column in the group, and add the results to the dataframe as an additional column. An example of my initial dataframe:

              time    sample   x     y     mass 

              3       1.0     216    12    12
              4       1.0     218    13    12
              5       1.0     217    12    12
              6       1.0     234    13    13
              1       2.0     361    289   23
              2       2.0     362    287   22
              3       2.0     362    286   22
              5       3.0     124    56    18
              6       3.0     126    52    17
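To try the answer's code, the example frame from the question can be rebuilt as a pandas DataFrame. This is a minimal sketch: the values are copied from the table above, and the name df matches the one the answer uses.

```python
import pandas as pd

# Rebuild the question's example data (values copied from the table)
df = pd.DataFrame({
    "time":   [3, 4, 5, 6, 1, 2, 3, 5, 6],
    "sample": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
    "x":      [216, 218, 217, 234, 361, 362, 362, 124, 126],
    "y":      [12, 13, 12, 13, 289, 287, 286, 56, 52],
    "mass":   [12, 12, 12, 13, 23, 22, 22, 18, 17],
})
print(df)
```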

What I would like to have as a result:

       sample   time      x     y     mass   xdiff

       1.0       3       216    12    12     0
       1.0       4       218    13    12     2
       1.0       5       217    12    12     1
       1.0       6       234    13    13     18
       2.0       1       361    289   23     0
       2.0       2       362    287   22     1
       2.0       3       362    286   22     1
       3.0       5       124    56    18     0
       3.0       6       126    52    17     2

So far I can only figure out pieces:

              s = df.groupby('sample')
              #gives me the groups
              s["x"].nth(0)
              #gets the first x value of each group
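The two pieces above can already be combined: compute one first value per group, then map it back onto each row through the sample column and subtract. This sketch uses .first() rather than .nth(0), because .first() always returns one value per group indexed by sample, whereas what nth returns has changed between pandas versions. Only the sample and x columns are rebuilt here, for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "sample": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
    "x":      [216, 218, 217, 234, 361, 362, 362, 124, 126],
})

# One value per group: a Series indexed by 'sample' (1.0, 2.0, 3.0)
first_x = df.groupby("sample")["x"].first()

# Look up each row's group value via its 'sample' label, then subtract
df["xdiff"] = df["x"] - df["sample"].map(first_x)
print(df)
```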

I'm just not sure how to subtract the first x value of each sample group from all the x values in that group. Does anyone know how this can be done? Thanks!

asked Feb 17 '17 11:02 by UserR6


1 Answer

You can subtract from the column a new Series created by transform with 'first', which repeats each group's first x value on every row of that group:

print (df.groupby('sample')['x'].transform('first'))
0    216
1    216
2    216
3    216
4    361
5    361
6    361
7    124
8    124
Name: x, dtype: int64
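What makes the subtraction work is that transform returns a result with the same length and index as the original frame, unlike an aggregation such as .first(), which returns one row per group. A quick check on the same toy data (only sample and x are rebuilt):

```python
import pandas as pd

df = pd.DataFrame({
    "sample": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
    "x":      [216, 218, 217, 234, 361, 362, 362, 124, 126],
})

first_per_row = df.groupby("sample")["x"].transform("first")

# One value per original row, on the original index, so it lines up
# element-wise with df['x'] without any reindexing
print(len(first_per_row) == len(df))          # True
print(first_per_row.index.equals(df.index))   # True
```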


df['xdiff'] = df['x'] - df.groupby('sample')['x'].transform('first')
print (df)
   time  sample    x    y  mass  xdiff
0     3     1.0  216   12    12      0
1     4     1.0  218   13    12      2
2     5     1.0  217   12    12      1
3     6     1.0  234   13    13     18
4     1     2.0  361  289    23      0
5     2     2.0  362  287    22      1
6     3     2.0  362  286    22      1
7     5     3.0  124   56    18      0
8     6     3.0  126   52    17      2
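transform('first') takes whichever row comes first in each group as the frame is currently stored. In the question the rows happen to be sorted by time within each sample, so that matches the intent; if that ordering is not guaranteed, sorting first makes the choice explicit. Below is a sketch with a hypothetical helper, subtract_group_first (not a pandas function), tried on a shuffled subset of the question's rows.

```python
import pandas as pd


def subtract_group_first(df, group_col, value_col, order_col=None):
    """Hypothetical helper (not a pandas API): value minus the group's first value.

    If order_col is given, the "first" row of each group is the one with the
    smallest order_col; otherwise it is the first row as currently stored.
    Assumes df has a unique index.
    """
    ordered = df if order_col is None else df.sort_values(order_col, kind="mergesort")
    first = ordered.groupby(group_col)[value_col].transform("first")
    # Put the per-row first values back into df's row order before subtracting
    return df[value_col] - first.reindex(df.index)


# Shuffled subset of the question's rows: 'time' is no longer ascending per sample
df = pd.DataFrame({
    "time":   [4, 3, 6, 5, 2, 1],
    "sample": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0],
    "x":      [218, 216, 234, 217, 362, 361],
})

df["xdiff"] = subtract_group_first(df, "sample", "x", order_col="time")
print(df)
```

Without order_col, the first stored row of each group (here time 4 and time 2) would be used as the baseline instead.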

Or equivalently with Series.sub:

df['xdiff'] = df['x'].sub(df.groupby('sample')['x'].transform('first'))
print (df)
   time  sample    x    y  mass  xdiff
0     3     1.0  216   12    12      0
1     4     1.0  218   13    12      2
2     5     1.0  217   12    12      1
3     6     1.0  234   13    13     18
4     1     2.0  361  289    23      0
5     2     2.0  362  287    22      1
6     3     2.0  362  286    22      1
7     5     3.0  124   56    18      0
8     6     3.0  126   52    17      2

And a solution with apply:

df['xdiff'] = df.groupby('sample', group_keys=False)['x'].apply(lambda x: x - x.iloc[0])
print (df)
   time  sample    x    y  mass  xdiff
0     3     1.0  216   12    12      0
1     4     1.0  218   13    12      2
2     5     1.0  217   12    12      1
3     6     1.0  234   13    13     18
4     1     2.0  361  289    23      0
5     2     2.0  362  287    22      1
6     3     2.0  362  286    22      1
7     5     3.0  124   56    18      0
8     6     3.0  126   52    17      2
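In pandas 2.x, groupby(...).apply with a function that returns a like-indexed Series prepends the group key to the result's index unless group_keys=False is passed, so without it the result may not align with df when assigned as a column. A sketch that checks the apply variant with group_keys=False against the transform variant on the same toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "sample": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
    "x":      [216, 218, 217, 234, 361, 362, 362, 124, 126],
})

# group_keys=False keeps the original row index on the apply result
via_apply = df.groupby("sample", group_keys=False)["x"].apply(lambda s: s - s.iloc[0])
via_transform = df["x"] - df.groupby("sample")["x"].transform("first")

print(via_apply.tolist())               # [0, 2, 1, 18, 0, 1, 1, 0, 2]
print(via_apply.equals(via_transform))  # True
```

transform is usually the simpler choice here, since it returns a row-aligned result without any group_keys handling.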
answered Nov 01 '22 22:11 by jezrael