 

Subtracting Two Columns with a Groupby in Pandas

I have a DataFrame and would like to subtract two columns of the previous row, provided that the previous row has the same Names value. If it does not, the result should be NaN, filled with '-'. My groupby expression raises TypeError: 'Series' objects are mutable, thus they cannot be hashed, which I find cryptic. What am I missing?

import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 8], ['Person A', 13, 11], ['Person B', 11, 32], ['Person B', 15, 20]], columns=['Names', 'Value', 'Value1'])
df['diff'] = df.groupby('Names').apply(df['Value'].shift(1) - df['Value1'].shift(1)).fillna('-')
print(df)

Desired Output:

      Names  Value  Value1  diff
0  Person A      5       8     -
1  Person A     13      11    -3
2  Person B     11      32     -
3  Person B     15      20   -21
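In the question's line, df['Value'].shift(1) - df['Value1'].shift(1) is evaluated over the whole frame before groupby ever sees it, so apply receives a finished Series instead of a function, and that shift ignores group boundaries anyway. A minimal sketch on the question's data contrasting the whole-column shift with a per-group shift:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 8], ['Person A', 13, 11],
          ['Person B', 11, 32], ['Person B', 15, 20]],
    columns=['Names', 'Value', 'Value1'],
)

# Shifting the whole column ignores where one name ends and the next begins.
ungrouped = df['Value'].shift(1) - df['Value1'].shift(1)
print(ungrouped.tolist())

# Shifting within each Names group restarts at every new name.
by_name = df.groupby('Names')
grouped = by_name['Value'].shift(1) - by_name['Value1'].shift(1)
print(grouped.tolist())
```

The first Person B row gets 2.0 (Person A's 13 - 11) in the ungrouped version, but NaN in the grouped one.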
asked May 31 '16 18:05 by user2242044




2 Answers

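Pass a lambda to apply and use x['Value'] and x['Value1'] (the columns of each group) instead of df['Value'] and df['Value1'], then finish with reset_index(drop=True) so the result lines up with df's default index: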

# parentheses let the method chain span several lines
df['diff'] = (df.groupby('Names')
               .apply(lambda x: x['Value'].shift(1) - x['Value1'].shift(1))
               .fillna('-')
               .reset_index(drop=True))
print(df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21

Another solution with DataFrameGroupBy.shift:

# select the columns with a list; tuple-style selection raises an error in pandas 2.0+
df1 = df.groupby('Names')[['Value', 'Value1']].shift()
print (df1)
   Value  Value1
0    NaN     NaN
1    5.0     8.0
2    NaN     NaN
3   11.0    32.0
df['diff'] = (df1.Value - df1.Value1).fillna('-')

print (df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
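Shifting commutes with the subtraction (within a group, shift(a) - shift(b) equals shift(a - b)), so you can also compute each row's own difference first and shift that single column within each group. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 8], ['Person A', 13, 11],
          ['Person B', 11, 32], ['Person B', 15, 20]],
    columns=['Names', 'Value', 'Value1'],
)

# Each row's own Value - Value1, moved down one row within its Names group;
# the first row of every group becomes NaN.
row_diff = df['Value'] - df['Value1']
df['diff'] = row_diff.groupby(df['Names']).shift(1)

# Optional '-' placeholder from the question (turns the column into object dtype).
df['diff'] = df['diff'].fillna('-')
print(df)
```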
answered Oct 21 '22 08:10 by jezrael


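You can also do it this way. After the grouped shift, diff(axis=1) subtracts the column to the left from each column, so the Value column holds the previous row's Value - Value1 (missing values are filled with 0 here rather than '-'):

In [76]: df['diff'] = df.groupby('Names')[['Value1', 'Value']].shift(1).diff(axis=1)['Value'].fillna(0)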

In [77]: df
Out[77]:
      Names  Value  Value1  diff
0  Person A      5       8   0.0
1  Person A     13      11  -3.0
2  Person B     11      32   0.0
3  Person B     15      20 -21.0
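Which column the result is read from matters in that one-liner: DataFrame.diff(axis=1) subtracts the column to the left from each column, so the first column comes out all NaN. A tiny sketch using the previous-row values from the question's data, with the columns in the same order as [['Value1', 'Value']]:

```python
import pandas as pd

# Same column order as the answer's [['Value1', 'Value']] selection.
frame = pd.DataFrame({'Value1': [8.0, 32.0], 'Value': [5.0, 11.0]})

# axis=1 works across columns: each column minus the column to its left.
across = frame.diff(axis=1)
print(across)
```

The Value column holds 5 - 8 and 11 - 32, while Value1 has no left neighbour and is NaN.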
answered Oct 21 '22 09:10 by MaxU - stop WAR against UA