I have a dataframe and would like to subtract two columns of the previous row, provided that the previous row has the same Names value. If it does not, I would like it to yield NaN and fill with -. My groupby expression yields the error TypeError: 'Series' objects are mutable, thus they cannot be hashed, which is very ambiguous. What am I missing?
import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 8], ['Person A', 13, 11], ['Person B', 11, 32], ['Person B', 15, 20]], columns=['Names', 'Value', 'Value1'])
df['diff'] = df.groupby('Names').apply(df['Value'].shift(1) - df['Value1'].shift(1)).fillna('-')
print(df)
Desired Output:
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
We can write a function dedicated to subtracting the columns: it takes the column data as arguments, and the apply method then runs it on every data point in the column.
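A minimal sketch of that idea, assuming the sample frame from the question. The helper name subtract_columns and its left/right parameters are hypothetical, chosen only for illustration. Note that this computes the difference within each row, not the previous-row difference the question asks for.

```python
import pandas as pd

def subtract_columns(row, left="Value", right="Value1"):
    """Return row[left] - row[right] for a single row (hypothetical helper)."""
    return row[left] - row[right]

df = pd.DataFrame(
    data=[["Person A", 5, 8], ["Person A", 13, 11],
          ["Person B", 11, 32], ["Person B", 15, 20]],
    columns=["Names", "Value", "Value1"],
)

# axis=1 passes each row to the function as a Series
row_diff = df.apply(subtract_columns, axis=1)
print(row_diff.tolist())
```

For plain arithmetic like this, df["Value"] - df["Value1"] gives the same values without a per-row Python call; the function form is mainly useful when the per-row logic is more involved.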
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
The subtract() function computes the element-wise difference of a dataframe and other. It is essentially the same as dataframe - other, but with support for substituting a fill value for missing data in one of the inputs.
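To illustrate the difference between the operator and the method, here is a small sketch. The two frames left and right are made-up data, not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up frames with one missing value in `left`
left = pd.DataFrame({"a": [5.0, np.nan]})
right = pd.DataFrame({"a": [8.0, 11.0]})

# Plain operator: a missing value on either side gives NaN
plain = left - right

# subtract() with fill_value: the missing entry is treated as 0 before subtracting
filled = left.subtract(right, fill_value=0)

print(plain)
print(filled)
```

If both inputs are missing at the same position, the result there is still missing even with fill_value set.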
groupby.apply expects a function that it calls on each group, so wrap the expression in lambda x and change df['Value'] to x['Value'] (likewise for Value1), then finish with reset_index:
df['diff'] = (df.groupby('Names')
                .apply(lambda x: x['Value'].shift(1) - x['Value1'].shift(1))
                .fillna('-')
                .reset_index(drop=True))
print (df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
Another solution with DataFrameGroupBy.shift:
df1 = df.groupby('Names')[['Value','Value1']].shift()
print (df1)
   Value  Value1
0    NaN     NaN
1    5.0     8.0
2    NaN     NaN
3   11.0    32.0
df['diff'] = (df1.Value - df1.Value1).fillna('-')
print (df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
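One thing to keep in mind with the - placeholder: a column that mixes numbers and strings is stored as object dtype, so later arithmetic on it is awkward. A possible variation, sketched here on the question's sample frame, keeps a numeric column for calculations and builds the display column separately. The names numeric_diff and display_diff are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["Person A", 5, 8], ["Person A", 13, 11],
          ["Person B", 11, 32], ["Person B", 15, 20]],
    columns=["Names", "Value", "Value1"],
)

# Previous-row values within each Names group
shifted = df.groupby("Names")[["Value", "Value1"]].shift()

# Numeric result: NaN marks the first row of every group
numeric_diff = shifted["Value"] - shifted["Value1"]

# Display version: cast to object, then put '-' where the value is missing
display_diff = numeric_diff.astype(object).where(numeric_diff.notna(), "-")

print(numeric_diff.dtype)
print(display_diff.tolist())
```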
You can also do it this way:
In [76]: df['diff'] = (-df.groupby('Names')[['Value','Value1']].shift(1).diff(axis=1)['Value1']).fillna(0)
In [77]: df
Out[77]:
      Names  Value  Value1  diff
0  Person A      5       8   0.0
1  Person A     13      11  -3.0
2  Person B     11      32   0.0
3  Person B     15      20 -21.0