I'm doing a bit more complex operation on a dataframe where I compare two rows which can be anywhere in the frame.
Here's an example:
import pandas as pd
import numpy as np
D = {'A':['a','a','c','e','e','b','b'],'B':['c','f','a','b','d','a','e']\
,'AW':[1,2,3,4,5,6,7],'BW':[10,20,30,40,50,60,70]}
P = pd.DataFrame(D)
P = P.sort_values(['A','B'])
P['AB'] = P.A+'_'+P.B
P['AWBW'] = P.AW+P.BW
Now what I am doing here is that I have pairings of strings in A
and B
, for example a_c
which I call AB
. And I have the reverse pairing c_a
as well. I sum over the numbers AW
and BW
for each pairing, called AWBW
.
Now I want to subtract the summed value of a_c
from the value of c_a
and do the same thing for every string pairing where both variants exist. All other values should just be NaN
, so my result should look like this:
A AW B BW AB AWBW RowDelta
0 a 1 c 10 a_c 11 -22.0
1 a 2 f 20 a_f 22 NaN
5 b 6 a 60 b_a 66 NaN
6 b 7 e 70 b_e 77 33.0
2 c 3 a 30 c_a 33 22.0
3 e 4 b 40 e_b 44 -33.0
4 e 5 d 50 e_d 55 NaN
I have almost solved the way to do this, but there's one problem left I'm stuck at.
Here's my solution so far:
for i,row in P.iterrows():
P.ix[i,'RowDelta'] = row['AWBW']\
- P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW'].get(0,np.nan)
The problem is that P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW']
returns a series which is either empty or has exactly one element whose index however is variable.
Now the series.get
method solves the problem of returning NaN
when the series is empty but it wants a definitive index value, in this case I use 0
, but I can not get a dynamic index there.
I can not do this for example
T = P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW']
T.get(T.index[0],np.nan)
because there is no index if the series is empty and this leads to an error when doing T.index[0]
. Same goes for my attempts using iloc
.
Is there a way to dynamically get the unknown one index of a series if it has one element (and never more than one) while at the same time handling the case of an empty series?
Credit goes to piRSquared for pointing me into the right direction for the solution:
AB = P.AB.str.split('_', expand=True)
AB = AB.merge(AB, left_on=[0, 1], right_on=[1, 0],how='inner')[[0,1]]
AB = AB.merge(P,left_on=[0,1], right_on=['A','B'])[['A','AW','B','BW']]
AB = AB.merge(P,left_on=['A','B'], right_on=['B','A'])[['AW_x','BW_x','AW_y','BW_y','AB']]
AB['RowDelta'] = AB.AW_y+AB.BW_y-AB.AW_x-AB.BW_x
P = P.merge(AB[['AB','RowDelta']],on='AB',how='outer')
Maybe it can be made shorter or nicer, it works for sure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With