Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Get the only value of a series or nan if it does not exist

Tags:

python

pandas

I'm doing a bit more complex operation on a dataframe where I compare two rows which can be anywhere in the frame.

Here's an example:

import pandas as pd
import numpy as np

D = {'A':['a','a','c','e','e','b','b'],'B':['c','f','a','b','d','a','e']\
 ,'AW':[1,2,3,4,5,6,7],'BW':[10,20,30,40,50,60,70]}
P = pd.DataFrame(D)
P = P.sort_values(['A','B'])
P['AB'] = P.A+'_'+P.B
P['AWBW'] = P.AW+P.BW

Now what I am doing here is that I have pairings of strings in A and B, for example a_c which I call AB. And I have the reverse pairing c_a as well. I sum over the numbers AW and BW for each pairing, called AWBW.

Now I want to subtract the summed value of a_c from the value of c_a and do the same thing for every string pairing where both variants exist. All other values should just be NaN, so my result should look like this:

   A  AW  B  BW   AB  AWBW  RowDelta
0  a   1  c  10  a_c    11     -22.0
1  a   2  f  20  a_f    22       NaN
5  b   6  a  60  b_a    66       NaN
6  b   7  e  70  b_e    77      33.0
2  c   3  a  30  c_a    33      22.0
3  e   4  b  40  e_b    44     -33.0
4  e   5  d  50  e_d    55       NaN

I have almost solved the way to do this, but there's one problem left I'm stuck at.

Here's my solution so far:

for i,row in P.iterrows():
  P.ix[i,'RowDelta'] = row['AWBW']\
   - P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW'].get(0,np.nan)

The problem is that P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW'] returns a series which is either empty or has exactly one element whose index however is variable.

Now the series.get method solves the problem of returning NaN when the series is empty but it wants a definitive index value, in this case I use 0, but I can not get a dynamic index there.

I can not do this for example

T = P[(P['A'] == row.AB[2]) & (P['B'] == row.AB[0])]['AWBW']
T.get(T.index[0],np.nan)

because there is no index if the series is empty and this leads to an error when doing T.index[0]. Same goes for my attempts using iloc.

Is there a way to dynamically get the unknown one index of a series if it has one element (and never more than one) while at the same time handling the case of an empty series?

like image 739
Khris Avatar asked Oct 29 '22 21:10

Khris


1 Answers

Credit goes to piRSquared for pointing me into the right direction for the solution:

AB = P.AB.str.split('_', expand=True)
AB = AB.merge(AB, left_on=[0, 1], right_on=[1, 0],how='inner')[[0,1]]
AB = AB.merge(P,left_on=[0,1], right_on=['A','B'])[['A','AW','B','BW']]
AB = AB.merge(P,left_on=['A','B'], right_on=['B','A'])[['AW_x','BW_x','AW_y','BW_y','AB']]
AB['RowDelta'] = AB.AW_y+AB.BW_y-AB.AW_x-AB.BW_x
P = P.merge(AB[['AB','RowDelta']],on='AB',how='outer')

Maybe it can be made shorter or nicer, it works for sure.

like image 55
Khris Avatar answered Nov 15 '22 05:11

Khris