Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create Excel-like SUMIFS in Pandas

Tags:

python

pandas

I recently learned about pandas and was happy to see its analytics functionality. I am trying to convert Excel array functions into the Pandas equivalent to automate spreadsheets that I have created for the creation of performance attribution reports. In this example, I created a new column in Excel based on conditions within other columns:

={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)}

The formula is summing up the values in the "F" array (security weights) based on certain conditions. "A" array (portfolio ID) is a certain number, "B" array (security id) is zero, "C" array (group description) is " ", "D" array (start date) is the date of the row that I am on, and "E" array (end date) is the date of the row that I am on.

In Pandas, I am using the DataFrame. Creating a new column on a dataframe with the first three conditions is straight forward, but I am having difficult with the last two conditions.

reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE']
          [(reportAggregateDF['PORT_ID'] == portID) &
           (reportAggregateDF['SEC_ID'] == 0) &
           (reportAggregateDF['GROUP_LIST'] == " ") & 
           (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) & 
           (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])].sum()

Obviously the .ix[:] in the last two conditions is not doing anything for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to not do any loops, but instead use purely vector operations.

like image 268
Julio Guzman Avatar asked Jun 13 '12 10:06

Julio Guzman


People also ask

What can I use instead of Sumifs?

With SUMPRODUCT In spirit, the SUMPRODUCT option is closest to the SUMIFS formula since we are summing values based on multiple criteria. As before, it works fine as long as there is only one matching result.

Why is Sumifs better than SUMIF?

Difference Between Excel SUMIF and Excel SUMIFS Functions Both are logical SUM functions that find the SUM of a given range based on specified conditions. The only difference between Excel SUMIFS & SUMIF functions is that SUMIFs can check for multiple criteria at once, while SUMIF can check for one criterion at a time.

Is Sumifs better than Vlookup?

SUMIFS is a challenger quickly gaining popularity with Excel users. SUMIFS can do much of what VLOOKUP can do … but better :-) That is, SUMIFS makes a great alternative to VLOOKUP, and here's why: 1. SUMIFS matches equivalent values when stored as different data types.


2 Answers

You want to use the apply function and a lambda:

>> df
     A    B    C    D     E
0  mitfx  0  200  300  0.25
1     gs  1  150  320  0.35
2    duk  1    5    2  0.45
3    bmo  1  145   65  0.65

Let's say I want to sum column C times E but only if column B == 1 and D is greater than 5:

df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1)
df.matches.sum()

It might be cleaner to split this into two steps:

df_subset = df[(df.B == 1) & (df.D > 5)]
df_subset.apply(lambda x: x.C * x.E, axis=1).sum()

or to use simply multiplication for speed:

df_subset = df[(df.B == 1) & (df.D > 5)]
print sum(df_subset.C * df_subset.E)

You are absolutely right to want to do this problem without loops.

like image 59
guyrt Avatar answered Oct 27 '22 00:10

guyrt


I'm sure there is a better way, but this did it in a loop:

for idx, eachRecord in reportAggregateDF.T.iteritems():
reportAggregateDF['PORT_WEIGHT'].ix[idx] = reportAggregateDF['SEC_WEIGHT_RATE'][(reportAggregateDF['PORT_ID'] == portID) &            
    (reportAggregateDF['SEC_ID'] == 0) &            
    (reportAggregateDF['GROUP_LIST'] == " ") &             
    (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[idx]) &             
    (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[idx])].sum()
like image 26
Julio Guzman Avatar answered Oct 26 '22 23:10

Julio Guzman