I recently learned about <code>pandas</code> and was happy to see its analytics functionality. I am trying to convert Excel array functions into the Pandas equivalent to automate spreadsheets that I have created for the creation of performance attribution reports. In this example, I created a new column in Excel based on conditions within other columns: <pre class="prettyprint"><code>={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)} </code></pre> The formula is summing up the values in the "F" array (security weights) based on certain conditions. "A" array (portfolio ID) is a certain number, "B" array (security id) is zero, "C" array (group description) is " ", "D" array (start date) is the date of the row that I am on, and "E" array (end date) is the date of the row that I am on. In Pandas, I am using the DataFrame. Creating a new column on a dataframe with the first three conditions is straight forward, but I am having difficult with the last two conditions. <pre class="prettyprint"><code>reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE'] [(reportAggregateDF['PORT_ID'] == portID) & (reportAggregateDF['SEC_ID'] == 0) & (reportAggregateDF['GROUP_LIST'] == " ") & (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) & (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])].sum() </code></pre> Obviously the .ix[:] in the last two conditions is not doing anything for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to not do any loops, but instead use purely vector operations.

You want to use the apply function and a lambda: <pre class="prettyprint"><code>>> df A B C D E 0 mitfx 0 200 300 0.25 1 gs 1 150 320 0.35 2 duk 1 5 2 0.45 3 bmo 1 145 65 0.65 </code></pre> Let's say I want to sum column C times E but only if column B == 1 and D is greater than 5: <pre class="prettyprint"><code>df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1) df.matches.sum() </code></pre> It might be cleaner to split this into two steps: <pre class="prettyprint"><code>df_subset = df[(df.B == 1) & (df.D > 5)] df_subset.apply(lambda x: x.C * x.E, axis=1).sum() </code></pre> or to use simply multiplication for speed: <pre class="prettyprint"><code>df_subset = df[(df.B == 1) & (df.D > 5)] print sum(df_subset.C * df_subset.E) </code></pre> You are absolutely right to want to do this problem without loops.

I'm sure there is a better way, but this did it in a loop: <pre class="prettyprint"><code>for idx, eachRecord in reportAggregateDF.T.iteritems(): reportAggregateDF['PORT_WEIGHT'].ix[idx] = reportAggregateDF['SEC_WEIGHT_RATE'][(reportAggregateDF['PORT_ID'] == portID) & (reportAggregateDF['SEC_ID'] == 0) & (reportAggregateDF['GROUP_LIST'] == " ") & (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[idx]) & (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[idx])].sum() </code></pre>

Create Excel-like SUMIFS in Pandas

Tags:

python

pandas

I recently learned about pandas and was happy to see its analytics functionality. I am trying to convert Excel array functions into the Pandas equivalent to automate spreadsheets that I have created for the creation of performance attribution reports. In this example, I created a new column in Excel based on conditions within other columns:

={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)}

The formula is summing up the values in the "F" array (security weights) based on certain conditions. "A" array (portfolio ID) is a certain number, "B" array (security id) is zero, "C" array (group description) is " ", "D" array (start date) is the date of the row that I am on, and "E" array (end date) is the date of the row that I am on.

In Pandas, I am using the DataFrame. Creating a new column on a dataframe with the first three conditions is straight forward, but I am having difficult with the last two conditions.

reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE']
          [(reportAggregateDF['PORT_ID'] == portID) &
           (reportAggregateDF['SEC_ID'] == 0) &
           (reportAggregateDF['GROUP_LIST'] == " ") & 
           (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) & 
           (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])].sum()

Obviously the .ix[:] in the last two conditions is not doing anything for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to not do any loops, but instead use purely vector operations.

268

asked Jun 13 '12 10:06

Julio Guzman

2 Answers

You want to use the apply function and a lambda:

>> df
     A    B    C    D     E
0  mitfx  0  200  300  0.25
1     gs  1  150  320  0.35
2    duk  1    5    2  0.45
3    bmo  1  145   65  0.65

Let's say I want to sum column C times E but only if column B == 1 and D is greater than 5:

df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1)
df.matches.sum()

It might be cleaner to split this into two steps:

df_subset = df[(df.B == 1) & (df.D > 5)]
df_subset.apply(lambda x: x.C * x.E, axis=1).sum()

or to use simply multiplication for speed:

df_subset = df[(df.B == 1) & (df.D > 5)]
print sum(df_subset.C * df_subset.E)

You are absolutely right to want to do this problem without loops.

answered Oct 27 '22 00:10

guyrt

I'm sure there is a better way, but this did it in a loop:

for idx, eachRecord in reportAggregateDF.T.iteritems():
reportAggregateDF['PORT_WEIGHT'].ix[idx] = reportAggregateDF['SEC_WEIGHT_RATE'][(reportAggregateDF['PORT_ID'] == portID) &            
    (reportAggregateDF['SEC_ID'] == 0) &            
    (reportAggregateDF['GROUP_LIST'] == " ") &             
    (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[idx]) &             
    (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[idx])].sum()

answered Oct 26 '22 23:10

Julio Guzman

Related questions
                            
                                When invoking a Python script, what is the difference between "./script.py" and "python script.py"
                            
                                How to change path to site-packages dir in easy_install
                            
                                RESTful Authentication with Tornado
                            
                                Finding shortest combinations in array/sequence that equals sum
                            
                                How to remove the boundary effects arising due to zero padding in scipy/numpy fft?
                            
                                Python -subtraction of value within a list
                            
                                Run Jython and Python in one File
                            
                                Memory Error when calculating pairwise distances in scipy
                            
                                What's the most memory efficient way to generate the combinations of a set in python?
                            
                                Pyserial buffer fills faster than I can read
                            
                                Change icon in a Matplotlib figure window
                            
                                Why would TCP packets consistently fail to arrive at the server when the client happily sent them?
                            
                                Is it possible to read multiple result sets using a ResultProxy object in sqlalchemy?
                            
                                How to handle multi-page images in PythonMagick?
                            
                                Installing `numpy` within a virtualenv in Ubuntu 11.10
                            
                                Program works in IDLE, but fails at the command line
                            
                                Can one iterate a dictionary being modified by another thread without raising exceptions?
                            
                                Python Flask + nginx fcgi - output large response?
                            
                                Python Class Inheritance AttributeError - why? how to fix?
                            
                                Using scipy to perform discrete integration of the sample

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Create Excel-like SUMIFS in Pandas

Tags:

python

pandas

Julio Guzman

People also ask

2 Answers

guyrt

Julio Guzman

Recent Activity

Donate For Us