Subtract aggregate from Pandas Series/Dataframe [duplicate]

Given the following table

   vals
0    20
1     3
2     2
3    10
4    20

I'm trying to find a clean way in pandas to subtract a value, say 30, from the column by consuming rows from the top: rows that are fully used up become 0, the row where the running total first reaches 30 keeps the remainder, and later rows are left untouched. The result would be:

   vals
0     0
1     0
2     0
3     5
4    20

I was wondering if pandas has a way to do this without looping over every row of the DataFrame, something that takes advantage of pandas' vectorized operations.
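
For anyone following along, the example frame can be rebuilt like this (a minimal setup sketch; the construction itself is not part of the original question):

import pandas as pd

# rebuild the example column from the question
df = pd.DataFrame({'vals': [20, 3, 2, 10, 20]})
print(df)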

asked May 18 '17 at 18:05 by jab


1 Answer

  • identify where the cumulative sum is greater than or equal to 30
  • zero out the rows where it isn't
  • set the row where the total first reaches 30 to the cumulative sum minus 30

c = df.vals.cumsum()         # running total: 20, 23, 25, 35, 55
m = c.ge(30)                 # True from the row where the total reaches 30
i = m.idxmax()               # index of the first True (row 3 here)
n = df.vals.where(m, 0)      # zero out everything before that row
n.loc[i] = c.loc[i] - 30     # keep only the remainder in the crossing row
df.assign(vals=n)            # new frame with the adjusted column

   vals
0     0
1     0
2     0
3     5
4    20
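
If the same logic is needed more than once, it can be wrapped in a small helper. subtract_total below is a hypothetical name that simply repackages the steps above; it is a sketch, not part of the original answer:

def subtract_total(df, amount, col='vals'):
    # Consume `amount` from the top of `col`, zeroing fully consumed rows
    # and leaving the remainder in the row where the running total crosses it.
    c = df[col].cumsum()
    m = c.ge(amount)
    n = df[col].where(m, 0)
    if m.any():                # if the total never reaches `amount`, everything is consumed
        i = m.idxmax()
        n.loc[i] = c.loc[i] - amount
    return df.assign(**{col: n})

subtract_total(df, 30)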

Same thing, but numpyfied

import numpy as np

v = df.vals.values           # underlying numpy array
c = v.cumsum()               # running total
m = c >= 30                  # True from the crossing point onward
i = m.argmax()               # position of the first True
n = np.where(m, v, 0)        # zero out everything before the crossing point
n[i] = c[i] - 30             # keep only the remainder in the crossing row
df.assign(vals=n)

   vals
0     0
1     0
2     0
3     5
4    20
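
The same result can also be written as a clip on the cumulative sum followed by a diff, which undoes the cumsum. This is an equivalent formulation (assuming the values are non-negative), sketched here rather than taken from the original answer:

import numpy as np

# shift the running total down by 30, floor it at 0, then undo the cumsum
c = (df.vals.cumsum() - 30).clip(lower=0)
df.assign(vals=np.diff(c, prepend=0))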

Timing

%%timeit 
v = df.vals.values
c = v.cumsum()
m = c >= 30
i = m.argmax()
n = np.where(m, v, 0)
n[i] = c[i] - 30
df.assign(vals=n)
10000 loops, best of 3: 168 µs per loop

%%timeit
c = df.vals.cumsum()
m = c.ge(30)
i = m.idxmax()
n = df.vals.where(m, 0)
n.loc[i] = c.loc[i] - 30
df.assign(vals=n)
1000 loops, best of 3: 853 µs per loop
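
To reproduce the comparison outside a notebook, the standard timeit module works as well. The frame below is an arbitrary assumption, used only to run the benchmark on more rows:

import timeit
import numpy as np
import pandas as pd

# an arbitrary larger frame purely for benchmarking
df = pd.DataFrame({'vals': np.random.randint(1, 100, size=10_000)})

def numpy_version():
    v = df.vals.values
    c = v.cumsum()
    m = c >= 30
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - 30
    return df.assign(vals=n)

print(timeit.timeit(numpy_version, number=1000))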
answered Sep 29 '22 at 19:09 by piRSquared