Given the following table:
vals
0 20
1 3
2 2
3 10
4 20
I'm trying to find a clean way in pandas to subtract a value (say, 30) from the column, taking it out of the rows from the top down, so that I end up with the following result:
vals
0 0
1 0
2 0
3 5
4 20
I was wondering whether pandas has a way to do this without looping through every row of the DataFrame, i.e. something that takes advantage of pandas's vectorized (bulk) operations.
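To pin down the intended behaviour, here is a plain loop that reproduces the example. It is exactly the row-by-row approach the question hopes to avoid, but it is handy as a reference when checking vectorized versions. `subtract_from_top_loop` is a hypothetical helper name, and the sketch assumes the values are non-negative.

```python
import pandas as pd


def subtract_from_top_loop(values, amount):
    """Row-by-row reference: take `amount` out of the values, starting at the top.

    Hypothetical helper for illustration; assumes non-negative values.
    """
    remaining = amount
    out = []
    for v in values:
        taken = min(v, remaining)  # consume as much of this row as is still owed
        out.append(v - taken)
        remaining -= taken
    return out


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
result = subtract_from_top_loop(df["vals"].tolist(), 30)
print(result)  # [0, 0, 0, 5, 20]
```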
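import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})

c = df.vals.cumsum()            # running total: 20, 23, 25, 35, 55
m = c.ge(30)                    # True from the first row where the total reaches 30
n = df.vals.where(m, 0)         # rows before that point are used up entirely
if m.any():                     # idxmax on an all-False mask would return the first label
    i = m.idxmax()              # label of the first row where the total reaches 30
    n.loc[i] = c.loc[i] - 30    # that row keeps only what exceeds 30
df.assign(vals=n)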
vals
0 0
1 0
2 0
3 5
4 20
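A possible alternative, not part of the answer above, avoids locating the crossing row altogether: clip the running total minus 30 at zero, then take the row-to-row difference. This is a sketch assuming non-negative values; it also handles the case where the total never reaches 30 (every row becomes 0) without a special branch.

```python
import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
amount = 30

# Running total left over after `amount` has been taken out (never below 0).
leftover = df["vals"].cumsum().sub(amount).clip(lower=0)

# Each row keeps its contribution to the leftover; shift(fill_value=0)
# treats the (nonexistent) row before the first as having 0 leftover.
out = df.assign(vals=leftover - leftover.shift(fill_value=0))
print(out)
```

With the example data, `leftover` is 0, 0, 0, 5, 25, and its differences give 0, 0, 0, 5, 20.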
Same thing, but numpy-fied:
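import numpy as np

v = df.vals.to_numpy()          # underlying integer array
c = v.cumsum()                  # running total
m = c >= 30                     # True from the first row where the total reaches 30
n = np.where(m, v, 0)           # rows before that point are used up entirely
if m.any():                     # argmax of an all-False mask is 0, so guard it
    i = m.argmax()              # position of the first row where the total reaches 30
    n[i] = c[i] - 30            # that row keeps only what exceeds 30
df.assign(vals=n)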
vals
0 0
1 0
2 0
3 5
4 20
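To gain some confidence that the cumsum/argmax idea matches the row-by-row behaviour, the sketch below wraps the numpy version in a function with the amount as a parameter and compares it with a plain loop on random inputs. Both helper names are hypothetical, and the values are assumed to be non-negative integers.

```python
import numpy as np


def subtract_from_top_np(v, amount):
    """Numpy version of the cumsum/argmax answer, with `amount` as a parameter.

    Hypothetical helper name; assumes a 1-D array of non-negative values.
    """
    v = np.asarray(v)
    c = v.cumsum()
    m = c >= amount
    n = np.where(m, v, 0)   # rows before the crossing point are fully consumed
    if m.any():             # argmax of an all-False mask would be 0, so guard it
        i = m.argmax()      # first row where the running total reaches `amount`
        n[i] = c[i] - amount
    return n


def subtract_from_top_loop(values, amount):
    """Row-by-row reference used only for the comparison."""
    remaining = amount
    out = []
    for x in values:
        taken = min(x, remaining)
        out.append(x - taken)
        remaining -= taken
    return out


rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.integers(0, 20, size=rng.integers(1, 10))
    amount = int(rng.integers(0, 100))
    assert subtract_from_top_np(v, amount).tolist() == subtract_from_top_loop(v.tolist(), amount)
print("all checks passed")
```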
Timing
%%timeit
v = df.vals.values
c = v.cumsum()
m = c >= 30
i = m.argmax()
n = np.where(m, v, 0)
n[i] = c[i] - 30
df.assign(vals=n)
10000 loops, best of 3: 168 µs per loop
%%timeit
c = df.vals.cumsum()
m = c.ge(30)
i = m.idxmax()
n = df.vals.where(m, 0)
n.loc[i] = c.loc[i] - 30
df.assign(vals=n)
1000 loops, best of 3: 853 µs per loop
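The `%%timeit` cells above need IPython. A rough script equivalent using the standard-library `timeit` module might look like the sketch below; the 100,000-row random frame is an assumption for illustration (not the data behind the numbers above), and with values of that size the total always reaches 30, so the guard is omitted. Absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: 100,000 random non-negative integers.
rng = np.random.default_rng(0)
df = pd.DataFrame({"vals": rng.integers(0, 50, size=100_000)})
amount = 30


def pandas_version():
    c = df.vals.cumsum()
    m = c.ge(amount)
    i = m.idxmax()
    n = df.vals.where(m, 0)
    n.loc[i] = c.loc[i] - amount
    return df.assign(vals=n)


def numpy_version():
    v = df.vals.to_numpy()
    c = v.cumsum()
    m = c >= amount
    i = m.argmax()
    n = np.where(m, v, 0)
    n[i] = c[i] - amount
    return df.assign(vals=n)


for fn in (pandas_version, numpy_version):
    # Best of 3 repeats, 20 calls each, reported per call.
    per_call = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {per_call * 1e6:.1f} µs per call")
```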