Suppose we have this simple pandas.DataFrame:
import pandas as pd

df = pd.DataFrame(
    columns=['quantity', 'value'],
    data=[[1, 12.5], [3, 18.0]]
)
>>> print(df)
   quantity  value
0         1   12.5
1         3   18.0
I would like to create a new column, say modified_value, that applies a function N times to the value column, where N is taken from the quantity column of the same row.
Suppose that function is new_value = round(value/2, 1); the expected result would then be:
   quantity  value  modified_value
0         1   12.5             6.2  # applied 1 time
1         3   18.0             2.2  # applied 3 times: 18.0 -> 9.0 -> 4.5 -> 2.2
What would be an elegant/vectorized way to do so?
By using apply() you call a function on every row of a pandas DataFrame; to iterate row by row, pass axis=1.
The apply() function is used to apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
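A small sketch of what axis=1 hands to the function, using the question's df: each row arrives as a Series indexed by the column names. Because a row Series holds a single dtype, the integer quantity is upcast to float64, which is presumably why the answers below wrap it in int() before passing it to range(). The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['quantity', 'value'], data=[[1, 12.5], [3, 18.0]])

# axis=1: the lambda receives one row at a time, as a Series whose
# index is the DataFrame's columns
row_labels = df.apply(lambda row: ','.join(row.index), axis=1)

# The int and float columns are combined into one float64 row Series
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
quantities = df.apply(lambda row: row['quantity'], axis=1)

print(row_labels.tolist())  # ['quantity,value', 'quantity,value']
print(row_dtypes.tolist())  # ['float64', 'float64']
print(quantities.tolist())  # [1.0, 3.0]
```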
You can write a custom repeat function, then use apply:
def repeat(func, x, n):
    # Apply func to x, n times in a row
    ret = x
    for _ in range(int(n)):  # int(): with axis=1 the row is upcast to float
        ret = func(ret)
    return ret

def my_func(val):
    return round(val / 2, 1)

df['new_col'] = df.apply(lambda x: repeat(my_func, x['value'], x['quantity']),
                         axis=1)
# or without apply
# df['new_col'] = [repeat(my_func, v, n) for v, n in zip(df['value'], df['quantity'])]
Use reduce:
from functools import reduce

def repeated(f, n):
    # Return a function that applies f to its argument n times;
    # the range value passed to the lambda is ignored
    def rfun(p):
        return reduce(lambda x, _: f(x), range(n), p)
    return rfun

def myfunc(value):
    return round(value / 2, 1)

df['modified_valued'] = df.apply(lambda x: repeated(myfunc,
                                                    int(x['quantity']))(x['value']),
                                 axis=1)
We can also use a list comprehension instead of apply:
df['modified_valued'] = [repeated(myfunc, int(quantity))(value)
                         for quantity, value in zip(df['quantity'], df['value'])]
Output:
   quantity  value  modified_valued
0         1   12.5              6.2
1         3   18.0              2.2
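Both answers still call the step function once per row from Python. A sketch of one closer-to-vectorized alternative, assuming the step function can operate on a whole NumPy array (true here via np.round): loop over the number of applications instead of over rows, and at pass k update only the rows whose quantity exceeds k. The loop runs max(quantity) times and each pass is a single NumPy operation across rows. The helper apply_n_times is hypothetical, and note that np.round uses a different algorithm from Python's round, so the two can disagree on rare values whose decimals are not exactly representable.

```python
import numpy as np
import pandas as pd

def apply_n_times(values, counts, step):
    # Hypothetical helper: apply the vectorized `step` to each element of
    # `values` as many times as the matching entry in `counts`
    result = values.to_numpy(dtype=float, copy=True)
    n = counts.to_numpy()
    passes = int(n.max()) if len(n) else 0
    for k in range(passes):
        mask = n > k  # rows that still need another application
        result[mask] = step(result[mask])
    return pd.Series(result, index=values.index)

df = pd.DataFrame(columns=['quantity', 'value'], data=[[1, 12.5], [3, 18.0]])
df['modified_value'] = apply_n_times(df['value'], df['quantity'],
                                     lambda a: np.round(a / 2, 1))
print(df)
```

This trades one Python call per row and per step for one NumPy call per step, which should pay off when there are many rows but a small maximum quantity.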