Applying/Composing a function N times to a pandas column, N being different for each row

Tags: python, pandas
Suppose we have this simple pandas.DataFrame:

import pandas as pd

df = pd.DataFrame(
  columns=['quantity', 'value'],
  data=[[1, 12.5], [3, 18.0]]
)

>>> print(df)
   quantity  value
0         1   12.5
1         3   18.0

I would like to create a new column, say modified_value, obtained by applying a function N times to the value column, where N is the quantity in the same row. If the function is new_value = round(value/2, 1), the expected result would be:

   quantity  value  modified_value
0         1   12.5             6.2   # applied 1 time
1         3   18.0             2.2   # applied 3 times: 18.0 -> 9.0 -> 4.5 -> 2.2

What would be an elegant/vectorized way to do so?

pierre_loic asked Mar 31 '20 14:03



2 Answers

You can write a custom repeat function, then use apply:

def repeat(func, x, n):
    ret = x
    for i in range(int(n)):
        ret = func(ret)

    return ret

def my_func(val): return round(val/2, 1)

df['new_col'] = df.apply(lambda x: repeat(my_func, x['value'], x['quantity']), 
                         axis=1)

# or without apply
# df['new_col'] = [repeat(my_func, v, n) for v,n in zip(df['value'], df['quantity'])]
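
Neither version above is vectorized: both call the function once per row and per repetition in Python. If the function can be written with NumPy operations, one possible sketch is to loop only up to the largest quantity and, at each step, update just the rows that still have repetitions left. This is an assumption-laden variant rather than part of the answer: halve and apply_n_times are hypothetical names, and np.round stands in for the built-in round, so results may differ from round in rare edge cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=['quantity', 'value'],
    data=[[1, 12.5], [3, 18.0]]
)

def halve(values):
    # Array version of round(value/2, 1); np.round replaces the built-in round.
    return np.round(values / 2, 1)

def apply_n_times(func, values, counts):
    # Hypothetical helper: apply func counts[i] times to values[i].
    result = np.asarray(values, dtype=float).copy()
    counts = np.asarray(counts, dtype=int)
    n_steps = int(counts.max()) if counts.size else 0
    for step in range(n_steps):
        # Rows whose count is larger than the current step still need an update.
        active = counts > step
        result[active] = func(result[active])
    return result

df['modified_value'] = apply_n_times(halve, df['value'], df['quantity'])
print(df)
```

The Python-level loop now runs max(quantity) times instead of once per row, which should help mostly when there are many rows and the quantities are small.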
Quang Hoang answered Oct 19 '22 18:10


Use functools.reduce:

from functools import reduce
def repeated(f, n):
    def rfun(p):
        return reduce(lambda x, _: f(x), range(n), p)
    return rfun

def myfunc(value): return round(value/2, 1)

df['modified_valued'] = df.apply(lambda x: repeated(myfunc,
                                                    int(x['quantity']))(x['value']),
                                 axis=1)

We can also use a list comprehension instead of apply:

df['modified_valued'] = [repeated(myfunc, int(quantity))(value)
                         for quantity, value in zip(df['quantity'], df['value'])]

Output

   quantity  value  modified_valued
0         1   12.5              6.2
1         3   18.0              2.2
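
The 6.2 and 2.2 in this output may look surprising if you expect halves to round up. As far as I can tell, this comes from Python's built-in round, which rounds exact ties to the nearest even digit. 6.25 and 2.25 are exactly representable as floats, so they become 6.2 and 2.2. A small check, reusing the function from the question (the steps list is only there for illustration):

```python
def myfunc(value):
    return round(value / 2, 1)

# Trace the three applications for the second row (value 18.0, quantity 3).
steps = [18.0]
for _ in range(3):
    steps.append(myfunc(steps[-1]))

print(steps)         # [18.0, 9.0, 4.5, 2.2]
print(myfunc(12.5))  # 6.2
```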
ansev answered Oct 19 '22 19:10