
Python Pandas: Apply function using column names as named arguments

Is there a way in pandas to apply a function to a dataframe using the column names as argument names? For example, I have a function and a dataframe.

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[1,2,3],
                   'C':[1,2,3],
                   'D':[1,2,3]})

def f(A,B,C):
    # Pretend code is more complicated
    return A + B + C

Is there a way I can do something like

df.apply(f)

and have pandas match the columns to named arguments?

I know I can rewrite the function to take a row instead of named arguments, but keep in mind that f is just a toy example and my real function is more complicated.

EDIT:

Figured it out based on @juanpa.arrivillaga's answer:

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)
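
One caveat with co_varnames: it lists the function's local variables after its parameters, so a more complicated f that assigns locals would make the column selection fail with a KeyError. Here is a minimal sketch of the same idea using inspect.signature, which reports only the parameters. The local variable total and the helper name apply_by_name are hypothetical, added for illustration.

```python
import inspect

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    # A local variable; it also appears in f.__code__.co_varnames.
    total = A + B + C
    return total


def apply_by_name(frame, func):
    """Call func once per row, matching columns to func's parameter names."""
    # inspect.signature reports parameters only, never local variables.
    names = list(inspect.signature(func).parameters)
    return frame[names].apply(lambda row: func(**row), axis=1)


result = apply_by_name(df, f)
print(result.tolist())
```

This assumes every parameter of the function has a column of the same name; a parameter without a matching column still raises a KeyError.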

Jack asked Oct 18 '19 16:10



3 Answers

The function passed to apply receives each column (axis=0) or each row (axis=1) of df as a Series argument, not the column names. You can write a wrapper for this purpose.

def wrapper(x, A, B, C):
    return f(x[A], x[B], x[C])

df.apply(wrapper, axis=1, args=('A','B','C'))

Output:

0    3
1    6
2    9
dtype: int64
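
If the argument names of f do not match the column names, the same wrapper idea can be generalized. This is a rough sketch, assuming a hypothetical helper by_columns (not a pandas API) that takes a mapping from argument name to column name. The frame other and its columns x, y, z are made up for illustration.

```python
import pandas as pd


def f(A, B, C):
    return A + B + C


def by_columns(func, **arg_to_column):
    """Return a function of a row that feeds the named columns to func."""
    def row_func(row):
        # Look up each requested column in the row and pass it by keyword.
        kwargs = {arg: row[col] for arg, col in arg_to_column.items()}
        return func(**kwargs)
    return row_func


other = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 2, 3], 'z': [1, 2, 3]})
result = other.apply(by_columns(f, A='x', B='y', C='z'), axis=1)
print(result.tolist())
```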
Quang Hoang answered Oct 20 '22 09:10


If you are interested in using apply, here is one way:

df = pd.DataFrame({'A':[1,2,3],
                  'B':[1,2,3],
                  'C':[1,2,3],
                  'D':[1,2,3]})     


def func(row):
    row['result'] = row['A'] + row['B'] + row['C']
    return row

df.apply(func, axis = 1)


    Out[67]: 
       A  B  C  D  result
    0  1  1  1  1       3
    1  2  2  2  2       6
    2  3  3  3  3       9

UPD

If you have to use the function f and don't want to change it, maybe this:

df['res'] = f(df['A'], df['B'], df['C'])
df

    Out[70]: 
       A  B  C  D  res
    0  1  1  1  1    3
    1  2  2  2  2    6
    2  3  3  3  3    9
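
The direct call works because + is defined elementwise on Series, so f runs once on whole columns instead of once per row. It only holds if everything inside f is vectorized; a branch such as "if A > 0:" would raise on a Series. Here is a small sketch that builds the keyword arguments from a list of column names. The list cols is an assumption, chosen to match the parameters of f.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


cols = ['A', 'B', 'C']
# Pass each selected column as a keyword argument named after the column.
df['res'] = f(**{name: df[name] for name in cols})
print(df)
```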
Alex answered Oct 20 '22 09:10


There is no good way in general. However, if your column names align exactly with the argument names, you can wrap the function in another function that splats the row argument into your function, because Series objects are mappings!

So given:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3],
...                'B':[1,2,3],
...                'C':[1,2,3],
...                'D':[1,2,3]})
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...

We could almost do:

>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')

If you know which columns you need, you can select/drop to get the correct series:

>>> df.drop('D',axis=1).apply(lambda row: f(**row), axis=1)
0    3
1    6
2    9
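
To see why the splat works, and why D causes the error, it can help to look at a single row. A minimal illustration: ** asks the Series for its keys() (the index labels, which for a row are the column names) and looks each one up, so selecting only the wanted labels first avoids the unexpected keyword. The names row, wanted and value are just for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


row = df.iloc[0]               # one row as a Series indexed by column name
print(list(row.keys()))        # the labels that ** would turn into keyword names

wanted = row[['A', 'B', 'C']]  # keep only the labels f accepts
value = f(**wanted)
print(value)
```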
juanpa.arrivillaga answered Oct 20 '22 10:10