Is there a way in pandas to apply a function to a dataframe using the column names as argument names? For example, I have a function and a dataframe.
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[1,2,3],
                   'C':[1,2,3],
                   'D':[1,2,3]})
def f(A,B,C):
    #Pretend code is more complicated
    return A + B + C
Is there a way I can do something like
df.apply(f)
and have pandas match the columns to named arguments?
I know I can rewrite the function to take a row instead of named arguments, but keep in mind that f is just a toy example and my real function is more complicated.
EDIT:
Figured it out based on @juanpa.arrivillaga's answer:
df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)
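One caveat with f.__code__.co_varnames: it lists the function's local variables as well as its parameters, so once f has any locals it can name columns that do not exist. Below is a minimal sketch of a variant that reads the parameter names with inspect.signature instead. The helper name apply_by_name is hypothetical, and the sketch assumes every parameter of the function matches a column name.

```python
import inspect

import pandas as pd


def apply_by_name(df, func):
    """Call func once per row, passing only the columns named in its signature."""
    # Parameter names only; unlike co_varnames, locals are not included.
    params = list(inspect.signature(func).parameters)
    return df[params].apply(lambda row: func(**row), axis=1)


df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    total = A + B + C  # a local variable, which co_varnames would also list
    return total


result = apply_by_name(df, f)
print(result.tolist())  # [3, 6, 9]
```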
We can use the apply() function on a column of a DataFrame with a lambda expression.
Information can be passed into functions as arguments. Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want; just separate them with commas.
Use apply() to Apply Functions to Columns in Pandas
The apply() method applies a function to a whole DataFrame, either column by column or row by row. With axis=0 (the default) the function is called once per column; with axis=1 it is called once per row.
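To make the axis parameter concrete, here is a small sketch on a two-column frame. The frame and variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})

# axis=0 (the default): the function receives each column as a Series.
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums.tolist())  # [6, 6]
print(row_sums.tolist())  # [2, 4, 6]
```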
The function passed to apply, f, needs to accept a column or a row of df (for axis=0 or axis=1 respectively) as its argument, not the column names. You can write a wrapper for this purpose.
def wrapper(x, A, B, C):
    return f(x[A], x[B], x[C])
df.apply(wrapper, axis=1, args=('A','B','C'))
Output:
0    3
1    6
2    9
dtype: int64
If you are interested in using the "apply" function, here is an example:
df = pd.DataFrame({'A':[1,2,3],
                   'B':[1,2,3],
                   'C':[1,2,3],
                   'D':[1,2,3]})
def func(row):
    row['result'] = row['A'] + row['B'] + row['C']
    return row
df.apply(func, axis = 1)
Out[67]:
   A  B  C  D  result
0  1  1  1  1       3
1  2  2  2  2       6
2  3  3  3  3       9
If you have to use the function "f" and don't want to change it, maybe try this:
df['res'] = f(df['A'], df['B'], df['C'])
df
Out[70]:
   A  B  C  D  res
0  1  1  1  1    3
1  2  2  2  2    6
2  3  3  3  3    9
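The same whole-column call can be written without listing the columns by hand. This is a minimal sketch, assuming every parameter of f has a matching column and that f only uses operations that work on entire columns (such as +). inspect.signature comes from the standard library.

```python
import inspect

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


# Build {parameter name: column} and pass whole columns as keyword arguments.
kwargs = {name: df[name] for name in inspect.signature(f).parameters}
res = f(**kwargs)
print(res.tolist())  # [3, 6, 9]
```

Because f is called once on whole columns rather than once per row, this avoids the per-row overhead of apply, but it only works when f does not branch on individual scalar values.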
There is no good way in general. However, if your column names align exactly with the argument names, you can wrap the function in another function that splats the row argument into your function, because Series objects are mappings!
So given:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3],
... 'B':[1,2,3],
... 'C':[1,2,3],
... 'D':[1,2,3]})
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...
We could almost do:
>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
return op.get_result()
File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
return self.apply_standard()
File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
self.apply_series_generator()
File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
results[i] = self.f(v)
File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')
If you know which columns you need, you can select/drop to get the correct series:
>>> df.drop('D',axis=1).apply(lambda row: f(**row), axis=1)
0    3
1    6
2    9
dtype: int64
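As for why f(**row) works at all: ** unpacking only needs keys() and item lookup, and a Series provides both. Here is a small sketch on a single row, taken with iloc purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


# Second row as a Series whose index holds the remaining column names.
row = df.drop('D', axis=1).iloc[1]
print(list(row.keys()))  # ['A', 'B', 'C']
print(f(**row))          # 6
```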