Python Pandas: Apply function using column names as named arguments

Tags: python, pandas, dataframe, apply

Is there a way in pandas to apply a function to a dataframe using the column names as argument names? For example, I have a function and a dataframe.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    # Pretend code is more complicated
    return A + B + C

Is there a way I can do something like

df.apply(f)

and have pandas match the columns to named arguments?

I know I can rewrite the function to take a row instead of named arguments, but keep in mind that f is just a toy example and my real function is more complicated.

EDIT:

Figured it out based on @juanpa.arrivillaga's answer:

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)
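One caveat with the one-liner above: `co_varnames` lists a function's local variables as well as its parameters, so it only selects the right columns when the function defines no locals. A minimal sketch of an alternative, assuming the standard-library `inspect.signature` is acceptable here; the helper name `apply_by_name` is hypothetical and not part of pandas:

```python
import inspect

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    # 'total' is a local variable; it also appears in f.__code__.co_varnames
    total = A + B + C
    return total


def apply_by_name(frame, func):
    """Hypothetical helper: call func once per row, matching columns to parameter names."""
    # inspect.signature reports only the declared parameters, not locals
    params = list(inspect.signature(func).parameters)
    return frame[params].apply(lambda row: func(**row), axis=1)


result = apply_by_name(df, f)
print(result)
```

This still assumes every parameter name of the function exists as a column; a missing column raises a `KeyError` at the selection step.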

asked Oct 18 '19 16:10

Jack

3 Answers

The function passed to apply receives a whole column (axis=0) or a whole row (axis=1) of df as its argument, not individual values matched by column name. You can write a wrapper for this purpose.

def wrapper(x, A, B, C):
    return f(x[A], x[B], x[C])

df.apply(wrapper, axis=1, args=('A','B','C'))

Output:

0    3
1    6
2    9
dtype: int64
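The same wrapper idea can be generalized for the case where column names differ from the parameter names. The following is only a sketch: `apply_with_mapping` and its `mapping` argument (parameter name to column name) are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


def apply_with_mapping(frame, func, mapping):
    """Hypothetical helper: mapping is {parameter name: column name}."""
    return frame.apply(
        # Build the keyword arguments for one row from the mapping
        lambda row: func(**{param: row[col] for param, col in mapping.items()}),
        axis=1,
    )


# Feed column 'D' into parameter 'C' without touching f
result = apply_with_mapping(df, f, {'A': 'A', 'B': 'B', 'C': 'D'})
print(result)
```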

answered Oct 20 '22 09:10

Quang Hoang

If you are interested in using the "apply" function, here is one way:

df = pd.DataFrame({'A':[1,2,3],
                  'B':[1,2,3],
                  'C':[1,2,3],
                  'D':[1,2,3]})     


def func(row):
    row['result'] = row['A'] + row['B'] + row['C']
    return row

df.apply(func, axis = 1)


    Out[67]: 
       A  B  C  D  result
    0  1  1  1  1       3
    1  2  2  2  2       6
    2  3  3  3  3       9

UPD

If you have to use the function "f" and don't want to change it, maybe try this:

df['res'] = f(df['A'], df['B'], df['C'])
df

    Out[70]: 
       A  B  C  D  res
    0  1  1  1  1    3
    1  2  2  2  2    6
    2  3  3  3  3    9
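The column-wise call can also be driven by the function's parameter names instead of listing the columns by hand. This is a sketch under two assumptions: the function works on whole Series (as `A + B + C` does), and every parameter name is also a column name.

```python
import inspect

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})


def f(A, B, C):
    return A + B + C


# Parameter names of f, in declaration order: A, B, C
params = inspect.signature(f).parameters

# Pass each matching column (a whole Series) as a keyword argument
df['res'] = f(**{name: df[name] for name in params})
print(df)
```

Because the function is called once on entire columns rather than once per row, this is usually much faster than `apply(..., axis=1)`, but it breaks if the real function contains logic that only works on scalars.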

answered Oct 20 '22 09:10

Alex

There is no good way in general. However, if your column names align exactly with the function's parameter names, you can wrap the function in another function that splats the row argument into your function, because Series objects are mappings!

So given:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3],
...                'B':[1,2,3],
...                'C':[1,2,3],
...                'D':[1,2,3]})
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...

We could almost do:

>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')

If you know which columns you need, you can select/drop to get the correct series:

>>> df.drop('D',axis=1).apply(lambda row: f(**row), axis=1)
0    3
1    6
2    9

answered Oct 20 '22 10:10

juanpa.arrivillaga
