Python Pandas, apply function

Tags:

python

pandas

I am trying to use apply to avoid an iterrows() loop in a function.

However, that pandas method is poorly documented, and I can't find an example of how to use it, except for the lame .apply(np.sqrt) in the documentation... There is no example of how to pass arguments, etc.

Anyway, here is a toy example of what I am trying to do.

In my understanding, apply will actually do the same as iterrows(), i.e., iterate (over the rows if axis=0). On each iteration, the input x of the function should be the row iterated over. However, the error messages I keep receiving sort of disprove that assumption...

grid = np.random.rand(5,2)
df = pd.DataFrame(grid)

def multiply(x):
    x[3]=x[0]*x[1]

df = df.apply(multiply, axis=0)

The example above returns an empty df. Can anyone shed some light on my misunderstanding?


asked Apr 18 '17 19:04

jim jarnac

4 Answers

import pandas as pd
import numpy as np

grid = np.random.rand(5,2)
df = pd.DataFrame(grid)

def multiply(x):
    return x[0]*x[1]

df['multiply'] = df.apply(multiply, axis = 1)
print(df)

Results in:

          0         1  multiply
0  0.550750  0.713054  0.392715
1  0.061949  0.661614  0.040987
2  0.472134  0.783479  0.369907
3  0.827371  0.277591  0.229670
4  0.961102  0.137510  0.132162

Explanation:

The function you are applying needs to return a value. You also want to apply it to each row, not to each column: axis=0 passes each column to the function, while axis=1 passes each row, so the axis argument you passed was incorrect in this regard.
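
To make the axis point concrete, here is a minimal sketch showing what the applied function receives under each axis setting. The small frame and the helper name describe are made up for illustration:

```python
import pandas as pd

# Small illustrative frame with default integer column labels (0 and 1).
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

def describe(x):
    # Hypothetical helper: return the length of the Series that apply hands in.
    return len(x)

# axis=0 (the default): the function is called once per COLUMN,
# so each x holds the 3 values of one column.
per_column = df.apply(describe, axis=0)

# axis=1: the function is called once per ROW,
# so each x holds the 2 values of one row.
per_row = df.apply(describe, axis=1)

print(per_column.tolist())  # [3, 3]
print(per_row.tolist())     # [2, 2, 2]
```

So with axis=0, x[0]*x[1] multiplies the first two entries of a column, which is not what the question intends.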

Finally, notice that I assign the result to the 'multiply' column outside of my function. You can easily change this to df[3] = ..., as in your code, and get a dataframe like this:

          0         1         3
0  0.550750  0.713054  0.392715
1  0.061949  0.661614  0.040987
2  0.472134  0.783479  0.369907
3  0.827371  0.277591  0.229670
4  0.961102  0.137510  0.132162

answered Oct 04 '22 05:10

Andy

It should be noted that you can use lambda functions as well. See the pandas documentation for apply.

For your example, you can run:

df['multiply'] = df.apply(lambda row: row[0] * row[1], axis = 1)

which produces the same output as @Andy's answer.

This can be useful if your function is in the form of

def multiply(a,b):
    return a*b

df['multiply'] = df.apply(lambda row: multiply(row[0] ,row[1]), axis = 1)

More examples can be found in the Enhancing Performance section of the pandas documentation.
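
Since the question also asks how to pass arguments, here is a small sketch of doing it without a lambda. apply forwards positional extras given in args and any additional keyword arguments to the function. The function name scaled_product and its parameters factor and offset are invented for this example:

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])

def scaled_product(row, factor, offset=0.0):
    # row is one row of the frame; factor and offset are the extra arguments.
    return row[0] * row[1] * factor + offset

# Positional extras go in `args`; keyword extras are forwarded to the function.
df['scaled'] = df.apply(scaled_product, axis=1, args=(10,), offset=1.0)

print(df['scaled'].tolist())  # [21.0, 121.0]
```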

answered Oct 04 '22 05:10

Jon

When apply-ing a function, you need that function to return the result of the operation over the column/row. You are getting None because multiply doesn't return anything. That is, the function passed to apply should return the value computed for each row or column, not do the assignment itself.

You're also iterating over the wrong axis here. Your current code takes the first and second element of each column and multiplies them together.

A correct multiply function:

def multiply(x):
    return x[0]*x[1]

df[3] = df.apply(multiply, axis='columns')

With that being said, you can do much better than apply here, as it is not a vectorized operation. Just multiply the columns together directly.

df[3] = df[0]*df[1]

In general, you should avoid apply when possible as it is not much more than a loop itself under the hood.
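
As a quick sanity check, the row-wise apply and the direct column product give the same numbers, so nothing is lost by dropping apply here. This is only a sketch on made-up random data with an arbitrary seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame(rng.random((5, 2)))

def multiply(x):
    return x[0] * x[1]

via_apply = df.apply(multiply, axis=1)  # Python-level loop over the rows
via_columns = df[0] * df[1]             # single vectorized column operation

print(np.allclose(via_apply, via_columns))  # True
```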

answered Oct 04 '22 06:10

miradulo

One of the rules of Pandas Zen says: always try to find a vectorized solution first.

.apply(..., axis=1) is not vectorized!

Consider alternatives:

In [164]: df.prod(axis=1)
Out[164]:
0    0.770675
1    0.539782
2    0.318027
3    0.597172
4    0.211643
dtype: float64

In [165]: df[0] * df[1]
Out[165]:
0    0.770675
1    0.539782
2    0.318027
3    0.597172
4    0.211643
dtype: float64

Timing against a 50,000-row DataFrame:

In [166]: df = pd.concat([df] * 10**4, ignore_index=True)

In [167]: df.shape
Out[167]: (50000, 2)

In [168]: %timeit df.apply(multiply, axis=1)
1 loop, best of 3: 6.12 s per loop

In [169]: %timeit df.prod(axis=1)
100 loops, best of 3: 6.23 ms per loop

In [170]: def multiply_vect(x1, x2):
     ...:     return x1*x2
     ...:

In [171]: %timeit multiply_vect(df[0], df[1])
1000 loops, best of 3: 604 µs per loop

Conclusion: use .apply() only as a very last resort (i.e., when nothing else helps).
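
If you are not in IPython and cannot use %timeit, a rough equivalent with the standard-library timeit module could look like the sketch below. The frame size (10,000 rows, smaller than the run above), the seed, and the names candidates and timings are my own choices, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
df = pd.DataFrame(rng.random((10_000, 2)))

def multiply(x):
    return x[0] * x[1]

# The three approaches compared in this answer.
candidates = {
    "apply(axis=1)": lambda: df.apply(multiply, axis=1),
    "prod(axis=1)": lambda: df.prod(axis=1),
    "df[0] * df[1]": lambda: df[0] * df[1],
}

# Best of 3 single runs per approach, in seconds.
timings = {
    name: min(timeit.repeat(func, number=1, repeat=3))
    for name, func in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:>15}: {seconds * 1e3:.3f} ms")
```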

answered Oct 04 '22 05:10

MaxU - stop WAR against UA
