I would like to use Pandas df.apply, but only for certain rows.
As an example, I want to do something like this, but my actual issue is a little more complicated:
import pandas as pd
import math

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0], 'b': [6.0, 0, 5.0, 0, 1.0]})
z.where(z['b'] != 0, z['a'] / z['b'].apply(lambda l: math.log(l)), 0)
What I want in this example is the value in 'a' divided by the log of the value in 'b' for each row, and for rows where 'b' is 0, I simply want to return 0.
One option is to use DataFrame.apply() with an if-else condition. The apply() method applies a function along an axis of the DataFrame (row-wise or column-wise), so we can write our own function containing the if-else logic and apply it to each row.
This solution still loops over the rows to get the job done, but apply is generally faster than iterrows.
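For the DataFrame in the question, a row-wise apply with an if-else condition could look roughly like the sketch below. The helper name a_over_log_b and the result column 'c' are made up for illustration; they are not part of the original question.

```python
import math
import pandas as pd

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0],
                  'b': [6.0, 0, 5.0, 0, 1.0]})

def a_over_log_b(row):
    # Hypothetical helper: return 0 where 'b' is 0, otherwise a / log(b).
    if row['b'] == 0:
        return 0.0
    return row['a'] / math.log(row['b'])

# axis=1 passes each row to the function as a Series.
z['c'] = z.apply(a_over_log_b, axis=1)
print(z)
```

Note that the last row has 'b' equal to 1, whose log is 0, so the quotient for that row is not finite. If that matters for your data, the condition would need to cover that case as well.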
The other answers are excellent, but I thought I'd add one other approach that can be faster in some circumstances – using broadcasting and masking to achieve the same result:
import numpy as np

mask = (z['b'] != 0)    # rows where the log of 'b' is defined
z_valid = z[mask]
z['c'] = 0.0            # float default for rows where 'b' is 0
z.loc[mask, 'c'] = z_valid['a'] / np.log(z_valid['b'])
Especially with very large dataframes, this approach will generally be faster than solutions based on apply().
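If you want to check this on your own data, a rough comparison along these lines may help. It is only a sketch: the synthetic DataFrame, its size, and the function names with_apply and with_mask are assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import time
import numpy as np
import pandas as pd

# Synthetic data (an assumption for this sketch): 'b' takes the values 0, 2, 4, 6, 8.
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    'a': rng.uniform(1.0, 10.0, n),
    'b': rng.integers(0, 5, n).astype(float) * 2.0,
})

def with_apply(df):
    # Row-wise if-else via apply.
    return df.apply(
        lambda r: 0.0 if r['b'] == 0 else r['a'] / np.log(r['b']), axis=1
    )

def with_mask(df):
    # Vectorised computation restricted to the rows where 'b' is non-zero.
    mask = df['b'] != 0
    out = pd.Series(0.0, index=df.index)
    out.loc[mask] = df.loc[mask, 'a'] / np.log(df.loc[mask, 'b'])
    return out

start = time.perf_counter()
res_apply = with_apply(big)
t_apply = time.perf_counter() - start

start = time.perf_counter()
res_mask = with_mask(big)
t_mask = time.perf_counter() - start

print(f"apply: {t_apply:.4f}s, mask: {t_mask:.4f}s")
print("same result:", np.allclose(res_apply, res_mask))
```

Both functions should produce the same values, so the choice between them mostly comes down to readability versus speed.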