
Pandas apply but only for rows where a condition is met

I would like to use Pandas df.apply, but only for certain rows.

As an example, I want to do something like this, but my actual issue is a little more complicated:

    import pandas as pd
    import math

    z = pd.DataFrame({'a':[4.0,5.0,6.0,7.0,8.0],'b':[6.0,0,5.0,0,1.0]})
    z.where(z['b'] != 0, z['a'] / z['b'].apply(lambda l: math.log(l)), 0)

What I want in this example is the value in 'a' divided by the log of the value in 'b' for each row, and for rows where 'b' is 0, I simply want to return 0.
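For reference, the row-wise apply() described here can be sketched with a helper that checks the condition itself and is applied with `axis=1`. This is a minimal sketch, not the asker's actual code: the helper name `a_over_log_b` and the output column `c` are hypothetical. Note that the example data's last row has `b == 1.0`, and log(1) = 0, so that row divides by zero; the sketch lets NumPy return `inf` there (with the divide warning silenced) instead of guessing how such rows should be treated.

```python
import numpy as np
import pandas as pd

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0], 'b': [6.0, 0, 5.0, 0, 1.0]})


def a_over_log_b(row):
    """Hypothetical helper: 0 where 'b' is 0, otherwise a / log(b)."""
    if row['b'] == 0:
        return 0.0
    # Keep the division in NumPy floats so that log(1) == 0 gives inf
    # instead of raising ZeroDivisionError
    return np.float64(row['a']) / np.log(row['b'])


# axis=1 passes each row to the helper as a Series
with np.errstate(divide='ignore'):
    z['c'] = z.apply(a_over_log_b, axis=1)

print(z)
```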

asked Nov 18 '15 at 00:11 by mgoldwasser




1 Answer

The other answers are excellent, but I thought I'd add one other approach that can be faster in some circumstances – using broadcasting and masking to achieve the same result:

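    import numpy as np

    mask = (z['b'] != 0)
    z_valid = z[mask]

    # Initialise as float so the float results below match the column dtype
    z['c'] = 0.0
    # Rows where b == 1 have log(b) == 0, so their result is inf
    z.loc[mask, 'c'] = z_valid['a'] / np.log(z_valid['b'])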

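Especially with very large DataFrames, this approach will generally be faster than solutions based on apply(), because the division and the log run as vectorized NumPy operations over whole columns instead of calling a Python function once per row.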
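The speed claim is easy to check on your own data. Below is a rough benchmark sketch that times a row-wise apply() against the masked approach on a randomly generated 20,000-row frame; the data, its size, and the function names `with_apply` and `with_mask` are assumptions for illustration. Timings depend on the machine and the pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 20,000 rows, roughly half with b == 0.
# Non-zero b values are drawn from [2, 10) so log(b) is never 0.
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    'a': rng.uniform(1.0, 10.0, n),
    'b': np.where(rng.random(n) < 0.5, 0.0, rng.uniform(2.0, 10.0, n)),
})


def with_apply(df):
    """Row-wise apply(): one Python function call per row."""
    return df.apply(
        lambda r: 0.0 if r['b'] == 0 else r['a'] / np.log(r['b']), axis=1
    )


def with_mask(df):
    """Masked, vectorized version of the approach above."""
    mask = df['b'] != 0
    out = pd.Series(0.0, index=df.index)
    out.loc[mask] = df.loc[mask, 'a'] / np.log(df.loc[mask, 'b'])
    return out


# Sanity check: both versions produce the same values
assert np.allclose(with_apply(big), with_mask(big))

for fn in (with_apply, with_mask):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```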

answered Oct 03 '22 at 08:10 by jakevdp