
How to properly apply a lambda function to a pandas data frame column

Tags:

pandas

lambda

I have a pandas data frame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90) 

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

Amani asked May 25 '16 05:05


People also ask

How do I apply a lambda function to a column in pandas?

We can do this with the apply() function in Pandas. We can use the apply() function to apply the lambda function to both rows and columns of a dataframe. If the axis argument in the apply() function is 0, then the lambda function gets applied to each column, and if 1, then the function gets applied to each row.
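As a rough illustration of the axis argument described above, the following sketch applies one lambda per column and another per row. The frame df and its columns A and B are hypothetical and only there for the example.

```python
import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (the default): the lambda receives each column as a Series.
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the lambda receives each row as a Series.
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(col_ranges)
print(row_sums)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.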

How do you use lambda in a data frame?

You can apply a lambda expression to a single column of the DataFrame. The following example subtracts 2 from every cell value in column A: df["A"] = df["A"].apply(lambda x: x - 2).

How do I apply a function to a column in pandas?

To apply a function to a single column, define the function, for example add_3(), which adds 3 to a value, and pass it to apply(). Select the column by name with df["col_name"]. For example, df["B"] = df["B"].apply(add_3) applies the function to column B.
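A minimal sketch of the add_3() approach might look like this. The frame df and its values are made up for the example; only the name add_3 and column B come from the text above.

```python
import pandas as pd


def add_3(value):
    """Return the input value increased by 3."""
    return value + 3


# Hypothetical frame; column B is the one being transformed.
df = pd.DataFrame({"A": [1, 2], "B": [5, 6]})

# Series.apply calls add_3 once per element of column B.
df["B"] = df["B"].apply(add_3)

print(df)
```

Column A is left untouched because apply() is called on the single column selected with df["B"].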

What is the correct way to use a lambda function?

Syntax. Simply put, a lambda function is like any normal Python function, except that it has no name when it is defined and its body is limited to a single expression. A lambda function evaluates that expression for a given argument: you give the function a value (the argument) and provide the operation (the expression).
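To make the comparison concrete, here is a small sketch showing a named function next to an equivalent lambda, plus a lambda passed directly as an argument. The names double, double_lambda, and words are illustrative only.

```python
# A named function defined with def ...
def double(x):
    return x * 2


# ... and an equivalent lambda: one argument, one expression, no name of its own.
double_lambda = lambda x: x * 2

# Lambdas are most often passed inline, for example as a sort key.
words = ["pandas", "np", "lambda"]
by_length = sorted(words, key=lambda w: len(w))

print(double(4), double_lambda(4))
print(by_length)
```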


1 Answer

You need mask:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan) 

Another solution with loc and boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan 

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print(sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print(sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print(sample)
      PR
0    NaN
1  100.0
2    NaN

EDIT:

Solution with apply:

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x) 
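The reason the original lambda fails is likely that a Python conditional expression needs both branches, in the form "a if condition else b"; writing only "NaN if x < 90" is not a complete expression. The bare name NaN is also undefined unless it is imported, which is why np.nan is used here. A small self-contained sketch, reusing the three-row sample frame from above:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"PR": [10, 100, 40]})

# The conditional expression must have an else branch:
# values below 90 become NaN, all other values are returned unchanged.
result = sample["PR"].apply(lambda x: np.nan if x < 90 else x)

print(result)
```

Because NaN is a float, the resulting column is converted to a floating-point dtype.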

Timings for len(sample) = 300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
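The %timeit magic only works in IPython or Jupyter. Outside of those, a comparable measurement could be sketched with the standard timeit module, as below. The repetition count of 5 is an arbitrary choice for this sketch, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the 300k-row frame used in the timings above.
sample = pd.DataFrame({"PR": [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)

# Total time in seconds for 5 runs of each approach.
apply_time = timeit.timeit(
    lambda: sample["PR"].apply(lambda x: np.nan if x < 90 else x), number=5
)
mask_time = timeit.timeit(
    lambda: sample["PR"].mask(sample["PR"] < 90, np.nan), number=5
)

print(f"apply: {apply_time / 5 * 1000:.2f} ms per run")
print(f"mask:  {mask_time / 5 * 1000:.2f} ms per run")
```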
jezrael answered Sep 29 '22 04:09