I have a pandas data frame, sample, with one of the columns called PR, to which I am applying a lambda function as follows:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
I then get the following syntax error message:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax
What am I doing wrong?
We can do this with the apply() function in pandas, which can apply a lambda function along either axis of a DataFrame. If the axis argument is 0 (the default), the lambda function is applied to each column; if it is 1, the function is applied to each row.
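As a minimal sketch of the axis behaviour described above (the frame df and its columns A and B are made up for illustration and do not come from the question):

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (the default): the lambda receives each column as a Series
col_totals = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the lambda receives each row as a Series
row_totals = df.apply(lambda row: row.sum(), axis=1)

print(col_totals)
print(row_totals)
```

Here col_totals is indexed by column name and row_totals by row label.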
You can apply a lambda expression to a single column of the DataFrame. The following example subtracts 2 from every value in column A: df["A"] = df["A"].apply(lambda x: x - 2)
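A runnable version of that one-liner might look like this; the data values are assumptions chosen only so the result is easy to check:

```python
import pandas as pd

# Hypothetical data; only column A is modified
df = pd.DataFrame({"A": [5, 6, 7], "B": [1, 2, 3]})

# Apply the lambda to each value of column A and write the result back
df["A"] = df["A"].apply(lambda x: x - 2)

print(df)
```

For simple arithmetic like this, the vectorized form df["A"] - 2 gives the same values without calling a Python function per element.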
Pandas Apply Function to Single Column
We will create a function add_3() that adds 3 to each value and pass it to apply(). To apply it to a single column, select that column with df["col_name"]. For example, df["B"].apply(add_3) applies the function to column B.
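A minimal sketch of the add_3() idea described above, assuming a small made-up frame with columns A and B:

```python
import pandas as pd


def add_3(x):
    """Return the given value increased by 3."""
    return x + 3


# Hypothetical frame, used only for illustration
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Pass the named function to apply() on the selected column only
df["B"] = df["B"].apply(add_3)

print(df)
```

Column A is left untouched because apply() is called on the single Series df["B"].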
Syntax. Simply put, a lambda function is like any normal Python function, except that it has no name when it is defined and its body is a single expression. A lambda function evaluates that expression for a given argument: you give the function a value (the argument) and specify the operation (the expression).
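To make the comparison concrete, here is a small sketch of a named function next to an equivalent lambda; the names square and square_lambda are hypothetical:

```python
# A regular named function
def square(x):
    return x * x


# The same operation as a lambda: one argument, one expression
square_lambda = lambda x: x * x

print(square(4), square_lambda(4))

# A lambda has no name of its own
print(square.__name__, square_lambda.__name__)
```

Because the body must be a single expression, statements such as a bare if without else cannot appear inside a lambda.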
You need mask:
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
Another solution with loc and boolean indexing:
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
Sample:
import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR': [10, 100, 40]})
print(sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print(sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print(sample)
      PR
0    NaN
1  100.0
2    NaN
EDIT:
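Solution with apply (a conditional expression inside a lambda needs an else branch, and NaN has to be written as np.nan):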
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
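To see why the line from the question fails to parse, the sketch below compiles the lambda without an else branch and catches the error, then runs the corrected version. Using compile() here is only a way to demonstrate the error without stopping the script; the variable names broken and compiled are illustrative.

```python
import numpy as np
import pandas as pd

# The lambda from the question, minus the else branch, kept as a string
broken = "lambda x: np.nan if x < 90"

try:
    compile(broken, "<example>", "eval")
    compiled = True
except SyntaxError:
    # "x if cond" is incomplete: Python requires "x if cond else y"
    compiled = False

print("compiles:", compiled)

# Corrected version: values below 90 become NaN, others are kept
sample = pd.DataFrame({"PR": [10, 100, 40]})
sample["PR"] = sample["PR"].apply(lambda x: np.nan if x < 90 else x)
print(sample)
```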
Timings, len(df)=300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
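Outside IPython, a comparable measurement can be sketched with the standard timeit module. The name big and the repeat count of 3 are assumptions, and the absolute numbers will differ by machine and pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the 300k-row frame used in the timings above
sample = pd.DataFrame({"PR": [10, 100, 40]})
big = pd.concat([sample] * 100000).reset_index(drop=True)

# Both approaches should produce the same values
via_apply = big["PR"].apply(lambda x: np.nan if x < 90 else x)
via_mask = big["PR"].mask(big["PR"] < 90, np.nan)

# Total seconds for 3 runs of each approach
t_apply = timeit.timeit(
    lambda: big["PR"].apply(lambda x: np.nan if x < 90 else x), number=3
)
t_mask = timeit.timeit(
    lambda: big["PR"].mask(big["PR"] < 90, np.nan), number=3
)

print(f"apply: {t_apply:.4f} s, mask: {t_mask:.4f} s")
```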