How to properly apply a lambda function to a pandas DataFrame column

Tags:

pandas

lambda

I have a pandas DataFrame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?


asked May 25 '16 05:05

Amani

1 Answer

You need Series.mask, which replaces values where the condition is True with the given value:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution with loc and boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR': [10, 100, 40]})
print(sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print(sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print(sample)
      PR
0    NaN
1  100.0
2    NaN
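The sample above runs mask and loc one after the other on the same frame, so the second step has nothing left to change. A minimal sketch that builds each result independently from the original data and checks that they agree. It also includes Series.where, the inverse of mask (it keeps values where the condition is True and fills the rest with NaN by default), which is an addition not used in the answer. The column is cast to float before the loc assignment so that writing NaN does not rely on implicit int-to-float upcasting.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer
sample = pd.DataFrame({'PR': [10, 100, 40]})

# 1) mask: replace values where the condition is True
by_mask = sample['PR'].mask(sample['PR'] < 90, np.nan)

# 2) loc + boolean indexing on a float copy, so `sample` itself is untouched
df = sample.astype({'PR': float})
df.loc[df['PR'] < 90, 'PR'] = np.nan
by_loc = df['PR']

# 3) where: keep values where the condition is True, NaN elsewhere (default)
by_where = sample['PR'].where(sample['PR'] >= 90)

print(by_mask.tolist())  # [nan, 100.0, nan]
print(by_mask.equals(by_loc), by_mask.equals(by_where))  # True True
```

Series.equals treats NaN values in the same positions as equal, which is why it is used here instead of ==.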

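EDIT:

Solution with apply. The original lambda raises a SyntaxError because a Python conditional expression always needs an else branch (value_if_true if condition else value_if_false). Also, NaN is not a built-in name, so use np.nan: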

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
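A minimal sketch showing both problems with the original lambda separately. compile() only checks syntax, so it rejects the missing else but would not complain about the undefined name NaN (that would only fail at runtime with a NameError). The helper compiles() is hypothetical, written just for this illustration.

```python
import numpy as np
import pandas as pd

# The question's lambda: a conditional expression with no else branch
broken = "lambda x: NaN if x < 90"
# Fixed: both branches present, and NaN taken from NumPy
fixed = "lambda x: np.nan if x < 90 else x"

def compiles(src):
    """Hypothetical helper: True if src is a syntactically valid expression."""
    try:
        compile(src, "<lambda>", "eval")
        return True
    except SyntaxError:
        return False

print(compiles(broken), compiles(fixed))  # False True

sample = pd.DataFrame({'PR': [10, 100, 40]})
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
print(sample['PR'].tolist())  # [nan, 100.0, nan]
```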

Timings for len(df) = 300k rows. Here mask is roughly 27 times faster than apply, because it compares and replaces the whole column at once instead of calling the lambda once per value:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
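The timings above come from IPython's %timeit. A rough, self-contained equivalent using the standard timeit module, assuming plain Python without IPython. It first checks that both approaches return the same Series, then reports the best per-call time. Absolute numbers will differ by machine and pandas version, so no expected output is given.

```python
import timeit

import numpy as np
import pandas as pd

# 300k rows, built as in the answer: 3 rows repeated 100000 times
base = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([base] * 100000).reset_index(drop=True)
s = sample['PR']

def with_apply():
    # Python-level loop: the lambda is called once per element
    return s.apply(lambda x: np.nan if x < 90 else x)

def with_mask():
    # Vectorized: one boolean comparison, one masked replacement
    return s.mask(s < 90, np.nan)

# Both approaches must agree before comparing their speed
assert with_apply().equals(with_mask())

for name, fn in [("apply", with_apply), ("mask", with_mask)]:
    # Best of 3 repeats, 3 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name}: {best * 1000:.2f} ms per call")
```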


answered Sep 29 '22 04:09

jezrael
