Pandas - Case when & default in pandas

Tags: python, pandas

I have the following case-style logic in Python:

    pd_df['difficulty'] = 'Unknown'
    pd_df['difficulty'][(pd_df['Time']<30) & (pd_df['Time']>0)] = 'Easy'
    pd_df['difficulty'][(pd_df['Time']>=30) & (pd_df['Time']<=60)] = 'Medium'
    pd_df['difficulty'][pd_df['Time']>60] = 'Hard'

But when I run the code, it throws an error.

A value is trying to be set on a copy of a slice from a DataFrame


asked Mar 12 '18 05:03

Tom J Muthirenthi

1 Answer

Option 1
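For performance, use a nested np.where condition. Each condition can be built with pd.Series.between, and rows that match neither condition fall through to the default value 'Unknown'.

    import numpy as np

    # pandas >= 1.3: `inclusive` takes 'both', 'neither', 'left' or 'right'
    # (the boolean form was removed in pandas 2.0).
    pd_df['difficulty'] = np.where(
        pd_df['Time'].between(0, 30, inclusive='neither'),   # 0 < Time < 30
        'Easy',
        np.where(
            pd_df['Time'].between(30, 60, inclusive='both'),  # 30 <= Time <= 60
            'Medium',
            'Unknown'
        )
    )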
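To make the boundary behaviour of the nested np.where concrete, here is a minimal sketch on a made-up frame. The sample values are an assumption chosen to sit on and around the bucket edges, and the snippet assumes pandas 1.3 or later for the string form of `inclusive`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values sit on and around the bucket edges.
pd_df = pd.DataFrame({"Time": [0, 10, 30, 45, 60, 75]})

# Exclusive on both ends: 0 < Time < 30.
is_easy = pd_df["Time"].between(0, 30, inclusive="neither")
# Inclusive on both ends: 30 <= Time <= 60.
is_medium = pd_df["Time"].between(30, 60, inclusive="both")

# The inner np.where only decides rows where is_easy is False.
pd_df["difficulty"] = np.where(
    is_easy, "Easy", np.where(is_medium, "Medium", "Unknown")
)

print(pd_df)
```

With this data, 0 and 75 end up as 'Unknown', 10 as 'Easy', and 30, 45 and 60 as 'Medium'. Note that the answer's two buckets leave Time > 60 unlabelled, so a 'Hard' bucket would need one more level of nesting.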

Option 2
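Similarly, you can use np.select, which leaves more room for adding conditions:

    import numpy as np

    # pandas >= 1.3: `inclusive` takes 'both', 'neither', 'left' or 'right'.
    pd_df['difficulty'] = np.select(
        [
            pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
            pd_df['Time'].between(30, 60, inclusive='both')     # 30 <= Time <= 60
        ],
        [
            'Easy',
            'Medium'
        ],
        default='Unknown'
    )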
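As an illustration of that extra room, the sketch below adds the 'Hard' bucket from the question as a third condition. The sample data and the names `conditions` and `labels` are made up for the example, and pandas 1.3 or later is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a negative value and values above 60.
pd_df = pd.DataFrame({"Time": [-5, 15, 30, 60, 61, 120]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    pd_df["Time"].between(0, 30, inclusive="neither"),  # 0 < Time < 30
    pd_df["Time"].between(30, 60, inclusive="both"),    # 30 <= Time <= 60
    pd_df["Time"] > 60,                                 # from the question
]
labels = ["Easy", "Medium", "Hard"]

# Rows matching none of the conditions receive the default.
pd_df["difficulty"] = np.select(conditions, labels, default="Unknown")

print(pd_df)
```

Adding another bucket is then a matter of appending one entry to each list, rather than nesting another np.where call.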

Option 3
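Another performant solution uses loc, which selects the rows and the column in a single indexing step instead of the chained indexing that triggers the warning:

    # pandas >= 1.3: `inclusive` takes 'both', 'neither', 'left' or 'right'.
    pd_df['difficulty'] = 'Unknown'
    pd_df.loc[pd_df['Time'].between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
    pd_df.loc[pd_df['Time'].between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'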
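The loc approach can be sketched end to end as follows. The sample frame is hypothetical, the 'Hard' line is carried over from the question rather than from the answer, and pandas 1.3 or later is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing value.
pd_df = pd.DataFrame({"Time": [5.0, 30.0, 90.0, np.nan]})

# Start from the default, then overwrite the rows that match each mask.
pd_df["difficulty"] = "Unknown"
time = pd_df["Time"]  # only read from, never assigned through

pd_df.loc[time.between(0, 30, inclusive="neither"), "difficulty"] = "Easy"
pd_df.loc[time.between(30, 60, inclusive="both"), "difficulty"] = "Medium"
pd_df.loc[time > 60, "difficulty"] = "Hard"

print(pd_df)
```

Comparisons against NaN evaluate to False, so the row with a missing Time keeps the 'Unknown' default.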


answered Sep 24 '22 08:09

cs95
