I have the following case-style statement in Python:

    pd_df['difficulty'] = 'Unknown'
    pd_df['difficulty'][(pd_df['Time']<30) & (pd_df['Time']>0)] = 'Easy'
    pd_df['difficulty'][(pd_df['Time']>=30) & (pd_df['Time']<=60)] = 'Medium'
    pd_df['difficulty'][pd_df['Time']>60] = 'Hard'
But when I run the code, it raises the following warning:
A value is trying to be set on a copy of a slice from a DataFrame
notnull is a pandas function that checks one or more values and reports whether each one is non-null. In pandas, missing data is represented as NaN (not a number) or None. notnull returns False wherever it finds NaN or None, and True otherwise.
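As a minimal sketch of the behaviour described above, the snippet below applies pd.notnull to a small Series. The sample values and the names times, mask and present are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values: two real numbers, one NaN, one None.
times = pd.Series([12.0, np.nan, 45.0, None])

# notnull returns a boolean mask: True where a value is present.
mask = pd.notnull(times)
print(mask.tolist())  # [True, False, True, False]

# The mask can be used to drop missing rows before labelling them.
present = times[mask]
print(present.tolist())  # [12.0, 45.0]
```

The same check is also available as a method, times.notnull().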
Option 1
For performance, use a nested np.where condition. For the condition, you can just use pd.Series.between, and the default value will be inserted accordingly.
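    import numpy as np

    # inclusive takes 'both' / 'neither' in pandas 1.3+; older versions used True / False
    pd_df['difficulty'] = np.where(
        pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
        'Easy',
        np.where(
            pd_df['Time'].between(30, 60, inclusive='both'),  # 30 <= Time <= 60
            'Medium',
            'Unknown'
        )
    )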
Option 2
Similarly, using np.select, this gives more room for adding conditions:
    pd_df['difficulty'] = np.select(
        [
            pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
            pd_df['Time'].between(30, 60, inclusive='both')     # 30 <= Time <= 60
        ],
        [
            'Easy',
            'Medium'
        ],
        default='Unknown'
    )
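To illustrate the "more room for adding conditions" point, here is a sketch that extends the np.select call with the 'Hard' case from the question. The sample DataFrame is hypothetical, and the string values for inclusive assume pandas 1.3 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; -5 falls outside every range.
pd_df = pd.DataFrame({'Time': [-5, 10, 30, 60, 75]})

# Conditions are checked in order; the first match wins for each row.
conditions = [
    pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
    pd_df['Time'].between(30, 60, inclusive='both'),    # 30 <= Time <= 60
    pd_df['Time'] > 60,                                 # the added condition
]
choices = ['Easy', 'Medium', 'Hard']

# Rows matching no condition receive the default label.
pd_df['difficulty'] = np.select(conditions, choices, default='Unknown')
print(pd_df['difficulty'].tolist())
# ['Unknown', 'Easy', 'Medium', 'Medium', 'Hard']
```

Adding another label only means appending one entry to each list, which is harder to keep readable with nested np.where calls.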
Option 3
Another performant solution involves loc:
    pd_df['difficulty'] = 'Unknown'
    pd_df.loc[pd_df['Time'].between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
    pd_df.loc[pd_df['Time'].between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'
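For a runnable version, the sketch below applies the loc approach to a small made-up DataFrame, using the question's own comparisons and including its 'Hard' case. Because each assignment selects rows and column in a single loc call rather than chaining two indexing steps, it should not trigger the "copy of a slice" warning from the question. The sample values and the time alias are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; -5 falls outside every range.
pd_df = pd.DataFrame({'Time': [-5, 10, 30, 60, 75]})

# Start with the default label, then overwrite matching rows.
pd_df['difficulty'] = 'Unknown'
time = pd_df['Time']

# One loc call per label: row mask first, target column second.
pd_df.loc[(time > 0) & (time < 30), 'difficulty'] = 'Easy'
pd_df.loc[(time >= 30) & (time <= 60), 'difficulty'] = 'Medium'
pd_df.loc[time > 60, 'difficulty'] = 'Hard'

print(pd_df['difficulty'].tolist())
# ['Unknown', 'Easy', 'Medium', 'Medium', 'Hard']
```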