 

Pandas - Case when & default in pandas

Tags:

python

pandas

I have the following case statement in Python:

pd_df['difficulty'] = 'Unknown'
pd_df['difficulty'][(pd_df['Time']<30) & (pd_df['Time']>0)] = 'Easy'
pd_df['difficulty'][(pd_df['Time']>=30) & (pd_df['Time']<=60)] = 'Medium'
pd_df['difficulty'][pd_df['Time']>60] = 'Hard'

But when I run the code, pandas emits a SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame 
asked Mar 12 '18 at 05:03 by Tom J Muthirenthi





1 Answer

Option 1
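For performance, use nested np.where calls, which evaluate each condition as a vectorized boolean array over the whole column. Each range check can be written with pd.Series.between, and the innermost np.where supplies the default value ('Unknown') for rows that match no condition. Since pandas 1.3 the inclusive argument of between takes 'both', 'neither', 'left' or 'right'; the older boolean form is deprecated and no longer accepted in pandas 2.0.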

import numpy as np

pd_df['difficulty'] = np.where(
    pd_df['Time'].between(0, 30, inclusive='neither'),    # 0 < Time < 30
    'Easy',
    np.where(
        pd_df['Time'].between(30, 60, inclusive='both'),  # 30 <= Time <= 60
        'Medium',
        np.where(pd_df['Time'] > 60, 'Hard', 'Unknown')
    )
)
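The boundaries are the subtle part of this rewrite. A small sketch, assuming pandas 1.3 or later (string values for inclusive) and made-up values chosen around the cut-offs used in this question (0, 30 and 60), of how the inclusive argument of Series.between treats values that sit exactly on an edge:

```python
import pandas as pd

# Hypothetical values chosen around the 0, 30 and 60 cut-offs.
s = pd.Series([0, 15, 30, 60, 61])

# inclusive='neither' excludes both endpoints: 0 < x < 30
easy = s.between(0, 30, inclusive='neither')
# inclusive='both' (the default) keeps both endpoints: 30 <= x <= 60
medium = s.between(30, 60, inclusive='both')

print(easy.tolist())    # [False, True, False, False, False]
print(medium.tolist())  # [False, False, True, True, False]
```

Because 30 and 60 pass the 'both' check but not the 'neither' one, they end up labelled 'Medium', matching the question's (Time >= 30) & (Time <= 60) rule, while 0 matches neither range and falls through to the default.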

Option 2
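np.select works in a similar way but takes a list of conditions and a matching list of choices, which makes it easier to add more cases; rows that match no condition get the default: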

import numpy as np

pd_df['difficulty'] = np.select(
    [
        pd_df['Time'].between(0, 30, inclusive='neither'),
        pd_df['Time'].between(30, 60, inclusive='both'),
        pd_df['Time'] > 60
    ],
    [
        'Easy',
        'Medium',
        'Hard'
    ],
    default='Unknown'
)
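A self-contained run of the np.select approach on hypothetical sample data (the question does not show its DataFrame), including a negative value and a missing value to show where the default applies. It assumes pandas 1.3 or later for the string inclusive values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a negative value and a missing value.
pd_df = pd.DataFrame({'Time': [-5, 0, 15, 30, 60, 75, np.nan]})

conditions = [
    pd_df['Time'].between(0, 30, inclusive='neither'),  # 0 < Time < 30
    pd_df['Time'].between(30, 60, inclusive='both'),    # 30 <= Time <= 60
    pd_df['Time'] > 60,                                 # Time > 60
]
choices = ['Easy', 'Medium', 'Hard']

# np.select takes, row by row, the choice of the first True condition;
# rows where no condition holds (here -5, 0 and NaN) get the default.
pd_df['difficulty'] = np.select(conditions, choices, default='Unknown')

print(pd_df['difficulty'].tolist())
# ['Unknown', 'Unknown', 'Easy', 'Medium', 'Medium', 'Hard', 'Unknown']
```

NaN compares as False both in between and in > 60, so missing times fall through to 'Unknown' instead of raising an error.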

Option 3
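Another performant solution sets the default first and then overwrites it with .loc. Because each .loc call selects the rows and the column in a single indexing operation, the assignment is made on pd_df itself, which avoids the SettingWithCopyWarning caused by the chained indexing in the question: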

pd_df['difficulty'] = 'Unknown'
pd_df.loc[pd_df['Time'].between(0, 30, inclusive='neither'), 'difficulty'] = 'Easy'
pd_df.loc[pd_df['Time'].between(30, 60, inclusive='both'), 'difficulty'] = 'Medium'
pd_df.loc[pd_df['Time'] > 60, 'difficulty'] = 'Hard'
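The same .loc pattern also works as a minimal change to the code in the question: keep its explicit comparisons and move the row mask and the column label into one .loc call. A sketch on hypothetical sample data, with all warnings promoted to errors so the block would fail if pandas emitted SettingWithCopyWarning for any of the assignments:

```python
import warnings

import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
pd_df = pd.DataFrame({'Time': [0, 10, 30, 60, 90]})

with warnings.catch_warnings():
    # Promote every warning (SettingWithCopyWarning included) to an exception.
    warnings.simplefilter('error')
    pd_df['difficulty'] = 'Unknown'
    # Same conditions as the question, but mask and column label go into
    # one .loc call instead of chained [] indexing.
    pd_df.loc[(pd_df['Time'] < 30) & (pd_df['Time'] > 0), 'difficulty'] = 'Easy'
    pd_df.loc[(pd_df['Time'] >= 30) & (pd_df['Time'] <= 60), 'difficulty'] = 'Medium'
    pd_df.loc[pd_df['Time'] > 60, 'difficulty'] = 'Hard'

print(pd_df['difficulty'].tolist())
# ['Unknown', 'Easy', 'Medium', 'Medium', 'Hard']
```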
answered Sep 24 '22 at 08:09 by cs95
