Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating New Column based on condition on Other Column in Pandas DataFrame

I have this dataframe:

+------+--------------+------------+
| ID   | Education    |      Score | 
+------+--------------+------------+
|    1 |  High School |      7.884 |     
|    2 |  Bachelors   |      6.952 |     
|    3 |  High School |      8.185 |   
|    4 |  High School |      6.556 | 
|    5 |  Bachelors   |      6.347 | 
|    6 |  Master      |      6.794 |   
+------+--------------+------------+

I want to create a new column which is categorizing the score column. I want to label it as: 'bad', 'good', 'very good'.

Which maybe would look like this:

+------+--------------+------------+------------+
| ID   | Education    |      Score | Labels     |
+------+--------------+------------+------------+
|    1 |  High School |      7.884 | Good       |
|    2 |  Bachelors   |      6.952 | Bad        |
|    3 |  High School |      8.185 | Very good  |   
|    4 |  High School |      6.556 | Bad        |
|    5 |  Bachelors   |      6.347 | Bad        |
|    6 |  Master      |      6.794 | Bad        |
+------+--------------+------------+------------+

How can I do that?

Thanks in advance

like image 709
ebuzz168 Avatar asked Dec 18 '22 14:12

ebuzz168


2 Answers

import pandas as pd 

# initialize list of lists 
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score']) 

df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df

    ID  Education    Score    Labels
0   1   High School  7.884    Good
1   2   Bachelors    6.952    Bad
2   3   High School  8.185    Very Good
3   4   High School  6.556    Bad
4   5   Bachelors    6.347    Bad
5   6   Master       6.794    Bad
like image 163
Prathik Kini Avatar answered Jan 04 '23 22:01

Prathik Kini


I suppose it is the score you would like to map to the labels. You could define a mapping function taking score as input and then returning the label:

def map_score(score):
  if score >= 8:
    return "Very good"
  elif score >= 7:
    return "Good"
  else:
    return "Bad"

df["Labels"] = df["Score"].apply(lambda score: map_score(score))
like image 34
Asgeer Avatar answered Jan 04 '23 22:01

Asgeer