Let's say we agree on the following hierarchy, from lowest to highest:
Baby --> Child --> Teenager --> Adult
I have this data set, where Highest_Stage_Reached is still empty:
Name Stage Highest_Stage_Reached
0 Adam Child
1 Barry Child
2 Ben Adult
3 Adam Teenager
4 Barry Adult
5 Ben Baby
How can I populate the Highest_Stage_Reached column so that each row shows the highest stage that person reaches anywhere in the data set, like this?
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
You can use:
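# Map each stage to its rank in the hierarchy
d = {'Baby': 0, 'Child': 1, 'Teenager': 2, 'Adult': 3}
df['rank'] = df['Stage'].map(d)
# Take the highest rank per name, then map it back to the stage label
df['Highest_Stage_Reached'] = (
    df.groupby('Name')['rank'].transform('max').map({v: k for k, v in d.items()})
)
print(df.drop(columns='rank'))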
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
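For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same rank-mapping idea. The frame is rebuilt from the question's sample data, and the names stage_rank, rank_stage and highest_rank are only illustrative, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'],
    'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby'],
})

# Rank of each stage in the agreed hierarchy, plus the reverse lookup
stage_rank = {'Baby': 0, 'Child': 1, 'Teenager': 2, 'Adult': 3}
rank_stage = {rank: stage for stage, rank in stage_rank.items()}

# Highest rank per name, broadcast back to every row of that name
highest_rank = df['Stage'].map(stage_rank).groupby(df['Name']).transform('max')

# Translate the rank back to its stage label
df['Highest_Stage_Reached'] = highest_rank.map(rank_stage)

print(df)
```

Grouping the mapped Series directly avoids creating and then dropping a temporary rank column. Note that a Stage value missing from stage_rank maps to NaN, so it is worth checking for unexpected labels first.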
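Convert the column to an ordered categorical (ordered=True), listing the categories in hierarchy order. Pandas can then compare and sort the values, so max works on the column directly. This also works for any number of distinct stages in Stage.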
df['Stage'] = pd.Categorical(df['Stage'], ordered=True, categories=['Baby', 'Child','Teenager','Adult'])
df['Highest_Stage_Reached'] = df.groupby('Name').Stage.transform('max')
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
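As a variation on the categorical approach, the sketch below does the group-wise maximum on the integer category codes and converts back afterwards. This is an alternative I am assuming is useful if you prefer to aggregate plain integers rather than the categorical itself; it is not the code from the answer above, and the names stages, codes and highest_code are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'],
    'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby'],
})

# Categories listed from lowest to highest stage
stages = ['Baby', 'Child', 'Teenager', 'Adult']
df['Stage'] = pd.Categorical(df['Stage'], categories=stages, ordered=True)

# cat.codes gives each value's position in `stages` (-1 for values not listed)
codes = df['Stage'].cat.codes

# Highest code per name, broadcast back to every row of that name
highest_code = codes.groupby(df['Name']).transform('max')

# Rebuild an ordered categorical from the winning codes
df['Highest_Stage_Reached'] = pd.Categorical.from_codes(
    highest_code, categories=stages, ordered=True
)

print(df)
```

Because the order lives in the stages list, adding or reordering stages only requires changing that one list.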