I'm using seaborn to make a violinplot, which uses hues to identify who survived and who didn't. This is given by the column 'DEATH_EVENT', where 0 means the person survived and 1 means they didn't. The only issue I'm having is that I can't figure out how to set labels for this hue legend. As seen below, 'DEATH_EVENT' presents 0 and 1, but I want to change this into 'Survived' and 'Not survived'.

Current code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
sns.set()
plt.style.use('seaborn')
data = pd.read_csv('heart_failure_clinical_records_dataset.csv')
g = sns.violinplot(data=data, x='smoking', y='age', hue='DEATH_EVENT')
g.set_xticklabels(['No smoking', 'Smoking'])
I tried g.legend(labels=['Survived', 'Not survived']), but the resulting legend loses the colors and shows a thin line and a thick line instead.

I'm aware I could just use:
data['DEATH_EVENT'] = data['DEATH_EVENT'].replace({0: 'Survived', 1: 'Not survived'})
but I wanted to see if there was another way. I'm still a rookie, so I'm guessing that there's a reason why the CSV's author made it so that it uses integers to describe plenty of things. Ex: if someone smokes or not, sex, diabetic or not, etc. Maybe it runs faster?
Controlling Seaborn legends is still somewhat tricky (some extensions to matplotlib's API would be helpful). In this case, you could grab the handles from the just-created legend and reuse them for a new legend:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.DataFrame({"smoking": np.random.randint(0, 2, 200),
                     "survived": np.random.randint(0, 2, 200),
                     "age": np.random.normal(60, 10, 200),
                     "DEATH_EVENT": np.random.randint(0, 2, 200)})
ax = sns.violinplot(data=data, x='smoking', y='age', hue='DEATH_EVENT')
ax.set_xticklabels(['No smoking', 'Smoking'])
# Note: legend_handles was named legendHandles before matplotlib 3.7
ax.legend(handles=ax.legend_.legend_handles, labels=['Survived', 'Not survived'])
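
If you would rather not rebuild the legend, a variation on the same idea is to keep the legend seaborn created and only overwrite its text entries. The following is a minimal sketch, not the only way to do it: the synthetic data, the Agg backend line, and the new_labels name are assumptions made so the example runs on its own. It relies on seaborn listing the hue levels in sorted order (0, then 1).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the heart failure dataset (assumed, for illustration only)
rng = np.random.default_rng(0)
data = pd.DataFrame({"smoking": rng.integers(0, 2, 200),
                     "age": rng.normal(60, 10, 200),
                     "DEATH_EVENT": np.tile([0, 1], 100)})

ax = sns.violinplot(data=data, x="smoking", y="age", hue="DEATH_EVENT")

# Overwrite the text of the existing legend entries; the colored handles stay untouched
new_labels = ["Survived", "Not survived"]  # assumed to match the sorted hue levels 0, 1
legend = ax.get_legend()
for text, label in zip(legend.get_texts(), new_labels):
    text.set_text(label)
legend.set_title("")  # optional: drop the 'DEATH_EVENT' title
```

Because the handles are never replaced, the colors stay in sync with the violins, and no version-dependent attribute of the legend is needed.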

Here is an approach that makes the change via the dataframe without modifying the original dataframe. To remove the legend title without touching ax.legend_ at all, a trick is to rename the column to a blank string (and use that blank string for hue). Unless the dataframe is very long (millions of rows), the speed and memory overhead are quite modest.
names = {0: 'Survived', 1: 'Not survived'}
ax = sns.violinplot(data=data.replace({'DEATH_EVENT': names}).rename(columns={'DEATH_EVENT': ''}),
                    x='smoking', y='age', hue='')
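
To see that the chained replace and rename only affect a temporary copy, here is a small sketch on a made-up four-row dataframe (the values and the plot_data name are assumptions for illustration):

```python
import pandas as pd

# Tiny made-up dataframe standing in for the real dataset
data = pd.DataFrame({"smoking": [0, 1, 0, 1],
                     "age": [55.0, 62.0, 70.0, 48.0],
                     "DEATH_EVENT": [0, 1, 1, 0]})

names = {0: "Survived", 1: "Not survived"}

# Both replace() and rename() return new dataframes, so `data` itself is left alone
plot_data = data.replace({"DEATH_EVENT": names}).rename(columns={"DEATH_EVENT": ""})

print(list(plot_data.columns))       # ['smoking', 'age', '']
print(plot_data[""].tolist())        # ['Survived', 'Not survived', 'Not survived', 'Survived']
print(data["DEATH_EVENT"].tolist())  # [0, 1, 1, 0]
```

The original integer column is still available for any later numeric work, while the plot only ever sees the relabeled copy.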