Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Seaborn countplot with normalized y axis per group

I was wondering if it is possible to create a Seaborn count plot, but instead of actual counts on the y-axis, show the relative frequency (percentage) within its group (as specified with the hue parameter).

I sort of fixed this with the following approach, but I can't imagine this is the easiest approach:

# Plot percentage of occupation per income class grouped = df.groupby(['income'], sort=False) occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)  occupation_data = [     {'occupation': occupation, 'income': income, 'percentage': percentage*100} for      (income, occupation), percentage in dict(occupation_counts).items() ]  df_occupation = pd.DataFrame(occupation_data)  p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation) _ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels 

Result:

Percentage plot with seaborn

I'm using the well known adult data set from the UCI machine learning repository. The pandas dataframe is created like this:

# Read the adult dataset df = pd.read_csv(     "data/adult.data",     engine='c',     lineterminator='\n',      names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',            'marital_status', 'occupation', 'relationship', 'race', 'sex',            'capital_gain', 'capital_loss', 'hours_per_week',            'native_country', 'income'],     header=None,     skipinitialspace=True,     na_values="?" ) 

This question is sort of related, but does not make use of the hue parameter. And in my case I cannot just change the labels on the y-axis, because the height of the bar must depend on the group.

like image 871
Lucas van Dijk Avatar asked Jan 05 '16 15:01

Lucas van Dijk


People also ask

What does SNS Countplot () do?

When you use sns. countplot , Seaborn literally counts the number of observations per category for a categorical variable, and displays the results as a bar chart.

What is hue in Seaborn Countplot?

hue : (optional) This parameter take column name for colour encoding. data : (optional) This parameter take DataFrame, array, or list of arrays, Dataset for plotting. If x and y are absent, this is interpreted as wide-form.


1 Answers

I might be confused. The difference between your output and the output of

occupation_counts = (df.groupby(['income'])['occupation']                      .value_counts(normalize=True)                      .rename('percentage')                      .mul(100)                      .reset_index()                      .sort_values('occupation')) p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts) _ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels 

is, it seems to me, only the order of the columns.

enter image description here

And you seem to care about that, since you pass sort=False. But then, in your code the order is determined uniquely by chance (and the order in which the dictionary is iterated even changes from run to run with Python 3.5).

like image 90
Pietro Battiston Avatar answered Oct 25 '22 01:10

Pietro Battiston