Seaborn countplot with normalized y axis per group

Tags: python, pandas, seaborn

I was wondering if it is possible to create a Seaborn count plot, but instead of actual counts on the y-axis, show the relative frequency (percentage) within its group (as specified with the hue parameter).

I more or less solved this with the following approach, but I can't imagine it is the easiest way:

# Plot percentage of occupation per income class
grouped = df.groupby(['income'], sort=False)
occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)

occupation_data = [
    {'occupation': occupation, 'income': income, 'percentage': percentage*100}
    for (income, occupation), percentage in dict(occupation_counts).items()
]

df_occupation = pd.DataFrame(occupation_data)

p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

Result:

[Figure: percentage plot with seaborn]

I'm using the well-known adult data set from the UCI machine learning repository. The pandas dataframe is created like this:

# Read the adult dataset
df = pd.read_csv(
    "data/adult.data",
    engine='c',
    lineterminator='\n',
    names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'race', 'sex',
           'capital_gain', 'capital_loss', 'hours_per_week',
           'native_country', 'income'],
    header=None,
    skipinitialspace=True,
    na_values="?"
)

This question is sort of related, but does not make use of the hue parameter. And in my case I cannot just change the labels on the y-axis, because the height of the bar must depend on the group.
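To illustrate why relabeling the y-axis is not enough, here is a minimal sketch on a hypothetical four-row toy frame (not the adult data set). It contrasts dividing every count by the overall total with dividing each count by the size of its own income group; only the second gives bar heights that depend on the group.

```python
import pandas as pd

# Hypothetical toy data standing in for the adult data set.
toy = pd.DataFrame({
    "income": ["<=50K", "<=50K", "<=50K", ">50K"],
    "occupation": ["Sales", "Sales", "Tech-support", "Sales"],
})

# Share of all rows: every count is divided by the same total (4 rows).
global_share = toy.groupby(["income", "occupation"]).size() / len(toy)

# Share within each income class: each count is divided by its own group size.
within_group = toy.groupby("income")["occupation"].value_counts(normalize=True)

print(global_share)
print(within_group)
```

With the global share, the bars only get rescaled by a constant, so their relative heights match a plain count plot. With the per-group share, the single ">50K" row becomes 100% of its group.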


asked Jan 05 '16 15:01

Lucas van Dijk

1 Answer

I might be confused. The difference between your output and the output of

occupation_counts = (df.groupby(['income'])['occupation']
                     .value_counts(normalize=True)
                     .rename('percentage')
                     .mul(100)
                     .reset_index()
                     .sort_values('occupation'))
p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

is, it seems to me, only the order of the columns.


And you seem to care about that, since you pass sort=False. But in your code the order is determined purely by chance: the order in which the dictionary is iterated even changes from run to run with Python 3.5.
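If a reproducible order matters, one option is to make it explicit instead of relying on dictionary iteration. The following is only a sketch on a hypothetical toy frame; the names toy, percentages, occupation_order and income_order are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the adult data set.
toy = pd.DataFrame({
    "income": [">50K", "<=50K", "<=50K", ">50K", "<=50K"],
    "occupation": ["Tech-support", "Sales", "Tech-support", "Sales", "Sales"],
})

# Percentage of each occupation within its income class, in a fixed row order.
percentages = (
    toy.groupby("income")["occupation"]
    .value_counts(normalize=True)
    .rename("percentage")
    .mul(100)
    .reset_index()
    .sort_values(["occupation", "income"])
    .reset_index(drop=True)
)

# Explicit category orders, derived from the data by sorting.
occupation_order = sorted(toy["occupation"].unique())
income_order = sorted(toy["income"].unique())

# Sanity check: the percentages within each income class add up to 100.
totals = percentages.groupby("income")["percentage"].sum()

print(percentages)
print(totals)
```

The two lists can then be passed to sns.barplot as order=occupation_order and hue_order=income_order, so the bar order no longer depends on how the rows happen to be arranged.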

answered Oct 25 '22 01:10

Pietro Battiston
