Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to display percentage above grouped bar chart

The following are the pandas dataframe and the bar chart generated from it:

colors_list = ['#5cb85c','#5bc0de','#d9534f']
result.plot(kind='bar',figsize=(15,4),width = 0.8,color = colors_list,edgecolor=None)
plt.legend(labels=result.columns,fontsize= 14)
plt.title("Percentage of Respondents' Interest in Data Science Areas",fontsize= 16)

plt.xticks(fontsize=14)
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.yticks([])

DataframeBar Chart

I need to display the percentages of each interest category for the respective subject above their corresponding bar. I can create a list with the percentages, but I don't understand how to add it on top of the corresponding bar.

like image 955
raniphore Avatar asked Aug 29 '18 15:08

raniphore


2 Answers

Try adding the following for loop to your code:

ax = result.plot(kind='bar', figsize=(15,4), width=0.8, color=colors_list, edgecolor=None)

for p in ax.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy() 
    ax.annotate(f'{height}', (x + width/2, y + height*1.02), ha='center')

Explanation

In general, you use Axes.annotate to add annotations to your plots.
This method takes the text value of the annotation and the xy coords on which to place the annotation.

In a barplot, each "bar" is represented by a patch.Rectangle and each of these rectangles has the attributes width, height and the xy coords of the lower left corner of the rectangle, all of which can be obtained with the methods patch.get_width, patch.get_height and patch.get_xy respectively.

Putting this all together, the solution is to loop through each patch in your Axes, and set the annotation text to be the height of that patch, with an appropriate xy position that's just above the centre of the patch - calculated from it's height, width and xy coords.


For your specific need to annotate with the percentages, I would first normalize your DataFrame and plot that instead.

colors_list = ['#5cb85c','#5bc0de','#d9534f']

# Normalize result
result_pct = result.div(result.sum(1), axis=0)

ax = result_pct.plot(kind='bar',figsize=(15,4),width = 0.8,color = colors_list,edgecolor=None)
plt.legend(labels=result.columns,fontsize= 14)
plt.title("Percentage of Respondents' Interest in Data Science Areas",fontsize= 16)

plt.xticks(fontsize=14)
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.yticks([])

# Add this loop to add the annotations
for p in ax.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy() 
    ax.annotate(f'{height:.0%}', (x + width/2, y + height*1.02), ha='center')

plot

like image 61
Chris Adams Avatar answered Oct 22 '22 03:10

Chris Adams


  • From matplotlib 3.4.2, use matplotlib.pyplot.bar_label
  • Modified from this answer, which has a different calculation, and a different label format.
  • Plot the DataFrame directly with pandas.DataFrame.plot and kind='bar'
  • See this answer for more documentation and examples using the .bar_label method.
  • Adjust which axis is used in .div and .sum to determine the percent relative to the columns.
import pandas as pd

file="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/coursera/Topic_Survey_Assignment.csv"
df=pd.read_csv(file, index_col=0)

df.sort_values(by=['Very interested'], axis=0, ascending=False, inplace=True)

# calculate the percent relative to the index
df_percent = df.div(df.sum(axis=1), axis=0).mul(100).round(1)

# display(df_percent)
                            Very interested  Somewhat interested  Not interested
Data Analysis / Statistics             77.0                 20.3             2.7
Machine Learning                       74.7                 21.9             3.4
Data Visualization                     61.6                 33.7             4.7
Big Data (Spark / Hadoop)              60.9                 33.3             5.8
Deep Learning                          58.2                 35.5             6.3
Data Journalism                        20.2                 51.0            28.8

# set the colors
colors = ['#5cb85c', '#5bc0de', '#d9534f']

# plot with annotations is probably easier
p1 = df_percent.plot(kind='bar', color=colors, figsize=(20, 8), rot=0, ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")

for p in p1.containers:
    p1.bar_label(p, fmt='%.1f%%', label_type='edge')

enter image description here

like image 38
Trenton McKinney Avatar answered Oct 22 '22 02:10

Trenton McKinney