The following are the pandas dataframe and the bar chart generated from it:
colors_list = ['#5cb85c','#5bc0de','#d9534f']
result.plot(kind='bar',figsize=(15,4),width = 0.8,color = colors_list,edgecolor=None)
plt.legend(labels=result.columns,fontsize= 14)
plt.title("Percentage of Respondents' Interest in Data Science Areas",fontsize= 16)
plt.xticks(fontsize=14)
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.yticks([])
I need to display the percentages of each interest category for the respective subject above their corresponding bar. I can create a list with the percentages, but I don't understand how to add it on top of the corresponding bar.
Try adding the following for
loop to your code:
ax = result.plot(kind='bar', figsize=(15,4), width=0.8, color=colors_list, edgecolor=None)
for p in ax.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
ax.annotate(f'{height}', (x + width/2, y + height*1.02), ha='center')
In general, you use Axes.annotate
to add annotations to your plots.
This method takes the text
value of the annotation and the xy
coords on which to place the annotation.
In a barplot, each "bar" is represented by a patch.Rectangle
and each of these rectangles has the attributes width
, height
and the xy
coords of the lower left corner of the rectangle, all of which can be obtained with the methods patch.get_width
, patch.get_height
and patch.get_xy
respectively.
Putting this all together, the solution is to loop through each patch in your Axes
, and set the annotation text to be the height
of that patch, with an appropriate xy
position that's just above the centre of the patch - calculated from it's height, width and xy coords.
For your specific need to annotate with the percentages, I would first normalize your DataFrame
and plot that instead.
colors_list = ['#5cb85c','#5bc0de','#d9534f']
# Normalize result
result_pct = result.div(result.sum(1), axis=0)
ax = result_pct.plot(kind='bar',figsize=(15,4),width = 0.8,color = colors_list,edgecolor=None)
plt.legend(labels=result.columns,fontsize= 14)
plt.title("Percentage of Respondents' Interest in Data Science Areas",fontsize= 16)
plt.xticks(fontsize=14)
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.yticks([])
# Add this loop to add the annotations
for p in ax.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
ax.annotate(f'{height:.0%}', (x + width/2, y + height*1.02), ha='center')
matplotlib 3.4.2
, use matplotlib.pyplot.bar_label
pandas.DataFrame.plot
and kind='bar'
.bar_label
method..div
and .sum
to determine the percent relative to the columns.import pandas as pd
file="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/coursera/Topic_Survey_Assignment.csv"
df=pd.read_csv(file, index_col=0)
df.sort_values(by=['Very interested'], axis=0, ascending=False, inplace=True)
# calculate the percent relative to the index
df_percent = df.div(df.sum(axis=1), axis=0).mul(100).round(1)
# display(df_percent)
Very interested Somewhat interested Not interested
Data Analysis / Statistics 77.0 20.3 2.7
Machine Learning 74.7 21.9 3.4
Data Visualization 61.6 33.7 4.7
Big Data (Spark / Hadoop) 60.9 33.3 5.8
Deep Learning 58.2 35.5 6.3
Data Journalism 20.2 51.0 28.8
# set the colors
colors = ['#5cb85c', '#5bc0de', '#d9534f']
# plot with annotations is probably easier
p1 = df_percent.plot(kind='bar', color=colors, figsize=(20, 8), rot=0, ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")
for p in p1.containers:
p1.bar_label(p, fmt='%.1f%%', label_type='edge')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With