I have a simple DataFrame that stores the results of a survey. The columns are:
| Age | Income | Satisfaction |
All of them contain categorical values between 1 and 5 (in the random example below, age runs from 0 to 5). I managed to generate a stacked bar plot that shows the distribution of Satisfaction values across people of different ages.
The code is:
```python
import random

import matplotlib.pyplot as plt
import pandas as pd

# create a random df
data = []
for i in range(500):
    sample = {"age": random.randint(0, 5), "income": random.randint(1, 5), "satisfaction": random.randint(1, 5)}
    data.append(sample)
df = pd.DataFrame(data)

# group by age and count each satisfaction level (missing combinations become 0)
counter = df.groupby('age')['satisfaction'].value_counts().unstack(fill_value=0)
# calculate the % for each age group (each row sums to 100)
percentage_dist = 100 * counter.divide(counter.sum(axis=1), axis=0)
percentage_dist.plot.bar(stacked=True)
plt.show()
```
This generates the desired stacked bar plot.
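To check what the plotted table actually holds, here is a small sketch on a hand-made, hypothetical dataset (not the random survey above). It uses `unstack(fill_value=0)` so that an age group with no answers at some satisfaction level gets 0 instead of NaN; each row of `percentage_dist` should then sum to 100.

```python
import pandas as pd

# A tiny, hand-made survey (hypothetical data) so the result is easy to check by hand
df = pd.DataFrame({
    "age":          [0, 0, 0, 0, 2, 2],
    "satisfaction": [1, 1, 3, 5, 3, 3],
})

# Count each satisfaction level per age group; missing combinations become 0
counter = df.groupby("age")["satisfaction"].value_counts().unstack(fill_value=0)

# Divide each row by its total so every age group sums to 100 %
percentage_dist = 100 * counter.divide(counter.sum(axis=1), axis=0)
print(percentage_dist)
```

Here age 0 has four answers (two 1s, one 3, one 5), so its row is 50 / 25 / 25, while age 2 answered only 3, so its row is 0 / 100 / 0.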
However, it is difficult to tell whether the green segment (percentage) for Age 0 is higher than the one for Age 2. Is there a way to add the percentage on top of each sub-section of the bar plot, for every single bar?
One option is to iterate over the patches (one rectangle per stacked segment) to obtain each segment's width, height and bottom-left coordinates, and use these values to place the label at the center of the corresponding segment.
To do this, the `Axes` object returned by the pandas `plot.bar` method must be stored:
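```python
import matplotlib.pyplot as plt

# keep the Axes returned by pandas so we can draw text on it
ax = percentage_dist.plot.bar(stacked=True)
for p in ax.patches:
    # each patch is one stacked segment; its height is the percentage
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()  # bottom-left corner of the segment
    ax.text(x + width / 2,
            y + height / 2,
            '{:.0f} %'.format(height),
            horizontalalignment='center',
            verticalalignment='center')
plt.show()
```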
Here the label is formatted with 0 decimal places (`'{:.0f} %'`); changing the format specification, e.g. to `'{:.1f} %'`, shows more digits.
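As a variation on the loop above, the sketch below wraps it in a small helper that takes the format string as a parameter and skips segments too small to hold a label (such as 0 % segments). The helper name `label_segments`, its `min_height` threshold and the sample table are hypothetical; the headless `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentage table: rows are age groups, columns are satisfaction levels
percentage_dist = pd.DataFrame(
    {1: [50.0, 0.0], 3: [25.0, 100.0], 5: [25.0, 0.0]},
    index=pd.Index([0, 2], name="age"),
)


def label_segments(ax, fmt="{:.1f} %", min_height=1.0):
    """Write each segment's height at its center; skip segments below min_height."""
    texts = []
    for p in ax.patches:
        width, height = p.get_width(), p.get_height()
        if height < min_height:  # empty or tiny segments get no label
            continue
        x, y = p.get_xy()  # bottom-left corner of the segment
        texts.append(ax.text(x + width / 2, y + height / 2, fmt.format(height),
                             ha="center", va="center"))
    return texts


ax = percentage_dist.plot.bar(stacked=True)
labels = label_segments(ax)
plt.close(ax.figure)
```

The two 0 % segments of age 2 are skipped, so only four labels are drawn.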
The resulting plot shows every stacked segment labeled with its percentage.
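A different technique from the patch loop above: matplotlib 3.4 and later provide `Axes.bar_label`, which places labels on a whole `BarContainer` at once. Pandas stores one container per column (here, per satisfaction level) in `ax.containers`. This sketch assumes matplotlib >= 3.4 and uses a hypothetical sample table; zero-height segments get an empty label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentage table: rows are age groups, columns are satisfaction levels
percentage_dist = pd.DataFrame(
    {1: [50.0, 0.0], 3: [25.0, 100.0], 5: [25.0, 0.0]},
    index=pd.Index([0, 2], name="age"),
)

ax = percentage_dist.plot.bar(stacked=True)
all_labels = []
for container in ax.containers:  # one BarContainer per satisfaction column
    # datavalues are the segment heights (not the cumulative stack tops)
    labels = [f"{v:.0f} %" if v > 0 else "" for v in container.datavalues]
    all_labels.extend(ax.bar_label(container, labels=labels, label_type="center"))
plt.close(ax.figure)
```

`label_type="center"` puts each label in the middle of its segment, which matches what the manual loop computes with `y + height / 2`.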