Matplotlib PyPlot Stacked histograms - stacking different attributes in each bar

I have the following code to draw some histograms about subjects in a database:

import matplotlib.pyplot as plt

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}
bin_info = {key: None for key in attr_info}
bin_info['Age'] = 10

for name, a_info in attr_info.items():
    plt.figure(num=name)
    counts, bins, _ = plt.hist(a_info, bins=bin_info[name], color='blue', edgecolor='black')

    plt.margins(0)
    plt.title(name)
    plt.xlabel(name)
    plt.ylabel("# Subjects")
    plt.yticks(range(0, 11, 2))
    plt.grid(axis='y')
    plt.tight_layout(pad=0)

    plt.show()

This code works but it draws each attribute's distribution in a separate histogram. What I'd like to achieve is something like this:

[image: Stacked histogram]

I'm aware plt.hist has a stacked parameter, but that seems to be intended for a slightly different use: stacking the same attribute for different subject types. You could, for example, draw a histogram where each whole bar represents some age range and the bar itself is a stack of smokers in one colour and non-smokers in another.
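For illustration, that use of stacked might look roughly like the sketch below. The bin edges and the names smoker_ages and nonsmoker_ages are assumptions made up for this example; they are not part of the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt

ages = [9, 43, 234, 23, 2, 95, 32, 63, 58, 42]
smoker = ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']

# Split the SAME attribute (age) by subject type (smoker / non-smoker)
smoker_ages = [a for a, s in zip(ages, smoker) if s == 'y']
nonsmoker_ages = [a for a, s in zip(ages, smoker) if s == 'n']

fig, ax = plt.subplots()
# Passing a list of datasets with stacked=True stacks them within each age bin
counts, bins, _ = ax.hist(
    [smoker_ages, nonsmoker_ages],
    bins=[0, 40, 80, 240],  # assumed bin edges, chosen only for the example
    stacked=True,
    label=['smoker', 'non-smoker'],
    edgecolor='black',
)
ax.set_xlabel("Age")
ax.set_ylabel("# Subjects")
ax.legend()
```

Every bar here is still an age range, which is why this doesn't give me one bar per attribute.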

I haven't been able to figure out how to use it to stack (and properly label as in the image) different attributes on top of each other in each bar.

Mate de Vita asked Jun 02 '26 00:06


2 Answers

You need to play around with your data a bit, but this can be done without pandas. Also, what you want are stacked bar plots, not histograms:

import matplotlib.pyplot as plt

attr_info = {
'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}

# Filter your data for each bar section that you want
ages_0_10 = [x for x in attr_info['Age'] if x < 10]
ages_10_40 = [x for x in attr_info['Age'] if 10 <= x < 40]
ages_40p = [x for x in attr_info['Age'] if x >= 40]  # >= so that age 40 is not dropped

gender_m = [x for x in attr_info['Gender'] if 'm' in x]
gender_f = [x for x in attr_info['Gender'] if 'f' in x]

smoker_y = [x for x in attr_info['Smoker'] if 'y' in x]
smoker_n = [x for x in attr_info['Smoker'] if 'n' in x]

# Locations for each bin (you can move them around)
locs = [0, 1, 2]

# I'm going to plot the Age bar separately from the Smoker and Gender ones,
# since Age has 3 stacked segments and the others have just 2 each
plt.bar(locs[0], len(ages_0_10), width=0.5)  # This is the bottom bar

# Second stacked bar, note the bottom variable assigned to the previous bar
plt.bar(locs[0], len(ages_10_40), bottom=len(ages_0_10), width=0.5) 

# Same as before but now bottom is the 2 previous bars    
plt.bar(locs[0], len(ages_40p), bottom=len(ages_0_10) + len(ages_10_40), width=0.5)

# Add labels, play around with the locations
#plt.text(x, y, text)
plt.text(locs[0], len(ages_0_10) / 2, r'$<10$')
plt.text(locs[0], len(ages_0_10) + 1, r'$[10, 40)$')
plt.text(locs[0], len(ages_0_10) + 5, r'$>40$')


# Define the bottom and top segments for the Gender and Smoker bars.
# Both bars have just 2 stacked segments, so we can pass lists
# instead of plotting each segment separately as for Age.
# 'm' and 'y' go at the bottom so that the bars match the text labels below.
bottoms = [len(gender_m), len(smoker_y)]
tops = [len(gender_f), len(smoker_n)]

plt.bar(locs[1:], bottoms, width=0.5)
plt.bar(locs[1:], tops, bottom=bottoms, width=0.5)

# Labels again
# Gender
plt.text(locs[1], len(gender_m) / 2, 'm')
plt.text(locs[1], len(gender_m) + 2, 'f')

# Smokers
plt.text(locs[2], len(smoker_y) / 2, 'y')
plt.text(locs[2], len(smoker_y) + 2, 'n')

# Set tick labels
plt.xticks(locs, ('Age', 'Gender', 'Smoker'))
plt.show()

Result: [image: resulting stacked bar chart]

Check the documentation for pyplot.bar and this example.
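If you have more attributes, the manual bookkeeping above can be folded into a loop. What follows is only a rough sketch of that idea: stacked_segments and age_group are hypothetical helpers written for this example (they are not matplotlib functions), and the age cut-offs simply reuse the ones from the code above.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt


def stacked_segments(values, order=None):
    """Return (label, bottom, height) triples that stack the value counts from 0 upwards."""
    counts = Counter(values)
    labels = order if order is not None else sorted(counts)
    segments, bottom = [], 0
    for label in labels:
        height = counts[label]
        segments.append((label, bottom, height))
        bottom += height  # the next segment starts where this one ends
    return segments


def age_group(age):
    """Hypothetical binning helper; cut-offs at 10 and 40 as in the answer."""
    if age < 10:
        return '<10'
    return '10-39' if age < 40 else '40+'


attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}

# Turn every attribute into a list of category labels
categories = {
    'Age': [age_group(a) for a in attr_info['Age']],
    'Gender': attr_info['Gender'],
    'Smoker': attr_info['Smoker'],
}
orders = {'Age': ['<10', '10-39', '40+']}  # keep the age groups in numeric order

fig, ax = plt.subplots()
for x, (name, values) in enumerate(categories.items()):
    for label, bottom, height in stacked_segments(values, orders.get(name)):
        ax.bar(x, height, bottom=bottom, width=0.5, edgecolor='black')
        # Centre each label inside its own segment
        ax.text(x, bottom + height / 2, label, ha='center', va='center')

ax.set_xticks(range(len(categories)))
ax.set_xticklabels(list(categories))
ax.set_ylabel("# Subjects")
```

Placing each label at bottom + height / 2 keeps it inside its own segment, so you don't have to tune the text positions by hand.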

Francisca Concha-Ramírez answered Jun 05 '26 01:06


How about trying out pandas:

import matplotlib.pyplot as plt
import pandas as pd

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}

df = pd.DataFrame(attr_info)

bins = [0, 32, 45, 300]  # bins can be adjusted to your liking

# Drop "Age" and count the values in all remaining columns
counts = df.drop(columns="Age").apply(pd.Series.value_counts)
# Bin the age data and count the subjects in each bin
age_data = df.groupby(pd.cut(df['Age'], bins=bins), observed=False)["Age"].count()

fig, ax = plt.subplots()
# Append the age counts as their own "Age" column, then transpose
# so that each attribute becomes one stacked bar
pd.concat([counts, age_data.to_frame("Age")]).T.plot(kind="bar", stacked=True, ax=ax)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()

Output: [image: resulting stacked bar chart]

The advantage of this approach is its generality: it works the same way no matter how many columns you want to plot.
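To see what the binning step produces on its own, here is a small sketch that uses pd.cut together with value_counts (an alternative way of getting the per-bin counts). The variable names binned and age_counts are only illustrative.

```python
import pandas as pd

ages = pd.Series([9, 43, 234, 23, 2, 95, 32, 63, 58, 42], name="Age")

# pd.cut assigns every age to a half-open interval (left, right]
binned = pd.cut(ages, bins=[0, 32, 45, 300])

# sort=False keeps the bins in their natural order instead of ordering by count
age_counts = binned.value_counts(sort=False)
print(age_counts)
```

Because the intervals are closed on the right by default, the subject aged 32 lands in the first bin.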

Fourier answered Jun 05 '26 00:06


