Creating a matplotlib or seaborn histogram which uses percent rather than count?

Specifically, I'm working with the Kaggle Titanic dataset. I've plotted a stacked histogram showing the ages of the passengers who survived and of those who died on the Titanic. Code below.

import matplotlib.pyplot as plt

# data: the Kaggle Titanic data loaded as a pandas DataFrame
figure = plt.figure(figsize=(15, 8))
plt.hist([data[data['Survived']==1]['Age'], data[data['Survived']==0]['Age']], stacked=True, bins=30, label=['Survived','Dead'])
plt.xlabel('Age')
plt.ylabel('Number of passengers')
plt.legend()

I would like to alter the chart to show a single bar per bin whose height is the percentage of that age group that survived. E.g. if a bin covered ages 10-20 and 60% of the people aboard the Titanic in that age group survived, the bar would reach 60% on the y-axis.

Edit: I may have given a poor explanation of what I'm looking for. Rather than altering the y-axis values, I want to change the actual shape of the bars based on the percentage that survived.

The first bin on the graph shows that roughly 65% of that age group survived. I would like this bar to reach 65% on the y-axis. The following bins look to be about 90%, 50% and 10% respectively, and so on.

The graph would end up actually looking something like this:

[image: sketch of the desired chart, not preserved]

931

asked Oct 17 '16 17:10

WillacyMe

2 Answers

Perhaps the following will help ...

Split the dataframe based on 'Survived'

df_survived=df[df['Survived']==1]
df_not_survive=df[df['Survived']==0]

Create Bins
```
age_bins=np.linspace(0,80,21)
```

Use np.histogram to generate histogram data

survived_hist=np.histogram(df_survived['Age'],bins=age_bins,range=(0,80))
not_survive_hist=np.histogram(df_not_survive['Age'],bins=age_bins,range=(0,80))

Calculate survival rate in each bin

# A bin with no passengers gives 0/0, i.e. NaN, and numpy emits a RuntimeWarning
surv_rates=survived_hist[0]/(survived_hist[0]+not_survive_hist[0])

Plot

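# align='edge' starts each bar at the bin's lower edge; the default would center bars on it
plt.bar(age_bins[:-1],surv_rates,width=age_bins[1]-age_bins[0],align='edge')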
plt.xlabel('Age')
plt.ylabel('Survival Rate')

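The steps above can also be wrapped into one self-contained sketch. The function names `survival_rate_per_bin` and `plot_survival_rates` and the tiny sample are made up for illustration; they are not part of the answer or of the Kaggle data. The sketch drops missing ages explicitly, leaves empty bins as NaN instead of dividing by zero, and uses align="edge" so that the bars start at the bin edges.

```python
import numpy as np


def survival_rate_per_bin(ages, survived, bin_edges):
    """Return the fraction of survivors in each age bin (NaN for empty bins)."""
    ages = np.asarray(ages, dtype=float)
    survived = np.asarray(survived, dtype=bool)

    # Drop passengers whose age is missing
    known = ~np.isnan(ages)
    ages, survived = ages[known], survived[known]

    total, _ = np.histogram(ages, bins=bin_edges)
    alive, _ = np.histogram(ages[survived], bins=bin_edges)

    # Divide only where a bin has passengers; other bins stay NaN
    rates = np.full(total.shape, np.nan)
    np.divide(alive, total, out=rates, where=total > 0)
    return rates


def plot_survival_rates(rates, bin_edges):
    """Draw one bar per bin, with the height given in percent."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(15, 8))
    ax.bar(bin_edges[:-1], rates * 100, width=np.diff(bin_edges), align="edge")
    ax.set_xlabel("Age")
    ax.set_ylabel("Survived (%)")
    ax.set_ylim(0, 100)
    return ax


# Tiny made-up sample, not the Kaggle data
ages = [4, 8, 15, 18, 25, 35, float("nan"), 62]
survived = [1, 1, 0, 1, 0, 0, 1, 0]
edges = np.array([0, 10, 20, 40, 60, 80])

rates = survival_rate_per_bin(ages, survived, edges)
```

With the real data, the same call would presumably take the 'Age' and 'Survived' columns and something like np.linspace(0, 80, 21) as edges; plot_survival_rates(rates, edges) followed by plt.show() should then draw the chart.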

113

answered Oct 16 '22 19:10

bahaugen

For seaborn, use the stat parameter of histplot. According to the documentation, the currently supported values for stat are:

count: shows the number of observations
frequency: shows the number of observations divided by the bin width
density: normalizes counts so that the area of the histogram is 1
probability: normalizes counts so that the sum of the bar heights is 1
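To make the difference between these normalizations concrete, here is a small numpy sketch that recomputes each one by hand from raw counts. The sample values and bin edges are made up, and the formulas are my reading of the descriptions above rather than seaborn's actual code.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 7, 9])  # made-up sample
edges = np.array([0, 2, 4, 10])              # unequal widths, so the variants differ visibly

counts, _ = np.histogram(values, bins=edges)  # "count": observations per bin
widths = np.diff(edges)

frequency = counts / widths                   # count divided by bin width
density = counts / (counts.sum() * widths)    # bar areas sum to 1
probability = counts / counts.sum()           # bar heights sum to 1

print("count:      ", counts)
print("frequency:  ", frequency)
print("density:    ", density)
print("probability:", probability)
```

Note that probability divides by the total number of observations, not by the number of observations in each bin, so none of these values is by itself the per-bin survival percentage the question asks for.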

The result with stat being count:

seaborn.histplot(
    data=data,
    x='variable',
    discrete=True,
    stat='count'
)

Histogram result for stat=count

The result after stat is changed to probability:

seaborn.histplot(
    data=data,
    x='variable',
    discrete=True,
    stat='probability'
)

Histogram result for stat=probability
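For the per-bin survival share from the question, one possible seaborn route (an assumption on my part, not something this answer states) is histplot with hue='Survived' and multiple='fill', which scales the stacked bars in every bin to a total height of 1. The sketch below pairs it with a pandas cross-check of the same numbers; the helper names, the sample data and the bin edges are hypothetical.

```python
import pandas as pd


def survival_percent_by_age(df, bin_edges):
    """Percentage of survivors per age bin; rows with a missing Age are ignored."""
    # right=False gives [lower, upper) bins; unlike np.histogram, the last bin
    # also excludes its upper edge, so pick a top edge above the maximum age.
    age_bin = pd.cut(df["Age"], bins=bin_edges, right=False)
    return df.groupby(age_bin, observed=False)["Survived"].mean() * 100


def plot_survival_share(df, bin_edges):
    """Stacked histogram in which every bin is scaled to a total height of 1."""
    import seaborn as sns  # imported lazily; only needed for plotting

    return sns.histplot(data=df, x="Age", hue="Survived",
                        bins=bin_edges, multiple="fill")


# Tiny made-up sample, not the Kaggle data
df = pd.DataFrame({
    "Age": [4, 8, 15, 18, 25, 35, None, 62],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
})
edges = [0, 10, 20, 40, 60, 80]

percent = survival_percent_by_age(df, edges)
print(percent)
```

In the plot, the height of the Survived=1 segment in each bin should correspond to the percentage computed by the pandas helper, with empty bins showing up as NaN in the table.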

answered Oct 16 '22 19:10

miro

Tags: python, pandas, matplotlib, dataset, histogram
