Seaborn stacked histogram/barplot

I have a pandas.DataFrame and I want to plot a graph based on two columns: Age (int), Survived (int - 0 or 1). Now I have something like this:

enter image description here

This is the code I use:

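from typing import List, Optional

import matplotlib.pyplot as plt
import seaborn as sns


class DataAnalyzer:
    # self.train_data is assumed to be a pandas.DataFrame set elsewhere (e.g. in __init__)

    def _facet_grid(self, func, x: List[str], col: Optional[str] = None, row: Optional[str] = None) -> None:
        # One subplot per distinct value of the col / row variables
        g = sns.FacetGrid(self.train_data, col=col, row=row)
        if func == sns.barplot:
            g.map(func, *x, ci=None)
        else:
            g.map(func, *x)
        g.add_legend()
        plt.show()

    def analyze(self) -> None:
        # Check if survival rate is connected with Age
        self._facet_grid(plt.hist, col='Survived', x=['Age'])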

So this is shown on two subplots. This is good, but it's harder to see the difference between the number of records with 0 vs 1 in the Survived column for a particular age range.

So I want to have something like this:

enter image description here

In this scenario you can see the difference. Is there some way to do this in seaborn (because there I can easily operate on a pandas.DataFrame)? I'd rather not use vanilla matplotlib if possible.

dabljues asked Dec 22 '18 20:12


2 Answers

Starting with seaborn 0.11.0, you can do this:

# stacked histogram
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

f = plt.figure(figsize=(7,5))
ax = f.add_subplot(1,1,1)

# mock your data frame
_df = pd.DataFrame({
    "age":np.random.normal(30,30,1000),
    "survived":np.random.randint(0,2,1000)
})

# plot: multiple="stack" puts the hue groups on top of each other in every bin
sns.histplot(data=_df, ax=ax, stat="count", multiple="stack",
             x="age", kde=False,
             palette="pastel", hue="survived",
             element="bars", legend=True)
ax.set_title("Seaborn Stacked Histogram")
ax.set_xlabel("Age")
ax.set_ylabel("Count")
plt.show()

enter image description here

Gena Kukartsev answered Nov 09 '22 14:11


Just draw the total histogram and then draw the Survived = 0 one on top of it. It's hard to give the exact function without the precise form of the dataframe, but here's a basic example with one of seaborn's example datasets.

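import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
# Shared bin edges so the two histograms line up bar for bar
bins = np.histogram_bin_edges(tips.total_bill, bins=20)
# All records first, then the subset on top (distplot is deprecated since seaborn 0.11.0)
sns.distplot(tips.total_bill, bins=bins, color="gold", kde=False, hist_kws={"alpha": 1})
sns.distplot(tips[tips.sex == "Female"].total_bill, bins=bins, color="blue", kde=False, hist_kws={"alpha": 1})
plt.show()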

bombadilhom answered Nov 09 '22 14:11