I have a pandas.DataFrame and I want to plot a graph based on two columns: Age (int) and Survived (int, 0 or 1). Now I have something like this:
This is the code I use:
from typing import List

import matplotlib.pyplot as plt
import seaborn as sns


class DataAnalyzer:
    def _facet_grid(self, func, x: List[str], col: str = None, row: str = None) -> None:
        # Draw one subplot per value of `col` / `row` and map `func` onto each
        g = sns.FacetGrid(self.train_data, col=col, row=row)
        if func == sns.barplot:
            g.map(func, *x, ci=None)
        else:
            g.map(func, *x)
        g.add_legend()
        plt.show()

    def analyze(self) -> None:
        # Check if survival rate is connected with Age
        self._facet_grid(plt.hist, col='Survived', x=['Age'])
So this is shown on two subplots. This is good, but it's harder to see the difference between the number of records that have 0 vs 1 in the Survived column for a particular age range.
So I want to have something like this:
In this scenario you could see this difference. Is there some way to do it in seaborn (because there I can easily operate on a pandas.DataFrame)? I'd rather not use vanilla matplotlib, if that's possible.
Histograms represent the data distribution by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin.
You can add a KDE curve to a histogram by setting the kde argument to True. Another way of drawing a histogram with seaborn is the distplot function; in versions before 0.11.0 it automatically added a kdeplot-like smooth curve.
Starting with seaborn 0.11.0, you can do this:
# stacked histogram
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

f = plt.figure(figsize=(7, 5))
ax = f.add_subplot(1, 1, 1)

# mock your data frame
_df = pd.DataFrame({
    "age": np.random.normal(30, 30, 1000),
    "survived": np.random.randint(0, 2, 1000),
})

# plot: one bar segment per value of "survived", stacked within each bin
sns.histplot(data=_df, ax=ax, stat="count", multiple="stack",
             x="age", kde=False,
             palette="pastel", hue="survived",
             element="bars", legend=True)
ax.set_title("Seaborn Stacked Histogram")
ax.set_xlabel("Age")
ax.set_ylabel("Count")
plt.show()
Just draw the total histogram and then draw the Survived = 0 one on top of it. It's hard to give the exact code without knowing the precise form of the dataframe, but here's a basic example with one of seaborn's example datasets.
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
sns.distplot(tips.total_bill, color="gold", kde=False, hist_kws={"alpha": 1})
sns.distplot(tips[tips.sex == "Female"].total_bill, color="blue", kde=False, hist_kws={"alpha":1})
plt.show()