Let's say I have the following dataset:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = ([["Cheese", x] for x in np.random.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in np.random.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

sns.set(style="ticks", color_codes=True)
sns.set_context("paper")
sns.catplot(x="Score", y="Food", kind="box", data=df)
which yields the following plot (or similar, depending on the generated random numbers):
I am using box plots for my actual data because, with the number of categories I want to show, individual dots make the plot far too noisy, whereas boxes give a clean overview of how the data is distributed, which is what I am after. The problem is categories like "Bread".
As you can see, seaborn draws boxes with medians, quartiles, etc. for all three categories. However, since "Bread" has only two data points, a box plot is not an appropriate representation for it. I would much rather show this category as individual dots.
However, the examples in the seaborn categorical tutorial (https://seaborn.pydata.org/tutorial/categorical.html) only show how to combine box plots and dots by drawing both for every category, which is not what I want.
In short: How do I plot categorical data with seaborn, selecting the appropriate representation for each category?
One option is to create separate DataFrames for the "Bread" rows and for everything else:
# Rows of the small "Bread" category
dfb = df[df['Food'].notnull() & (df['Food'] == 'Bread')]
# All other (non-missing) categories
dfnot_b = df[df['Food'].notnull() & (df['Food'] != 'Bread')]
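If the small categories should be detected automatically instead of hard-coding "Bread", the split can be driven by the number of rows per category. This is a minimal sketch: the threshold `MIN_POINTS` is a hypothetical cut-off not taken from the answer, and the random seed is added only so the example is reproducible. The resulting `df_dots` and `df_boxes` can be used in place of `dfb` and `dfnot_b`.

```python
import numpy as np
import pandas as pd

# Example data from the question; the seed is added only for reproducibility
rng = np.random.default_rng(0)
data = ([["Cheese", x] for x in rng.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in rng.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

MIN_POINTS = 5  # hypothetical cut-off: smaller categories are drawn as dots

# For every row, the number of rows in its "Food" category
sizes = df["Food"].map(df["Food"].value_counts())

# Comparisons with NaN are False, so rows with a missing "Food" land in neither subset
df_dots = df[sizes < MIN_POINTS]    # small categories -> individual dots
df_boxes = df[sizes >= MIN_POINTS]  # larger categories -> box plots

print(sorted(df_dots["Food"].unique()))   # ['Bread']
print(sorted(df_boxes["Food"].unique()))  # ['Cheese', 'Meat']
```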
Then create a figure with a second y-axis that shares the x-axis:
fig, ax = plt.subplots()
ax2 = ax.twinx()
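Finally, draw each subset with a different plot type on its own axis:
sns.boxplot(x="Score", y="Food", data=dfnot_b, ax=ax)  # boxes for the larger categories
sns.stripplot(x="Score", y="Food", data=dfb, ax=ax2)   # individual dots for "Bread"
plt.show()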
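With `twinx()`, each axis lays out its own categories independently, so the positions of the two subsets are not coordinated and the "Bread" label typically ends up on the right-hand axis. A variant that may be easier to read, sketched below, draws both subsets on a single axes and passes the same `order` list to `sns.boxplot` and `sns.stripplot`, so every food gets its own row and the "Bread" row is simply left without a box. This is a swapped-in alternative to the twin-axis approach, not part of the original answer; the seed and `color="black"` are illustrative choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Example data from the question; the seed is added only for reproducibility
rng = np.random.default_rng(0)
data = ([["Cheese", x] for x in rng.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in rng.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

# Same split as in the answer
dfb = df[df["Food"] == "Bread"]
dfnot_b = df[df["Food"].notnull() & (df["Food"] != "Bread")]

# One category order shared by both calls, so both plots use the same rows
order = df["Food"].dropna().unique().tolist()

fig, ax = plt.subplots()
# Boxes for the larger categories; the "Bread" row stays empty here
sns.boxplot(x="Score", y="Food", data=dfnot_b, order=order, ax=ax)
# Individual dots for "Bread", drawn into its own row on the same axes
sns.stripplot(x="Score", y="Food", data=dfb, order=order, color="black", ax=ax)
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to display or save the figure. Because the layout depends only on `order`, the same two calls work unchanged with a threshold-based split instead of the hard-coded "Bread" filter.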