
How to plot categorical data with seaborn setting the plot-style for each data column?

Background

Let's say I have the following dataset:

import pandas as pd
import numpy as np

data = ([["Cheese", x] for x in np.random.normal(0.8, 0.03, 10)] + 
        [["Meat", x] for x in np.random.normal(0.4, 0.05, 14)] + 
        [["Bread", 0.8], ["Bread", 0.65]])

df = pd.DataFrame(data, columns=["Food", "Score"])


I plot it with seaborn as follows:

import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
sns.set_context("paper")
sns.catplot(x="Score", y="Food", kind="box", data=df)

which yields the following plot (or similar, depending on the generated random numbers):

Sample box plot

I chose box plots for my actual data because, with the number of categories I want to show, individual dots make the plot far too noisy, whereas boxes give a clean overview of how the data is distributed, which is what I am after. The problem is with categories like "Bread".

Question

As you can see, seaborn draws a box with median, quartiles, etc. for all three categories. However, since the "Bread" category has only two data points, a box plot is not an appropriate representation for it. I would much rather show this category as individual dots.

But in the seaborn categorical tutorial (https://seaborn.pydata.org/tutorial/categorical.html), the only suggestion for combining box plots and individual dots is to draw both for all categories, which is not what I am after.

In short: How do I plot categorical data with seaborn, selecting the appropriate representation for each category?

NOhs asked Nov 06 '22 09:11


1 Answer

One option is to split the data into two DataFrames, one for "Bread" and one for everything else:

dfb = df[df['Food'].notnull() & (df['Food'] == 'Bread')]
dfnot_b = df[df['Food'].notnull() & (df['Food'] != 'Bread')]
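If you do not want to hard-code "Bread", the same split could be driven by how many points each category has. This is only a sketch: the threshold name MIN_POINTS and its value are my own assumptions, as are the variable names df_box and df_dots, and the fixed seed is there only to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset from the question (seeded so the sketch is reproducible).
rng = np.random.default_rng(0)
data = ([["Cheese", x] for x in rng.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in rng.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

# Hypothetical threshold: categories with fewer points are shown as dots.
MIN_POINTS = 5

# Number of rows in each row's own category, aligned with df's index.
points_per_row = df["Food"].map(df["Food"].value_counts())

df_box = df[points_per_row >= MIN_POINTS]   # enough data for a box plot
df_dots = df[points_per_row < MIN_POINTS]   # too few points, plot individually

print(df_dots["Food"].unique())
```

With the data from the question, only "Bread" ends up in df_dots, so df_box and df_dots can stand in for dfnot_b and dfb.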

Then create a second axis that shares the x-axis with the first:

fig, ax = plt.subplots()
ax2 = ax.twinx()

Finally, draw a different kind of plot on each axis:

sns.boxplot(x="Score", y="Food", data=dfnot_b, ax=ax)
sns.stripplot(x="Score", y="Food", data=dfb, ax=ax2)

plot overlay
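A possible variation, which is not what the answer above does: instead of a twin axis, draw both plots on the same axis and pass the same `order` list to `sns.boxplot` and `sns.stripplot`, so every category gets its own row. The names `is_bread` and `order`, the fixed seed, and the dot color are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Rebuild the dataset from the question (seeded so the sketch is reproducible).
rng = np.random.default_rng(0)
data = ([["Cheese", x] for x in rng.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in rng.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

is_bread = df["Food"] == "Bread"
order = ["Cheese", "Meat", "Bread"]  # shared category order for both plots

fig, ax = plt.subplots()
# Boxes for the well-populated categories; the "Bread" row is left empty.
sns.boxplot(x="Score", y="Food", data=df[~is_bread], order=order, ax=ax)
# Individual dots for "Bread" only, placed in its own row on the same axis.
sns.stripplot(x="Score", y="Food", data=df[is_bread], order=order,
              color="k", ax=ax)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Because both calls use the same `order`, the "Bread" dots should sit in their own row rather than on top of another category.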

GLarose answered Nov 12 '22 15:11