I have a dataframe with several columns, where every column has between 5 and 2535 entries (the rest are NaN). I want to plot a boxplot when the column has more than 9 numeric entries and a swarmplot otherwise. I used my mad paint skills to create an example.
The problem is that I am only able to plot both as overlays, as in this example. I tried using the position keyword, but this only works for the boxplot, not for the swarmplot. So, how can this be done?
An example dataset can be produced like this:
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.nan, index=range(100), columns=range(11))
for i, column in enumerate(df.columns):
    if i % 2 == 0:
        # even columns: sparse, 1 to 10 values
        fill_till = np.random.randint(1, 11)
        df.loc[:fill_till-1, column] = np.random.random(fill_till)
    else:
        # odd columns: dense, 11 to 100 values
        fill_till = np.random.randint(11, 101)
        df.loc[:fill_till-1, column] = np.random.random(fill_till)
You can create two copies of the data frame, one for the box plot and one for the swarm plot. Then, in each copy, set the values in the columns you don't want to plot in that way to NaN.
col_mask = df.count() > 9
swarm_data = df.copy()
swarm_data.loc[:, col_mask] = np.nan
box_data = df.copy()
box_data.loc[:, ~col_mask] = np.nan
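As a quick sanity check, each column should keep its data in exactly one of the two copies while both copies keep the full column list. Here is a minimal sketch of that check, assuming a small hypothetical frame `demo` (not the `df` from the question) with one sparse and one dense column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column "a" has 3 values, column "b" has 12.
demo = pd.DataFrame({
    "a": [0.1, 0.2, 0.3] + [np.nan] * 9,
    "b": np.linspace(0.0, 1.0, 12),
})

col_mask = demo.count() > 9           # True where a box plot is wanted

swarm_data = demo.copy()
swarm_data.loc[:, col_mask] = np.nan  # blank out the "box" columns

box_data = demo.copy()
box_data.loc[:, ~col_mask] = np.nan   # blank out the "swarm" columns

# Columns that still hold at least one value in each copy.
swarm_cols = [c for c in swarm_data if swarm_data[c].notna().any()]
box_cols = [c for c in box_data if box_data[c].notna().any()]
print(swarm_cols, box_cols)
```

Both copies still list every column in the original order, which is what lets the two plots line up on a shared axis.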
Then pass each of the copied data frames to the appropriate seaborn function.
import matplotlib.pyplot as plt
import seaborn as sns

sns.swarmplot(data=swarm_data)
sns.boxplot(data=box_data)
plt.show()
When creating the swarm plot, seaborn will plot nothing for the columns filled with NaN, but will leave space where they would be. The reverse will happen with the box plot, resulting in your column order being preserved.
The chart generated by the above code looks like this:
This approach would also work for columns with non-numeric labels.