 

How to have clusters of stacked bars with python (Pandas)

Here is what my data set looks like:

In [1]: df1=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [2]: df2=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [3]: df1
Out[3]:
          I         J
A  0.675616  0.177597
B  0.675693  0.598682
C  0.631376  0.598966
D  0.229858  0.378817

In [4]: df2
Out[4]:
          I         J
A  0.939620  0.984616
B  0.314818  0.456252
C  0.630907  0.656341
D  0.020994  0.538303

I want a stacked bar plot for each DataFrame, but since they share the same index, I'd like to have 2 stacked bars per index value.

I've tried plotting both on the same axes:

In [5]: ax = df1.plot(kind="bar", stacked=True)

In [6]: ax2 = df2.plot(kind="bar", stacked=True, ax=ax)

But the bars overlap.
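The overlap happens because each df.plot(kind="bar", stacked=True) call places its bars at the same integer x positions (one per index value), so the second call simply draws on top of the first. A minimal check, assuming random data shaped like the frames above and a headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])

ax = df1.plot(kind="bar", stacked=True)
n_first = len(ax.patches)  # 4 index values x 2 columns = 8 rectangles
df2.plot(kind="bar", stacked=True, ax=ax)

# left edge of every rectangle drawn by each call
x_first = sorted(round(p.get_x(), 6) for p in ax.patches[:n_first])
x_second = sorted(round(p.get_x(), 6) for p in ax.patches[n_first:])
print(x_first == x_second)  # True: the second set of bars lands on the first
plt.close(ax.figure)
```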

Then I tried to concatenate the two datasets first:

pd.concat(dict(df1 = df1, df2 = df2),axis = 1).plot(kind="bar", stacked=True) 

but here all four columns end up stacked into a single bar per index value.
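With axis=1, the dict keys become an outer column level, so the result is one frame with four columns, (df1, I), (df1, J), (df2, I) and (df2, J), and stacked=True piles all four onto one bar per row. A quick look at the result, using random data shaped like the above:

```python
import numpy as np
import pandas as pd

idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])

wide = pd.concat(dict(df1=df1, df2=df2), axis=1)
print(wide.columns.tolist())
# [('df1', 'I'), ('df1', 'J'), ('df2', 'I'), ('df2', 'J')]
print(wide.shape)  # (4, 4): still one row, hence one bar, per index value
```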

My best attempt is:

 pd.concat(dict(df1 = df1, df2 = df2),axis = 0).plot(kind="bar", stacked=True) 

Which gives:

[Image: the resulting stacked bar plot]

This is basically what I want, except that I want the bars ordered as

(df1,A) (df2,A) (df1,B) (df2,B) etc...

I guess there is a trick, but I can't find it!
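One way to get exactly that order (a sketch, not necessarily the trick used in the answers) is to build the list of (name, index) tuples in the desired sequence and reindex the concatenated frame with it before plotting. Random data and a headless Agg backend are assumed:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])

# rows come out as (df1, A) ... (df1, D), (df2, A) ... (df2, D)
combined = pd.concat(dict(df1=df1, df2=df2), axis=0)

# desired order: (df1, A), (df2, A), (df1, B), (df2, B), ...
order = [(name, i) for i in idx for name in ("df1", "df2")]
combined = combined.reindex(order)

ax = combined.plot(kind="bar", stacked=True)
```

Reindexing only changes the row order, so each bar keeps its original values.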


After @bgschiller's answer, I got this:

[Image: the resulting plot]

This is almost what I want. I would like the bars to be clustered by index, so that the plot is visually clear.

Bonus: non-redundant x-labels, something like:

df1 df2    df1 df2
_______    _______ ...
   A          B
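For the bonus, one possible approach is to use only the dataframe names as tick labels, then write each index value once, centred under its group of bars, with ax.text and the blended transform from ax.get_xaxis_transform() (x in data coordinates, y in axes coordinates). This is a sketch assuming rows interleaved as (df1, A), (df2, A), ..., random data, a headless backend, and a vertical offset of -0.1 axes units that you may need to tune for your figure size:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = ["A", "B", "C", "D"]
names = ["df1", "df2"]
frames = {n: pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
          for n in names}

# rows interleaved as (df1, A), (df2, A), (df1, B), ...
combined = pd.concat(frames, axis=0).reindex([(n, i) for i in idx for n in names])
ax = combined.plot(kind="bar", stacked=True, rot=0)

# lower label row: only the dataframe name under each bar
ax.set_xticklabels([n for n, _ in combined.index])

# upper label row: each index value once, centred under its group of bars;
# -0.1 (axes fraction below the x-axis) is an assumed offset
for k, i in enumerate(idx):
    centre = k * len(names) + (len(names) - 1) / 2
    ax.text(centre, -0.1, i, ha="center", va="top",
            transform=ax.get_xaxis_transform())
```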

Thanks for helping.

jrjc asked Apr 01 '14 13:04




1 Answer

I eventually found a trick (edit: see below for a solution using seaborn and a long-form DataFrame):

Solution with pandas and matplotlib

Here it is, with a more complete example:

import pandas as pd
import matplotlib.cm as cm
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="/", **kwargs):
    """Given a list of dataframes, with identical columns and index, create a clustered stacked bar plot.
    labels is a list of the names of the dataframe, used for the legend
    title is a string for the title of the plot
    H is the hatch used for identification of the different dataframe"""

    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall:  # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h, l = axe.get_legend_handles_labels()  # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col):  # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches:  # for each index
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col))  # edited part
                rect.set_width(1 / float(n_df + 1))

    # centre each tick under its cluster: pandas draws each stacked bar with
    # width 0.5 centred on an integer position, so every cluster starts at p - 0.25
    axe.set_xticks(np.arange(n_ind) - 0.25 + n_df / (2. * (n_df + 1)))
    axe.set_xticklabels(df.index, rotation=0)
    axe.set_title(title)

    # Add invisible data to add another legend
    n = []
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1])
    axe.add_artist(l1)
    return axe

# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])

# Then, just call:
plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"])

And it gives this:

[Image: multiple stacked bar plot]
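The positioning arithmetic in plot_clustered_stacked can be checked in isolation. Assuming pandas' default stacked-bar geometry (width 0.5, centred on integer positions, so the bar at position p initially starts at p - 0.25), dataframe k is shifted right by k/(n_df + 1) and narrowed to 1/(n_df + 1). The hypothetical helper cluster_layout below reproduces that and returns the centre of each cluster, which is where its tick label belongs:

```python
import math


def cluster_layout(n_df, n_ind, base_width=0.5):
    """Hypothetical helper: bar width, left edge of every bar, and cluster centres.

    Assumes each stacked bar at position p initially spans
    [p - base_width / 2, p + base_width / 2], as pandas draws it by default.
    """
    width = 1.0 / (n_df + 1)           # every bar is narrowed to this width
    lefts = {}                         # (dataframe number k, position p) -> left edge
    centres = []
    for p in range(n_ind):
        start = p - base_width / 2     # where pandas put the original bar
        for k in range(n_df):
            lefts[(k, p)] = start + k * width   # shift dataframe k right
        centres.append(start + n_df * width / 2)  # middle of the whole cluster
    return width, lefts, centres


width, lefts, centres = cluster_layout(3, 4)
print(centres)  # [0.125, 1.125, 2.125, 3.125]
print(math.isclose(width, 0.25))  # True
```

With three dataframes each cluster is centred 0.125 to the right of the integer position; with two it would be about 0.083, so the offset depends on n_df.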

You can change the colors of the bars by passing a cmap argument:

plot_clustered_stacked([df1, df2, df3],
                       ["df1", "df2", "df3"],
                       cmap=plt.cm.viridis)

Solution with seaborn:

Given the same df1, df2 and df3, I first convert them to long form:

df1["Name"] = "df1"
df2["Name"] = "df2"
df3["Name"] = "df3"
dfall = pd.concat([pd.melt(i.reset_index(),
                           id_vars=["Name", "index"])  # transform each df into tidy format
                   for i in [df1, df2, df3]],
                  ignore_index=True)

The problem with seaborn is that it doesn't stack bars natively, so the trick is to compute the cumulative sum within each bar and draw the resulting bars on top of each other, tallest at the back:

dfall.set_index(["Name", "index", "variable"], inplace=True)
dfall["vcs"] = dfall.groupby(level=["Name", "index"])["value"].cumsum()
dfall.reset_index(inplace=True)

>>> dfall.head(6)
  Name index variable     value       vcs
0  df1     A        I  0.717286  0.717286
1  df1     B        I  0.236867  0.236867
2  df1     C        I  0.952557  0.952557
3  df1     D        I  0.487995  0.487995
4  df1     A        J  0.174489  0.891775
5  df1     B        J  0.332001  0.568868

Then loop over the groups of the variable column and plot each cumulative sum:

import seaborn as sns

c = ["blue", "purple", "red", "green", "pink"]
for i, g in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g[1],
                     x="index",
                     y="vcs",
                     hue="Name",
                     color=c[i],
                     zorder=-i,  # so first bars stay on top
                     edgecolor="k")
ax.legend_.remove()  # remove the redundant legends

[Image: multiple stacked bar plot with seaborn]

It lacks a legend, which I think can be added easily. The problem is that, instead of hatches (which can also be added easily), the dataframes are distinguished by a gradient of lightness, which is a bit too light for the first one, and I don't really know how to change that without modifying each rectangle one by one (as in the first solution).
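A legend for the stacked variables can be built from proxy artists: matplotlib.patches.Patch objects in the same colours as the list c, passed to ax.legend(handles=...) after the plotting loop. A minimal sketch; the helper name add_variable_legend is hypothetical, and it is demonstrated on an empty axes so it runs without seaborn. It does not encode the lightness gradient that distinguishes the dataframes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed so this runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def add_variable_legend(ax, variables, colors, **legend_kwargs):
    """Hypothetical helper: one coloured square per stacked variable."""
    handles = [Patch(facecolor=col, edgecolor="k", label=var)
               for var, col in zip(variables, colors)]
    return ax.legend(handles=handles, **legend_kwargs)


fig, ax = plt.subplots()
c = ["blue", "purple", "red", "green", "pink"]
leg = add_variable_legend(ax, ["I", "J", "K", "L", "M"], c,
                          title="variable", loc="upper left",
                          bbox_to_anchor=(1.01, 1))
```

Because groupby("variable") iterates in sorted order, the variables I to M line up with the colours in c.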

Let me know if anything in the code is unclear.

Feel free to reuse this code, which is released under CC0.

jrjc answered Sep 21 '22 17:09