Here is what my data set looks like:
<pre class="prettyprint"><code>In [1]: df1 = pd.DataFrame(np.random.rand(4, 2), index=["A", "B", "C", "D"], columns=["I", "J"])

In [2]: df2 = pd.DataFrame(np.random.rand(4, 2), index=["A", "B", "C", "D"], columns=["I", "J"])

In [3]: df1
Out[3]:
          I         J
A  0.675616  0.177597
B  0.675693  0.598682
C  0.631376  0.598966
D  0.229858  0.378817

In [4]: df2
Out[4]:
          I         J
A  0.939620  0.984616
B  0.314818  0.456252
C  0.630907  0.656341
D  0.020994  0.538303
</code></pre>
I want a stacked bar plot for each dataframe, but since they share the same index, I'd like to have 2 stacked bars per index value.
I tried plotting both on the same axes:
<pre class="prettyprint"><code>In [5]: ax = df1.plot(kind="bar", stacked=True)

In [6]: ax2 = df2.plot(kind="bar", stacked=True, ax=ax)
</code></pre>
But the bars overlap. Then I tried concatenating the two datasets first:
<pre class="prettyprint"><code>pd.concat(dict(df1=df1, df2=df2), axis=1).plot(kind="bar", stacked=True)
</code></pre>
but here everything is stacked into a single bar per index value.
My best attempt is:
<pre class="prettyprint"><code>pd.concat(dict(df1=df1, df2=df2), axis=0).plot(kind="bar", stacked=True)
</code></pre>
Which gives:
<img src="https://i.stack.imgur.com/dSSsI.png" alt="enter image description here">
This is basically what I want, except that I want the bars ordered as (df1,A) (df2,A) (df1,B) (df2,B) etc.
I guess there is a trick, but I can't find it!
<hr>
After @bgschiller's answer I got this:
<img src="https://i.stack.imgur.com/8Uk5l.png" alt="enter image description here">
Which is almost what I want. I would like the bars to be clustered by index, so that the plot is visually clear.
Bonus: having non-redundant x-labels, something like:
<pre class="prettyprint"><code>df1 df2    df1 df2
_______    _______ ...
   A          B
</code></pre>
Thanks for helping.
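For the ordering part of the question alone, one possible sketch (not taken from the original post) is to swap the two index levels of the concatenated frame and sort it, so that rows come out as (A, df1), (A, df2), (B, df1), and so on. The seeded random generator and the variable names `stacked`, `interleaved` and `order` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same shape as in the question (seeded for repeatability).
rng = np.random.default_rng(0)
idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(rng.random((4, 2)), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(rng.random((4, 2)), index=idx, columns=["I", "J"])

# Concatenating along axis=0 gives a MultiIndex of (frame name, original index).
stacked = pd.concat({"df1": df1, "df2": df2}, axis=0)

# Swap the levels so the original index comes first, then sort lexicographically.
interleaved = stacked.swaplevel(0, 1).sort_index()
order = list(interleaved.index)

# Plotting is left commented out so the sketch runs without a display:
# interleaved.plot(kind="bar", stacked=True)
```

This only reorders the bars: they stay evenly spaced, so it does not by itself produce the visual clustering or the grouped x-labels asked for above.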

I eventually found a trick (edit: see below for a version using seaborn and a long-form dataframe):
<h3>Solution with pandas and matplotlib</h3>
Here it is with a more complete example:
<pre class="prettyprint"><code>import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="/", **kwargs):
    """Given a list of dataframes with identical columns and index, create a clustered stacked bar plot.

    labels is a list of the names of the dataframes, used for the legend
    title is a string for the title of the plot
    H is the hatch used to identify the different dataframes"""

    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall:  # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h, l = axe.get_legend_handles_labels()  # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col):  # len(h) = n_col * n_df
        for pa in h[i:i + n_col]:
            for rect in pa.patches:  # for each index
                # shift each dataframe's bars sideways by one bar width per dataframe
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col))  # one more hatch repetition per dataframe
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(dfall[0].index, rotation=0)
    axe.set_title(title)

    # Add invisible data to add another legend
    n = []
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1])
    axe.add_artist(l1)
    return axe


# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])

# Then, just call:
plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"])
</code></pre>
And it gives this:
<img src="https://i.stack.imgur.com/3ZdAH.png" alt="multiple stacked bar plot">
You can change the colors of the bars by passing a <code>cmap</code> argument:
<pre class="prettyprint"><code>plot_clustered_stacked([df1, df2, df3],
                       ["df1", "df2", "df3"],
                       cmap=plt.cm.viridis)
</code></pre>
<hr>
<h3>Solution with seaborn:</h3>
Given the same df1, df2, df3 as above, I convert them to long form:
<pre class="prettyprint"><code>df1["Name"] = "df1"
df2["Name"] = "df2"
df3["Name"] = "df3"
dfall = pd.concat([pd.melt(i.reset_index(),
                           id_vars=["Name", "index"])  # transform each df into tidy format
                   for i in [df1, df2, df3]],
                  ignore_index=True)
</code></pre>
The problem with seaborn is that its <code>barplot</code> doesn't stack bars natively, so the trick is to plot the cumulative sum of each bar on top of each other:
<pre class="prettyprint"><code>dfall.set_index(["Name", "index", "variable"], inplace=True)
dfall["vcs"] = dfall.groupby(level=["Name", "index"])["value"].cumsum()
dfall.reset_index(inplace=True)

>>> dfall.head(6)
  Name index variable     value       vcs
0  df1     A        I  0.717286  0.717286
1  df1     B        I  0.236867  0.236867
2  df1     C        I  0.952557  0.952557
3  df1     D        I  0.487995  0.487995
4  df1     A        J  0.174489  0.891775
5  df1     B        J  0.332001  0.568868
</code></pre>
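To make the cumulative-sum trick concrete, here is a minimal sketch on a tiny hand-written long-form frame. The column names follow the ones used above; the values, the name `long_df` and the extra `bottom` column are assumptions added for illustration only.

```python
import pandas as pd

# A small long-form frame: one dataframe name, two index values, three variables.
long_df = pd.DataFrame({
    "Name": ["df1"] * 6,
    "index": ["A", "B"] * 3,
    "variable": ["I", "I", "J", "J", "K", "K"],
    "value": [1.0, 2.0, 0.5, 0.25, 2.0, 1.0],
})

# Running total within each (Name, index) group: this is the height at which
# each segment of the stack ends.
long_df["vcs"] = long_df.groupby(["Name", "index"])["value"].cumsum()

# Where each segment starts (not needed for the overlay trick, shown for clarity).
long_df["bottom"] = long_df["vcs"] - long_df["value"]

print(long_df)
```

Drawing a full-height bar of height `vcs` for each variable, with the earlier variables drawn on top, leaves only the slice between `bottom` and `vcs` visible for each one, which looks like a stacked bar.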
Then loop over each group of <code>variable</code> and plot the cumulative sum:
<pre class="prettyprint"><code>import seaborn as sns

c = ["blue", "purple", "red", "green", "pink"]
for i, g in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g[1],
                     x="index",
                     y="vcs",
                     hue="Name",
                     color=c[i],
                     zorder=-i,  # so first bars stay on top
                     edgecolor="k")
ax.legend_.remove()  # remove the redundant legends
</code></pre>
<img src="https://i.stack.imgur.com/mVUc1.png" alt="multiple stack bar plot seaborn">
It lacks a legend, which I think can be added easily. The drawback is that, instead of hatches (which can be added easily) to differentiate the dataframes, we get a gradient of lightness. It is a bit too light for the first dataframe, and I don't know how to change that without modifying each rectangle one by one (as in the first solution).
Tell me if you don't understand something in the code.
Feel free to re-use this code, which is under CC0.
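The bar-placement arithmetic used in the pandas and matplotlib solution can be checked on its own, without drawing anything. The sketch below assumes that pandas draws each bar with its default width of 0.5, centered on the integer positions 0, 1, 2, ...; the helper name `cluster_layout` and its `default_width` parameter are hypothetical and only mirror the formulas in `plot_clustered_stacked`.

```python
import numpy as np


def cluster_layout(n_df, n_ind, default_width=0.5):
    """Return (bar width, left edges per dataframe and index, tick positions).

    Mirrors the arithmetic in plot_clustered_stacked; default_width is the
    assumed pandas bar width before the rectangles are moved.
    """
    width = 1.0 / (n_df + 1)                                # new width of every bar
    base_left = np.arange(n_ind) - default_width / 2.0      # left edges as first drawn
    # dataframe number d is shifted right by d * width
    lefts = base_left[None, :] + width * np.arange(n_df)[:, None]
    ticks = (np.arange(0, 2 * n_ind, 2) + width) / 2.0      # same formula as set_xticks
    return width, lefts, ticks


width, lefts, ticks = cluster_layout(n_df=3, n_ind=4)
print(width)        # width shared by all bars
print(lefts[:, 0])  # left edges of the three bars in the first cluster
print(ticks)        # x positions of the tick labels
```

With three dataframes, the three bars of the first cluster cover the interval from -0.25 to 0.5 and the tick sits at 0.125, the middle of that interval. Under the same assumption, other values of `n_df` place the tick slightly off the cluster center.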

How to have clusters of stacked bars with python (Pandas)

Tags:

python

pandas

matplotlib

plot

seaborn


850

asked Apr 01 '14 13:04

jrjc

1 Answers


179

answered Sep 21 '22 17:09

jrjc
