
Pandas histogram df.hist() group by

Tags: pandas, matplotlib, histogram

How can I plot histograms with pandas DataFrame.hist(), split by group? I have a data frame with 5 columns: "A", "B", "C", "D" and "group".

The "group" column has two classes: "yes" and "no".

Using:

df.hist()

I get a histogram for each of the 4 columns.

Now I would like to get the same 4 graphs, but with blue bars for group == "yes" and red bars for group == "no".

I tried this without success:

df.hist(by="group")

[Image: pandas hist went wrong]

919

asked Aug 25 '17 14:08

Hangon

2 Answers

Using Seaborn

If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made with seaborn.FacetGrid.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

np.random.seed(1)

# Sample data: four numeric columns plus a yes/no group label
df = pd.DataFrame(np.random.randn(300, 4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32, 0.68], size=300)

# Reshape to long format: one row per (group, variable, value)
df2 = pd.melt(df, id_vars="group", value_vars=list("ABCD"), value_name="value")

# Common bin edges so every panel and both groups are comparable
bins = np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, "value", bins=bins, ec="k")

# Show the legend on the last panel
g.axes[-1].legend()
plt.show()
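
The reshaping step is what lets FacetGrid split the data by column: pd.melt turns the value columns into a long table with one row per observation, labelled by a "variable" column. A small sketch, using a made-up three-row frame (the toy data and the names toy and long_toy are assumptions for illustration, not part of the question), shows the resulting layout:

```python
import pandas as pd

# Hypothetical toy frame with the same layout as the question's data.
toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "group": ["yes", "no", "yes"],
})

# Wide -> long: each value column becomes rows labelled by "variable".
long_toy = pd.melt(toy, id_vars="group", value_vars=["A", "B"], value_name="value")

print(long_toy.columns.tolist())  # ['group', 'variable', 'value']
print(long_toy.shape)             # (6, 3): 3 rows x 2 value columns
```

FacetGrid then uses "variable" to pick the panel and "group" to pick the color.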

124

answered Oct 19 '22 11:10

ImportanceOfBeingErnest

This is not the most flexible workaround, but it will work for your question specifically.

import matplotlib.pyplot as plt

def sephist(col):
    # Split one column into its "yes" and "no" values
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

# Column names as given in the question; subplot indices start at 1
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)

You could make this more generic by:

adding df and by parameters to sephist: def sephist(df, by, col)
making the subplot loop more flexible: for num, alpha in enumerate(df.columns.drop(by), start=1)
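
One way the two suggestions above might be combined is sketched below. This is only an illustration under stated assumptions: the helper grouped_hists, its ncols and colors parameters, and the synthetic data are hypothetical and not part of the original answer. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Return {group label: values of `col`} for each level of `by`."""
    return {label: sub[col] for label, sub in df.groupby(by)}


def grouped_hists(df, by, bins=25, ncols=2, colors=None):
    """Hypothetical helper: one subplot per numeric column, one histogram per group."""
    value_cols = [c for c in df.select_dtypes("number").columns if c != by]
    nrows = -(-len(value_cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, value_cols):
        for label, values in sephist(df, by, col).items():
            # color=None falls back to matplotlib's default color cycle
            ax.hist(values, bins=bins, alpha=0.5, label=str(label),
                    color=(colors or {}).get(label))
        ax.set_title(col)
        ax.legend(loc="upper right")
    for ax in flat[len(value_cols):]:
        ax.set_visible(False)  # hide unused panels
    fig.tight_layout()
    return fig, axes


# Synthetic data with the question's layout (assumed for illustration)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=list("ABCD"))
df["group"] = rng.choice(["yes", "no"], size=300, p=[0.32, 0.68])

fig, axes = grouped_hists(df, by="group", colors={"yes": "b", "no": "r"})
```

Selecting numeric columns with select_dtypes keeps the group column out of the loop, and plt.subplots sidesteps the 1-based subplot index entirely.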

Because the first argument to matplotlib.pyplot.hist can take

"either a single array or a sequence of arrays which are not required to be of the same length"

...an alternative would be:

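# Column names as given in the question; subplot indices start at 1
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    # Both groups in one call; colors follow the label order (blue = yes, red = no)
    plt.hist((yes, no), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)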
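
A practical difference from two separate hist calls is that a single call with a sequence of arrays bins every group on the same edges, so the bars are directly comparable. The sketch below checks this on synthetic data (the arrays and their sizes are made up for illustration, and the Agg backend is chosen only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
yes = rng.normal(size=96)   # deliberately different lengths
no = rng.normal(size=204)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist([yes, no], bins=25, label=["yes", "no"], color=["b", "r"])

print(np.shape(counts))  # (2, 25): one row of counts per input array
print(len(edges))        # 26: a single set of bin edges shared by both groups
```

With two separate calls and bins=25, each group would get 25 bins over its own range, so the bin widths could differ between groups.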

answered Oct 19 '22 11:10

Brad Solomon
