Plotting two histograms from a pandas DataFrame in one subplot using matplotlib

I have a pandas dataframe like the following:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'a_wood'       : np.random.randn(100),
                   'a_grassland'  : np.random.randn(100),
                   'a_settlement' : np.random.randn(100),
                   'b_wood'       : np.random.randn(100),
                   'b_grassland'  : np.random.randn(100),
                   'b_settlement' : np.random.randn(100)})

and I want to create a histogram of each column, with every column in its own subplot.

fig, ax = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(20, 18))

m = 0
for i in range(2):
    for j in range(3):
        df.hist(column=df.columns[m], bins=12, ax=ax[i, j])
        m += 1

The code above works perfectly for that, but now I want to combine every a and b column (e.g. "a_wood" and "b_wood") in one subplot, so that there would be just three histograms. I tried passing two columns via df.columns[[m, m+3]], but this doesn't work. I also have an index column with strings like "day_1", which I want to be on the x-axis. Can someone help me?

This is how far I got: [Figure: histograms]

Max2603 asked Aug 08 '18 14:08

1 Answer

I'm not sure I understood your question correctly, but something like this combines each a/b pair into one subplot. You may want to experiment a little with the alpha value and adjust the titles.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({'a_wood'       : np.random.randn(100),
                   'a_grassland'  : np.random.randn(100),
                   'a_settlement' : np.random.randn(100),
                   'b_wood'       : np.random.randn(100),
                   'b_grassland'  : np.random.randn(100),
                   'b_settlement' : np.random.randn(100)})

fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))
n = 3
n_bins = 12

for i in range(n):
    # Minimum and maximum over a column pair, e.g. column 0 (a_wood) and column 3 (b_wood)
    min_value = df.iloc[:, [i, i + n]].min().min()
    max_value = df.iloc[:, [i, i + n]].max().max()
    # n_bins + 1 evenly spaced edges give n_bins equal-width bins shared by both columns
    bins = np.linspace(min_value, max_value, n_bins + 1)

    df.hist(column=df.columns[i], bins=bins, ax=ax[i], alpha=0.5, color='red')
    df.hist(column=df.columns[i + n], bins=bins, ax=ax[i], alpha=0.5, color='blue')
    ax[i].set_title(df.columns[i][2:])  # drop the "a_" prefix, e.g. "wood"

plt.show()

[Figure: histograms with each column pair overlapping]
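The min/max/`np.linspace` steps in the loop above can also be written with NumPy's `np.histogram_bin_edges`, which returns evenly spaced edges spanning every value it is given. A minimal sketch, assuming the same six columns (random data regenerated here so the snippet runs on its own); `pooled` and `shared_edges` are illustrative names, not part of the original answer. Note that 12 bins need 13 edges:

```python
import numpy as np
import pandas as pd

# Same column layout as the question; values regenerated so the sketch is self-contained
rng = np.random.default_rng(0)
df = pd.DataFrame({name: rng.standard_normal(100)
                   for name in ['a_wood', 'a_grassland', 'a_settlement',
                                'b_wood', 'b_grassland', 'b_settlement']})

n = 3        # number of a/b column pairs
n_bins = 12  # number of bins per histogram

shared_edges = []
for i in range(n):
    # Pool both columns of the pair (e.g. a_wood and b_wood) so the edges cover both ranges
    pooled = np.concatenate([df.iloc[:, i].to_numpy(), df.iloc[:, i + n].to_numpy()])
    edges = np.histogram_bin_edges(pooled, bins=n_bins)  # n_bins + 1 edges, min to max
    shared_edges.append(edges)
    print(df.columns[i][2:], len(edges) - 1, 'bins from', edges[0], 'to', edges[-1])
```

Each array in `shared_edges` could then be passed as `bins=` to both `df.hist` calls of the corresponding pair.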

To plot the two columns of each pair side by side instead, try this:

# No explicit bins needed here: given a list of columns, matplotlib computes bin edges covering both
fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))

n = 3
colors = ['red', 'blue']

for i, axis in enumerate(ax.flatten()):
    axis.hist([df.iloc[:, i], df.iloc[:, i + n]], bins=12, color=colors)
    axis.set_title(df.columns[i][2:])

plt.show()

[Figure: histograms with each column pair side by side]
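Both loops pair columns by position (`i` and `i + n`), which relies on all `a_` columns coming before all `b_` columns. A minimal sketch that pairs them by name instead, assuming every column is named `<prefix>_<land use>` with the prefixes `a` and `b` as in the question, and adding a legend so the red and blue bars can be told apart (data regenerated so the snippet runs on its own; `suffixes` and `pair` are illustrative names):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({name: rng.standard_normal(100)
                   for name in ['a_wood', 'a_grassland', 'a_settlement',
                                'b_wood', 'b_grassland', 'b_settlement']})

# Land-use names from the column suffixes, in first-seen order (assumes "<prefix>_<name>" columns)
suffixes = list(dict.fromkeys(col.split('_', 1)[1] for col in df.columns))

fig, axes = plt.subplots(1, len(suffixes), sharey=True, figsize=(15, 5))
for axis, suffix in zip(axes, suffixes):
    pair = [f'a_{suffix}', f'b_{suffix}']
    # A list of two columns gives side-by-side bars; label= feeds the legend
    axis.hist([df[c] for c in pair], bins=12, color=['red', 'blue'], label=pair)
    axis.set_title(suffix)
    axis.legend()
```

Call `plt.show()` afterwards to display the figure. Because the pairing uses names, reordering the DataFrame's columns does not mix up the pairs.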

Alex answered Oct 05 '22 01:10