Plotting multiple overlapped histograms with pandas

I have two different dataframes with 19 variables each and I'm plotting a multiple plot with the histograms of each variable like this:

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe1.hist(ax=ax, layout=(3,7), alpha=0.5)

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe2.hist(ax=ax, layout=(3,7), alpha=0.5)

This produces two images with 19 histograms each. What I want is a single image in which the histograms of the same variable from both dataframes share one subplot.

I tried this:

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe1.hist(ax=ax, layout=(3,7), alpha=0.5, label='x')
dataframe2.hist(ax=ax, layout=(3,7), alpha=0.5, label='y', color='red')

But it's only painting the last one. This is a similar example: "Plot two histograms at the same time with matplotlib", but how could I apply it to my 19 subplots?

Any ideas will be welcomed, thanks in advance!

P.S: I'm currently using Jupyter Notebooks with the %matplotlib notebook option

Sergiodiaz53 asked Dec 14 '22 12:12


2 Answers

Your problem is that you create only one Axes object in your plt.subplots call, when you actually need 21 (3x7). Because the number of subplots provided does not match the number requested, pandas creates new subplots. Because this happens twice, you only see the second set of histograms.
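A minimal sketch of this behaviour, assuming a recent pandas and matplotlib: a single Axes is passed for a three-column frame. The frame df, its column names and the 2x2 layout are made up for illustration. pandas should emit a warning and clear the figure, so the Axes that was passed in no longer belongs to it.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical three-column frame, for illustration only
df = pd.DataFrame(np.random.default_rng(0).normal(size=(50, 3)),
                  columns=list("abc"))

fig, ax = plt.subplots()  # one Axes, but three histograms are requested

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_axes = df.hist(ax=ax, layout=(2, 2))

# Show whatever pandas warned about
for w in caught:
    print(w.category.__name__, w.message)

# The Axes we passed in was discarded when the figure was cleared
print("original ax still in figure:", ax in fig.axes)
print("shape of returned axes array:", new_axes.shape)
```

Calling hist a second time with the same single Axes repeats the clearing, which is why only the last dataframe stays visible.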

You can leave out the call to subplots altogether and let pandas do all the work. The call to hist returns all the subplots needed and this can then be used in the second call to hist.

EDIT:

I realised that, if the number of desired plots does not equal the number of grid cells (in this case 3x7=21), you must pass exactly the number of subplots that you actually want to plot on (in this case 19). However, the call to df.hist returns a subplot for each grid cell (i.e. 21) and apparently hides the unused ones. Hence you have to pass only a subset of the returned subplots to the second call to hist. The easiest way is to convert the 2D array of subplots into a 1D array and slice it, for instance with axes.ravel()[:19]. I edited the code accordingly:

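import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

length = 19  # number of variables (columns) in each dataframe

# first dataframe: 100 samples per column, random mean and spread per column
loc = np.random.randint(0, 50, size=length)
scale = np.random.rand(length) * 10
dist = np.random.normal(loc=loc, scale=scale, size=(100, length))
df1 = pd.DataFrame(data=list(dist))

# let pandas build the 3x7 grid; hist returns a 2D array with all 21 subplots
axes = df1.hist(layout=(3, 7), alpha=0.5, label='x')

# second dataframe, generated the same way
loc = np.random.randint(0, 50, size=length)
scale = np.random.rand(length) * 10
dist = np.random.normal(loc=loc, scale=scale, size=(100, length))
df2 = pd.DataFrame(data=list(dist))

# draw on the first 19 subplots only, one per column
df2.hist(ax=axes.ravel()[:length], layout=(3, 7), alpha=0.5, label='y', color='r')

plt.show()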

This produces output like this:

[Figure: result of the above code]
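To check the overlay without looking at the image, one option is to count the bar patches on each subplot. The sketch below uses a smaller, made-up case (5 columns on a 2x3 grid; the column names are hypothetical) and assumes the default of 10 bins per histogram, so a subplot that received both dataframes should hold two sets of bars.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
cols = list("abcde")  # hypothetical column names
df1 = pd.DataFrame(rng.normal(0, 1, size=(100, 5)), columns=cols)
df2 = pd.DataFrame(rng.normal(1, 2, size=(100, 5)), columns=cols)

# pandas creates the 2x3 grid and returns all six subplots
axes = df1.hist(layout=(2, 3), alpha=0.5)

# keep one subplot per column and reuse them for the second frame
used = axes.ravel()[:len(cols)]
df2.hist(ax=used, alpha=0.5, color="r")

# each histogram adds one rectangle per bin to its subplot
patch_counts = [len(ax.patches) for ax in used]
# the sixth grid cell has no column to show
spare_visible = axes.ravel()[-1].get_visible()

print("bars per subplot:", patch_counts)
print("spare subplot visible:", spare_visible)
```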

Thomas Kühn answered Jan 07 '23 05:01


When you call subplots, you can specify the number of rows and columns that you want; in your case, 3 rows and 7 columns. However, .hist will complain if it is given 21 axes when your dataframe has only 19 columns to plot. So instead, we flatten the axes array and convert it to a list, which lets us remove the last two axes from both the figure and the list at the same time through .pop():

fig, axes = plt.subplots(figsize=(19,10), dpi=50, nrows=3, ncols=7)
flat_axes = list(axes.reshape(-1))
fig.delaxes(flat_axes.pop(-1))
fig.delaxes(flat_axes.pop(-1))

dataframe1.hist(ax=flat_axes, alpha=0.5, label='x')
dataframe2.hist(ax=flat_axes, alpha=0.5, label='x',color='r')
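
With DataFrame.hist, each call computes its own bin edges per column, so the two overlaid histograms may use different bin widths. If comparable bars matter, one alternative (not part of the answer above) is to loop over the columns with matplotlib directly and share the bin edges. This is a minimal sketch on made-up data: the frames, column names and 2x3 grid are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
cols = list("abcde")  # hypothetical column names
df1 = pd.DataFrame(rng.normal(0, 1, size=(100, 5)), columns=cols)
df2 = pd.DataFrame(rng.normal(1, 2, size=(100, 5)), columns=cols)

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(9, 5))
flat_axes = axes.ravel()

# zip stops after the last column, so only five subplots are drawn on
for ax, col in zip(flat_axes, cols):
    # common bin edges computed from both frames together
    both = np.concatenate([df1[col].to_numpy(), df2[col].to_numpy()])
    edges = np.histogram_bin_edges(both, bins=10)
    ax.hist(df1[col], bins=edges, alpha=0.5, label="x")
    ax.hist(df2[col], bins=edges, alpha=0.5, label="y", color="r")
    ax.set_title(col)
    ax.legend()

# remove the grid cells that have no column to show
for ax in flat_axes[len(cols):]:
    fig.delaxes(ax)
```

The same loop should carry over to the 19-column case by switching to nrows=3, ncols=7 and the real dataframes.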

asongtoruin answered Jan 07 '23 05:01