Create stacked histogram from unequal length arrays

I'd like to create a stacked histogram. If I have a single 2-D array, made of three equal length data sets, this is simple. Code and image below:

import numpy as np
from matplotlib import pyplot as plt

# create 3 data sets with 1,000 samples
mu, sigma = 200, 25
x = mu + sigma*np.random.randn(1000,3)

#Stack the data
plt.figure()
n, bins, patches = plt.hist(x, 30, stacked=True, density = True)
plt.show()


However, if I try similar code with three data sets of different lengths, each histogram is drawn on top of the previous one instead of being stacked. Is there any way to create a stacked histogram from data sets of mixed length?

##Continued from above
###Now as three separate arrays
x1 = mu + sigma*np.random.randn(990,1)
x2 = mu + sigma*np.random.randn(980,1)
x3 = mu + sigma*np.random.randn(1000,1)

#Stack the data
plt.figure()
plt.hist(x1, bins, stacked=True, density = True)
plt.hist(x2, bins, stacked=True, density = True)
plt.hist(x3, bins, stacked=True, density = True)
plt.show()


ncRubert asked Aug 26 '13 17:08


2 Answers

Well, this is simple. I just need to put the three arrays in a list and pass that list to a single plt.hist call.

##Continued from above
###Now as three separate arrays
x1 = mu + sigma*np.random.randn(990,1)
x2 = mu + sigma*np.random.randn(980,1)
x3 = mu + sigma*np.random.randn(1000,1)

#Stack the data
plt.figure()
plt.hist([x1,x2,x3], bins, stacked=True, density=True)
plt.show()
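
For anyone who wants to check this without the earlier snippet, here is a minimal self-contained sketch. The seed, the 1-D arrays, the explicit bins=30, and the Agg backend (used only so it runs without a display) are my own assumptions for illustration, not part of the original code. It also checks what density=True means together with stacked=True: as far as I can tell from the matplotlib documentation, the whole stack is normalized to a total area of 1, not each data set separately.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt

# three data sets of unequal length (seed chosen only for reproducibility)
rng = np.random.default_rng(0)
mu, sigma = 200, 25
x1 = mu + sigma * rng.standard_normal(990)
x2 = mu + sigma * rng.standard_normal(980)
x3 = mu + sigma * rng.standard_normal(1000)

# a single hist call with a list of arrays; the arrays need not match in length
fig, ax = plt.subplots()
n, bins, patches = ax.hist([x1, x2, x3], bins=30, stacked=True, density=True)

# with stacked=True the rows of n are cumulative, so the last row is the top of the stack
top_of_stack = np.asarray(n)[-1]
total_area = float(np.sum(top_of_stack * np.diff(bins)))
print(total_area)
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to see the figure.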

ncRubert answered Oct 05 '22 03:10


  • If pandas is an option, the arrays can be loaded into a dataframe and plotted.
  • The benefit of using pandas is that the data is then in a convenient format for additional analysis and other plots.
  • The following code creates a DataFrame for each array with pandas.DataFrame inside a list comprehension, then joins them column-wise with pandas.concat.
    • This is one correct way to create a dataframe from arrays of unequal length; the shorter columns are padded with NaN.
      • The SO question "Creating dataframe from a dictionary where entries have different lengths" shows more ways to create dataframes from arrays of unequal length.
    • For equal-length 1-D arrays, use df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
  • Use pandas.DataFrame.plot, which uses matplotlib as the default plot engine.
    • The normed parameter has been replaced by density in matplotlib.
    • See the density parameter in matplotlib.pyplot.hist for an explanation of the y-axis values.
  • For additional information:
    • Plot a histogram such that bar heights sum to 1 (probability)

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# create the uneven arrays
mu, sigma = 200, 25
np.random.seed(365)
x1 = mu + sigma*np.random.randn(990, 1)
x2 = mu + sigma*np.random.randn(980, 1)
x3 = mu + sigma*np.random.randn(1000, 1)

# create one single-column dataframe per array and join them column-wise;
# enumerate(..., 1) names the columns x1, x2, x3, and shorter columns are padded with NaN
df = pd.concat([pd.DataFrame(a, columns=[f'x{i}']) for i, a in enumerate([x1, x2, x3], 1)], axis=1)

# plot the data
df.plot.hist(stacked=True, bins=30, density=True, figsize=(10, 6), grid=True)
plt.show()
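
As a sketch of one of the other ways to build such a dataframe: a dict of pandas.Series also accepts unequal lengths and pads the shorter columns with NaN. The name df_alt, the seed, and the use of 1-D arrays here are my own assumptions for illustration.

```python
import numpy as np
import pandas as pd

# three 1-D arrays of unequal length (seed chosen only for reproducibility)
rng = np.random.default_rng(365)
mu, sigma = 200, 25
arrays = {
    'x1': mu + sigma * rng.standard_normal(990),
    'x2': mu + sigma * rng.standard_normal(980),
    'x3': mu + sigma * rng.standard_normal(1000),
}

# wrapping each array in a Series lets pandas align on the index
# and fill the missing rows of the shorter columns with NaN
df_alt = pd.DataFrame({name: pd.Series(values) for name, values in arrays.items()})

# number of padded (NaN) rows per column
nan_counts = df_alt.isna().sum()
print(df_alt.shape)
print(nan_counts)
```

df_alt can then be plotted with the same df.plot.hist call as above.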


Trenton McKinney answered Oct 05 '22 02:10