I like to plot my histograms like this:
data = [-0.5, 0.5, 0.5, 0.5,
1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]
plt.hist(data, bins=5, range=[-1, 4], histtype='step')
Now, when I have fairly large input data (larger than my memory), I need to fill the histogram chunk by chunk, e.g. like this:
H, bins = np.histogram([], bins=5, range=[-1, 4])
for data in a_lot_of_input_files:
    H += np.histogram(data, bins=5, range=[-1, 4])[0]
But the question is always: how do I plot this H again so that it looks just like the previous matplotlib version?
The solution I came up with, looks like this:
plt.plot(bins, np.insert(H, 0, H[0]), '-', drawstyle='steps')
However, the result does not look identical, and it does not feel very nice to create a copy of H just for plotting it.
Is there some elegant solution I am missing? (I have not yet tried plt.bar, because bar graphs don't work nicely when one wants to compare histograms.)
np.histogram has two return values: hist, the array of counts per bin, and bin_edges, a float array of the bin edges that is one element longer than hist.
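As a quick check of that description, here is a minimal sketch using the sample data from the question. The variable names hist and bin_edges are only illustrative.

```python
import numpy as np

data = [-0.5, 0.5, 0.5, 0.5,
        1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]

# Five equal-width bins between -1 and 4
hist, bin_edges = np.histogram(data, bins=5, range=[-1, 4])

print(hist)                              # counts per bin
print(bin_edges)                         # bin edges as floats
print(len(bin_edges) == len(hist) + 1)   # one more edge than bins
```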
Matplotlib can draw the histogram directly: the hist() function of the pyplot submodule takes the data array and the bin specification as parameters, computes the counts, and plots them.
plt.hist() can be called several times on the same axes to overlay histograms, for example one per data frame column, adjusting opacity, color, and number of bins as needed.
To normalize a histogram in Python, we can pass density=True to hist(). In a normalized histogram, the area underneath the plot is 1.
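When the counts have been accumulated chunk by chunk, there is no raw data left to hand to hist(), so the same normalization has to be applied to the counts themselves. A minimal sketch, assuming a hypothetical helper named normalize_counts (not part of NumPy or Matplotlib):

```python
import numpy as np

def normalize_counts(counts, edges):
    """Scale raw bin counts so that the area under the histogram is 1."""
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(edges)              # width of each bin
    return counts / (counts.sum() * widths)

data = [-0.5, 0.5, 0.5, 0.5,
        1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]

H, bins = np.histogram(data, bins=5, range=[-1, 4])
density = normalize_counts(H, bins)

# Area = sum over bins of (height * width)
area = np.sum(density * np.diff(bins))
print(area)
```

This should agree with what np.histogram returns for density=True on the same data and bins.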
Not sure what you mean by "bar-graphs don't work nicely, when one wants to compare histograms".
One way to do this is with plt.bar:
import matplotlib.pyplot as plt
import numpy as np
data = [-0.5, 0.5, 0.5, 0.5,
1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]
plt.hist(data, bins=5, range=[-1, 4], histtype='step',edgecolor='r',linewidth=3)
H, bins = np.histogram(data[:6], bins=5, range=[-1, 4])
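H += np.histogram(data[6:], bins=5, range=[-1, 4])[0]
# align='edge' anchors each bar at its left bin edge (Matplotlib 2.0+ centers bars by default)
plt.bar(bins[:-1], H, width=np.diff(bins), align='edge')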
plt.show()
An alternative is plt.step:
import matplotlib.pyplot as plt
import numpy as np
data = [-0.5, 0.5, 0.5, 0.5,
1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]
plt.hist(data, bins=5, range=[-1, 4], histtype='step',edgecolor='r')
H, bins = np.histogram(data[:6], bins=5, range=[-1, 4])
H += np.histogram(data[6:], bins=5, range=[-1, 4])[0]
bincentres = (bins[:-1] + bins[1:]) / 2.  # midpoint of each bin
plt.step(bincentres,H,where='mid',color='b',linestyle='--')
plt.ylim(0,6)
plt.show()
The outline does not quite extend to the outer bin edges, so you might need to add a zero-count bin at either end if that's a big problem for you.
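Another option, not part of the answer above, is to hand the accumulated counts back to plt.hist as weights, which avoids the padding altogether. The sketch below assumes two hypothetical helpers, accumulate_histogram and plot_accumulated. It feeds one artificial sample per bin (the bin's left edge) weighted by that bin's count, so plt.hist draws the same outline it would for the raw data.

```python
import numpy as np

def accumulate_histogram(chunks, bins=5, value_range=(-1, 4)):
    """Sum per-chunk histograms; `chunks` is any iterable of array-likes."""
    H, edges = np.histogram([], bins=bins, range=value_range)
    for chunk in chunks:
        H += np.histogram(chunk, bins=bins, range=value_range)[0]
    return H, edges

def plot_accumulated(H, edges, **kwargs):
    """Draw precomputed counts with plt.hist so the style matches histtype='step'."""
    # Imported here so the numeric helper above works without Matplotlib.
    import matplotlib.pyplot as plt
    # One artificial sample per bin (its left edge), weighted by the bin count.
    return plt.hist(edges[:-1], bins=edges, weights=H,
                    histtype='step', **kwargs)

data = [-0.5, 0.5, 0.5, 0.5,
        1.5, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2]

# Two slices stand in for the input files that do not fit in memory.
H, edges = accumulate_histogram([data[:6], data[6:]])
print(H)
```

Calling plot_accumulated(H, edges, edgecolor='r') followed by plt.show() should then give the same picture as plt.hist on the full data. In Matplotlib 3.4 and later, plt.stairs(H, edges) draws the same kind of step outline directly from counts and edges.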