I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data
data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
Given this data, I can use
pylab.hist(data, bins=[...])
to plot a histogram.
In my case, the data has been pre-counted and is represented as a dictionary:
counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc., as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:
from itertools import chain, repeat
data = list(chain.from_iterable(repeat(value, count) for (value, count) in counted_data.items()))
This is inefficient when counted_data contains counts for millions of data points.
Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?
Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?
To create a histogram, the first step is to divide the whole range of values into a series of bins, which are consecutive, non-overlapping intervals, and then count the values that fall into each interval.
To display the count above each bar in a Matplotlib histogram, iterate over the patches and use the text() method to place each value above its patch.
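A minimal sketch of that labelling step, assuming the raw data list from the question and an illustrative set of bin edges (the edges and the `labels` list are assumptions made for this example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]

fig, ax = plt.subplots()
# hist() returns the per-bin counts, the bin edges, and the bar patches.
counts, edges, patches = ax.hist(data, bins=[0, 2, 4, 6, 8, 10])

labels = []
for count, patch in zip(counts, patches):
    # Centre the label horizontally on the bar and sit it on top.
    x = patch.get_x() + patch.get_width() / 2
    y = patch.get_height()
    labels.append(ax.text(x, y, str(int(count)), ha="center", va="bottom"))
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to see the figure.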
pyplot.hist() is a widely used histogram plotting function that uses np.histogram() and is the basis for Pandas' plotting functions.
You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):
val, weight = zip(*[(k, v) for k, v in counted_data.items()])
plt.hist(val, weights=weight)
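Here is a minimal, self-contained sketch of the weights approach, assuming the counted_data dictionary from the question. The bin edges are an illustrative choice, not part of the original. It also uses np.histogram to compare the weighted result against the expanded raw data, which should give the same bin totals:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

values = list(counted_data.keys())
counts = list(counted_data.values())

# Illustrative bin edges (an assumption): [0, 2), [2, 4), [4, 6), [6, 8), [8, 10]
bin_edges = [0, 2, 4, 6, 8, 10]

# Each distinct value contributes its count to the bin it falls into.
weighted_hist, _ = np.histogram(values, bins=bin_edges, weights=counts)

# The same bins applied to the expanded raw data, for comparison only.
raw = np.repeat(values, counts)
raw_hist, _ = np.histogram(raw, bins=bin_edges)

# Plot directly from the pre-counted data; no expansion needed.
fig, ax = plt.subplots()
heights, edges, patches = ax.hist(values, bins=bin_edges, weights=counts)
```

Because only the distinct values and their counts are passed in, the cost depends on the number of dictionary entries rather than on the total number of data points.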
Assuming you only have integers as the keys, you can also use bar directly:
min_bin = min(counted_data)
max_bin = max(counted_data)
bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)
for k, v in counted_data.items():
    vals[k - min_bin] = v
plt.bar(bins, vals, ...)
where ... is whatever additional arguments you want to pass to bar.
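A runnable sketch of the bar approach, again assuming integer keys and the counted_data dictionary from the question. The width, alignment and edge colour passed to bar are illustrative choices only:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

min_key = min(counted_data)
max_key = max(counted_data)

# One bar position per integer in the key range; missing keys get a count of 0.
positions = np.arange(min_key, max_key + 1)
heights = np.array([counted_data.get(int(k), 0) for k in positions])

fig, ax = plt.subplots()
# width=1.0 makes adjacent bars touch, which gives a histogram-like look.
bars = ax.bar(positions, heights, width=1.0, align="center", edgecolor="black")
```

This draws one bar per integer value, so it suits data with a modest key range. For wide ranges, summing the counts into coarser bins first is likely the better fit.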
If you want to re-bin your data, see "Histogram with separate list denoting frequency".
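As a rough sketch of that kind of "roll-up", per-item counts can be summed into wider bins with np.histogram and its weights argument, then drawn with bar. The helper name rebin_counts and the uneven bin edges are hypothetical and chosen only for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def rebin_counts(counted, bin_edges):
    """Hypothetical helper: sum per-value counts into the given bins."""
    values = np.fromiter(counted.keys(), dtype=float)
    counts = np.fromiter(counted.values(), dtype=float)
    # weights= adds each value's count to the bin that contains the value.
    binned, edges = np.histogram(values, bins=bin_edges, weights=counts)
    return binned, edges


counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

# Illustrative uneven bins (an assumption): [1, 4), [4, 7), [7, 11]
binned, edges = rebin_counts(counted_data, [1, 4, 7, 11])

fig, ax = plt.subplots()
# align="edge" with per-bar widths draws each bar across its own bin.
bars = ax.bar(edges[:-1], binned, width=np.diff(edges), align="edge", edgecolor="black")
```

The binned counts can be reused for further analysis without ever expanding the dictionary into raw data.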