
Plotting a histogram from pre-counted data in Matplotlib

I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data

data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10] 

Given this data, I can use

pylab.hist(data, bins=[...]) 

to plot a histogram.

In my case, the data has been pre-counted and is represented as a dictionary:

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1} 

Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc., as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:

from itertools import chain, repeat
data = list(chain.from_iterable(repeat(value, count)
                                for (value, count) in counted_data.items()))

This is inefficient when counted_data contains counts for millions of data points.

Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?

Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?

Josh Rosen asked Oct 06 '13 18:10




1 Answer

You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):

val, weight = zip(*counted_data.items())
plt.hist(val, weights=weight)
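To show how the weights argument combines with explicit bins, here is a minimal self-contained sketch using the counted_data from the question. The bin edges and the Agg backend are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

# Each distinct value appears once; its count is passed as the weight.
values = list(counted_data.keys())
counts = list(counted_data.values())

# Illustrative bin edges: [0, 2), [2, 4), ..., [10, 12]
bin_edges = [0, 2, 4, 6, 8, 10, 12]

fig, ax = plt.subplots()
# hist returns the (weighted) bar heights, the bin edges, and the drawn patches
heights, edges, patches = ax.hist(values, bins=bin_edges, weights=counts)
```

The heights sum to the total number of original data points, so the result should match what hist would produce on the expanded raw data with the same bins.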

Assuming you only have integers as the keys, you can also use bar directly:

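# min/max over the dict iterate its keys (np.min fails on a dict_keys view in Python 3)
min_bin = min(counted_data)
max_bin = max(counted_data)

bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)

# keys missing from counted_data keep a count of zero
for k, v in counted_data.items():
    vals[k - min_bin] = v

plt.bar(bins, vals, ...)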

where ... is whatever arguments you want to pass to bar (doc).

If you want to re-bin your data, see "Histogram with separate list denoting frequency".
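One possible way to "roll up" per-item counts into binned counts yourself is to call np.histogram with weights and then bar-plot the result. This is a sketch under assumptions: the bin width of 3 is arbitrary, the Agg backend is only there so it runs headless, and it is not necessarily the approach taken in the linked question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

values = np.array(list(counted_data.keys()))
counts = np.array(list(counted_data.values()))

# Illustrative bin edges: 0, 3, 6, 9, 12
bin_edges = np.arange(0, 13, 3)

# Roll the per-value counts up into per-bin counts
binned_counts, edges = np.histogram(values, bins=bin_edges, weights=counts)

# Draw each bin as a bar spanning its left edge to its right edge
fig, ax = plt.subplots()
ax.bar(edges[:-1], binned_counts, width=np.diff(edges), align="edge")
```

Keeping the roll-up separate from the plotting means the binned counts can be inspected or reused before drawing anything.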

tacaswell answered Sep 21 '22 05:09