
Weighted bins in a distribution hist plot

I'm looking for a way to plot a distribution histogram, with the y-axis representing the total number of items for each bin (and not just the count).

Example on the charts below:

  • On the left, there are 55 agencies who sold between 20-30 houses
  • On the right, the agencies having sold between 20-30 houses represent 1100 houses sold

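[Figure: two histograms of houses sold per agency. Left: number of agencies in each bin. Right: total number of houses sold by the agencies in each bin.]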

It's not as trivial as it looks, because one can't simply multiply each bin's count by the bin's value (maybe in the 20-30 bin there are 54 agencies who sold 21 houses and 1 who sold 29).

Questions:

  • What is the name of such a chart (the one on the right)?
  • Is there a way to plot it natively in matplotlib or seaborn?
asked Dec 20 '16 at 22:12 by Jivan



2 Answers

You want to use the weights kwarg (see the numpy docs for numpy.histogram), which ax.hist accepts and passes through.

Something like

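import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# weight each agency by its own sales, so each bar is the total houses sold in that bin
ax.hist(num_sold, bins, weights=num_sold)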
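
To check what the weights argument does to the numbers, here is a minimal sketch using np.histogram directly (the function ax.hist relies on for binning). The data are made up to mirror the 20-30 bin from the question (54 agencies who sold 21 houses and one who sold 29), and the bin edges are an assumption.

```python
import numpy as np

# Hypothetical data: 54 agencies sold 21 houses each, one agency sold 29.
num_sold = np.array([21] * 54 + [29])

# Assumed bin edges: 0, 10, ..., 100, i.e. ten bins of width 10.
bins = np.arange(0, 110, 10)

# Unweighted: each agency adds 1 to its bin (number of agencies per bin).
counts, _ = np.histogram(num_sold, bins=bins)

# Weighted: each agency adds its own sales to its bin (houses sold per bin).
totals, _ = np.histogram(num_sold, bins=bins, weights=num_sold)

print("agencies in [20, 30):", counts[2])
print("houses sold in [20, 30):", totals[2])
```

The weighted total for that bin is 54 * 21 + 29 = 1163, whereas multiplying the count by the bin midpoint would give 55 * 25 = 1375, which is why the per-item weights are needed.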
answered Sep 29 '22 at 07:09 by tacaswell


Edit: @tacaswell's answer is better; use it. However, with my approach the labels will line up correctly without hassle and the bars will be separated.

Hopefully your data is in pandas. I will create some fake data and then give you a solution.

import numpy as np
import pandas as pd

# create a dataframe of the number of homes sold by each of 1000 agencies
df = pd.DataFrame(data={'sold': np.random.randint(0, 100, 1000)})

# group by the left edge of each interval [0, 10), [10, 20), etc.,
# sum the homes sold within each group, and plot the totals as bars
df.groupby(df.sold // 10 * 10).sum().plot.bar()
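
If the bins are not all the same width, the floor-division trick no longer applies. A possible variant, sketched here under the assumption that explicit bin edges are available, is to bin with pd.cut and sum within each interval. The seed, the edges and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Fake data, as above; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sold": rng.integers(0, 100, 1000)})

# Assumed bin edges; right=False gives left-closed intervals [0, 10), [10, 20), ...
edges = np.arange(0, 110, 10)
binned = pd.cut(df["sold"], bins=edges, right=False)

# Sum the homes sold within each interval (empty intervals are kept as 0).
totals = df.groupby(binned, observed=False)["sold"].sum()
print(totals)
```

Calling totals.plot.bar() would then draw one bar per interval, labelled with the interval itself.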
answered Sep 29 '22 at 06:09 by Ted Petrou