I'm looking for a way to plot a distribution histogram, with the y-axis
representing the total number of items for each bin (and not just the count).
Example on the charts below:
It's not as trivial as it looks, because one can't simply multiply each bin's count by the bin's value (maybe in the 20-30 bin there are 54 agencies who sold 21 and 1 who sold 29).
Question: is there a way to do this with matplotlib or seaborn?
A weighted histogram shows the weighted distribution of the data. If the histogram displays proportions (rather than raw counts), then the heights of the bars are the sum of the standardized weights of the observations within each bin.
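A minimal sketch of that definition, using NumPy's histogram function. The values and weights below are made up for illustration; "standardized" is taken here to mean that the weights are rescaled to sum to 1.

```python
import numpy as np

# Hypothetical observations and per-observation weights (illustration only).
values = np.array([1, 1, 2, 3, 5, 6, 6])
weights = np.array([0.5, 1.5, 1.0, 2.0, 1.0, 0.5, 1.5])

# Raw weighted histogram: each bar is the sum of the weights in its bin.
weighted_sums, edges = np.histogram(values, bins=3, weights=weights)

# Proportion version: rescale the weights to sum to 1 first, so the
# bar heights are sums of standardized weights and add up to 1.
proportions, _ = np.histogram(values, bins=3, weights=weights / weights.sum())

print(weighted_sums)  # [3. 2. 3.]
print(proportions)    # [0.375 0.25  0.375]
```

Leaving out the weights argument gives the ordinary count histogram, which is the same as giving every observation a weight of 1.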
To construct a histogram, the first step is to “bin” the range of values — that is, divide the entire range of values into a series of intervals — and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable.
Bins is the number of groups into which the data is aggregated for the count. For example, if you have the numbers 1, 1, 2, 3, 5, 6, 6 and you want 3 bins, you get three columns (bins) in your histogram: Column 1: [1, 1, 2], values <= 2 (count 3); Column 2: [3], 2 < value <= 3 (count 1); Column 3: [5, 6, 6], 3 < values <= 6 (count 3).
The towers or bars of a histogram are called bins. The height of each bin shows how many values from the data fall into that range. The width of each bin is (max value of data - min value of data) / total number of bins. The default number of bins created in a histogram is 10.
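As a rough check of the width formula, here is a small sketch with NumPy on the seven numbers from the example above. Note that NumPy places the bin edges at equal spacing, so the edges it reports are not the cut points 2, 3 and 6 quoted in that example, although the counts come out the same.

```python
import numpy as np

data = np.array([1, 1, 2, 3, 5, 6, 6])
n_bins = 3

# Equal-width bins: width = (max - min) / number of bins.
width = (data.max() - data.min()) / n_bins

# np.histogram returns the per-bin counts and the n_bins + 1 bin edges.
counts, edges = np.histogram(data, bins=n_bins)

print(width)   # 5 / 3, about 1.67
print(counts)  # [3 1 3]
print(edges)   # four equally spaced edges from 1 to 6
```

Calling np.histogram(data) without the bins argument falls back to the default of 10 bins.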
In R, the breaks argument of the hist function can be used to increase or decrease the width of the bars, for example by asking for 100 bins.
A histogram is a representation of the distribution of data. The pandas function DataFrame.plot.hist groups the values of all given Series in the DataFrame into bins and draws all bins in one matplotlib.axes.Axes. This is useful when the DataFrame's Series are on a similar scale. Its parameters include by (column in the DataFrame to group by) and bins (number of histogram bins to be used).
You want to use the weights kwarg (see the numpy docs), which is passed through by ax.hist.
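Something like:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# weight each observation by its own value so each bar is the bin's total
ax.hist(num_sold, bins, weights=num_sold)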
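For completeness, a self-contained sketch of the same idea. The num_sold values and the bin edges are invented for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: units sold by each of eight agencies.
num_sold = np.array([21, 21, 29, 5, 12, 18, 34, 38])
bins = [0, 10, 20, 30, 40]

fig, ax = plt.subplots()
# Weighting each observation by its own value makes a bar's height the
# sum of the values in that bin rather than the number of observations.
totals, edges, _ = ax.hist(num_sold, bins=bins, weights=num_sold)
ax.set_xlabel("units sold per agency")
ax.set_ylabel("total units sold in bin")

print(totals)  # per-bin totals: 5, 30, 71, 72
```

The 20-30 bar comes out at 71 (21 + 21 + 29), not at 3 observations times some representative bin value, which is the distinction the question asks for.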
Edit: @tacaswell's answer is better; use it. But with mine, the labels will line up correctly without hassle and the bars will be separated.
Hopefully your data is in pandas. I will create some fake data and then give you a solution.
import numpy as np
import pandas as pd
# create a dataframe of number of homes sold
df = pd.DataFrame(data={'sold':np.random.randint(0,100, 1000)})
# groupby the left side of interval [0, 10), [10, 20) etc.. and plot
df.groupby(df.sold // 10 * 10).sum().plot.bar()
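The floor-division trick assumes bins of width 10. If other or uneven bin edges are needed, one possible variation (not part of the answer above) is to bin with pd.cut first and then sum per bin. The data and the edges below are hypothetical.

```python
import pandas as pd

# Hypothetical data: units sold by each of eight agencies.
df = pd.DataFrame({"sold": [21, 21, 29, 5, 12, 18, 34, 38]})
edges = [0, 10, 20, 30, 40]  # any increasing sequence of edges would do

# right=False gives left-closed intervals [0, 10), [10, 20), ...,
# matching the grouping produced by the floor-division approach.
binned = pd.cut(df["sold"], bins=edges, right=False)

# observed=False keeps bins that contain no rows in the result.
totals = df.groupby(binned, observed=False)["sold"].sum()

print(totals.tolist())  # [5, 30, 71, 72]
```

Chaining .plot.bar() onto totals draws the same kind of bar chart as in the answer, with the interval labels on the x-axis.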