Python: matplotlib - probability mass function as histogram

I want to draw a histogram and a line plot on the same graph. To do that, I need the histogram to be a probability mass function, so the y-axis should show probability values. I don't know how to do that, because using the normed option didn't help. Below are my source code and a sample of the data I used. I would be very grateful for any suggestions.

data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831, 2771, 4005, 1000, 1580, 7163, 866, 1732, 3361, 2599, 4006, 3583, 1222, 2676, 1401, 2598, 697, 4078, 5016, 1250, 7083, 3378, 600, 1221, 2511, 9244, 1732, 2295, 469, 4583, 1733, 1364, 2430, 540, 2599, 12254, 2500, 6056, 833, 1600, 5317, 8333, 2598, 950, 6086, 4000, 2840, 4851, 6150, 8917, 1108, 2234, 1383, 2174, 2376, 1729, 714, 3800, 1020, 3457, 1246, 7200, 4001, 1211, 1076, 1320, 2078, 4504, 600, 1905, 2765, 2635, 1426, 1430, 1387, 540, 800, 6500, 931, 3792, 2598, 5033, 1040, 1300, 1648, 2200, 2025, 2201, 2074, 8737, 324]
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.xlabel("Incomes")
plt.hist(data, bins=50, color="blue", alpha=0.5, normed=True)
plt.show()
Ziva asked Jun 17 '15 10:06



2 Answers

As far as I know, matplotlib does not have this built in. However, it is easy enough to replicate:

    import numpy as np
    heights,bins = np.histogram(data,bins=50)
    heights = heights/sum(heights)
    plt.bar(bins[:-1],heights,width=(max(bins) - min(bins))/len(bins), color="blue", alpha=0.5)

Edit: Here is another approach from a similar question:

    # Each sample contributes 1/len(data), so the bar heights sum to 1
    weights = np.ones_like(data)/len(data)
    plt.hist(data, bins=50, weights=weights, color="blue", alpha=0.5)
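
As a rough check of why the normed option did not give probabilities while the weights approach does, here is a minimal NumPy-only sketch. It assumes that normed behaved like today's density normalization (bar areas sum to 1, not bar heights). The name sample is made up for illustration and holds only the first ten values from the question's data; bins=5 is likewise chosen just to keep the output small.

```python
import numpy as np

# First ten values of the question's data, used here as a small stand-in.
sample = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]

# Density normalization: height * bin width sums to 1 over all bins,
# so the heights themselves are tiny when the bins are wide.
density, edges = np.histogram(sample, bins=5, density=True)
area = (density * np.diff(edges)).sum()

# PMF-style normalization: every sample carries weight 1/len(sample),
# so the bin heights are fractions of the sample and sum to 1.
weights = np.ones_like(sample, dtype=float) / len(sample)
pmf, _ = np.histogram(sample, bins=5, weights=weights)

print("sum of density heights:", density.sum())
print("area under density bars:", area)
print("sum of PMF heights:", pmf.sum())
```

The density heights multiplied by the bin widths reproduce the PMF heights, which is the same normalization the first snippet does by hand.
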
mmdanziger answered Sep 20 '22 07:09


This is old, but since I found it and was about to use it before I noticed some mistakes, I figured I'd add a comment with a couple of fixes. In the example, @mmdanziger passes the bin edges to plt.bar, but you actually need to pass the bin centers. The example also assumes that the bins are of equal width, which is fine "most" of the time. You can instead pass an array of widths, which keeps you from inadvertently making a mistake when the bins are not uniform. So here's a more complete example:

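import matplotlib.pyplot as plt
import numpy as np

# Counts per bin and the 51 bin edges
heights, bins = np.histogram(data, bins=50)
# Normalize the counts so the bar heights sum to 1
heights = heights / heights.sum()
# Current matplotlib centers each bar on its x value, so use the bin centers
bin_centers = 0.5 * (bins[1:] + bins[:-1])
bin_widths = np.diff(bins)
plt.bar(bin_centers, heights, width=bin_widths, color="blue", alpha=0.5)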

@mmdanziger's other option, passing weights = np.ones_like(data)/len(data) to plt.hist(), does the same thing and is an easier approach for many.
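
Since the original goal was a histogram and a line plot on the same graph, here is one possible sketch of how the weights approach could be combined with a line on the same axes. The question does not say which line is wanted, so the line drawn here (the bin probabilities joined at the bin centers) is only a stand-in. The name sample, the choice of bins=5, and the output file name pmf_histogram.png are also assumptions for illustration; sample holds the first ten values of the question's data.

```python
import matplotlib.pyplot as plt
import numpy as np

# First ten values of the question's data, used here as a small stand-in.
sample = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]

# Weight each sample by 1/len(sample) so the bar heights are probabilities.
weights = np.ones_like(sample, dtype=float) / len(sample)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(sample, bins=5, weights=weights,
                            color="blue", alpha=0.5)

# Stand-in line plot: the same probabilities drawn at the bin centers.
# Replace `centers` and `heights` with whatever curve you want to compare.
centers = 0.5 * (edges[1:] + edges[:-1])
ax.plot(centers, heights, color="black", marker="o")

ax.set_xlabel("Incomes")
ax.set_ylabel("Probability")
fig.savefig("pmf_histogram.png")  # hypothetical file name
```

Because both the bars and the line are drawn on the same Axes object, they share the probability y-axis. In an interactive session, plt.show() can be used in place of fig.savefig.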

Tyler Acorn answered Sep 21 '22 07:09