How to plot a probability mass function in python

How can I create a histogram that shows the probability distribution given an array of numbers x ranging from 0-1? I expect each bar to be <= 1 and that if I sum the y values of every bar they should add up to 1.

For example, if x=[.2, .2, .8] then I would expect a graph showing 2 bars, one at .2 with height .66, one at .8 with height .33.

I've tried:

matplotlib.pyplot.hist(x, bins=50, normed=True)

which gives me a histogram with bars that go above 1. I'm not saying that's wrong since that's what the normed parameter will do according to documentation, but that doesn't show the probabilities.

I've also tried:

counts, bins = numpy.histogram(x, bins=50, density=True)
bins = bins[:-1] + (bins[1] - bins[0])/2
matplotlib.pyplot.bar(bins, counts, 1.0/50)

which also gives me bars whose y values sum to greater than 1.

kmosley asked Oct 21 '13 19:10


2 Answers

I think my original terminology was off. I have an array of continuous values in [0, 1) that I want to discretize and then plot as a probability mass function. I thought this might be common enough to warrant a single method to do it.

Here's the code:

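import random
import numpy as np
import matplotlib.pyplot as plt

x = [random.random() for _ in range(1000)]
num_bins = 50
counts, bins = np.histogram(x, bins=num_bins)
width = bins[1] - bins[0]
# Shift the left edges by half a bin width to get the bar centers
bins = bins[:-1] + width/2
# Divide by the total count so that the bar heights sum to 1
probs = counts/float(counts.sum())
print(probs.sum()) # 1.0, up to floating-point rounding
plt.bar(bins, probs, width)
plt.show()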
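
For the single call the question was hoping for, one option is the weights argument that both np.histogram and plt.hist accept: give every sample a weight of 1/N and the bin totals come out as fractions of the data. A minimal sketch, assuming the data lie in [0, 1); the helper names binned_pmf and plot_pmf are made up for this illustration and are not part of either library:

```python
import numpy as np


def binned_pmf(x, num_bins=50, value_range=(0.0, 1.0)):
    """Return (bin_edges, probs), where probs[i] is the fraction of x in bin i."""
    x = np.asarray(x, dtype=float)
    # Each sample carries weight 1/N, so each bin total is a fraction of the data.
    weights = np.full(x.shape, 1.0 / x.size)
    probs, edges = np.histogram(x, bins=num_bins, range=value_range, weights=weights)
    return edges, probs


def plot_pmf(x, num_bins=50, value_range=(0.0, 1.0)):
    """Draw the same quantity with plt.hist (hypothetical helper, lazy import)."""
    import matplotlib.pyplot as plt

    x = np.asarray(x, dtype=float)
    weights = np.full(x.shape, 1.0 / x.size)
    plt.hist(x, bins=num_bins, range=value_range, weights=weights)
    plt.ylabel("probability")
    plt.show()


rng = np.random.default_rng(0)   # fixed seed only to make the sketch repeatable
sample = rng.random(1000)        # uniform values in [0, 1)
edges, probs = binned_pmf(sample)
print(probs.sum())               # 1.0, up to floating-point rounding

# The example from the question: two values at .2 and one at .8
_, small = binned_pmf([.2, .2, .8], num_bins=5)
print(small.max())               # the bin holding both .2 values
```

Calling plot_pmf(sample) draws the bars. Samples outside value_range are dropped by np.histogram, so the heights only sum to 1 when the range covers all of the data.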

kmosley answered Nov 03 '22 00:11


I think you are mistaking a sum for an integral. A proper PDF (probability density function) integrates to unity; if you simply sum the bar heights you ignore the width of each rectangle, so the total need not be 1.

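import numpy as np

N = 10**5
X = np.random.normal(size=N)

# density=True scales the heights so that the area under the histogram is 1
counts, bins = np.histogram(X, bins=50, density=True)
# Shift the left edges by half a bin width to get the bin centers
bins = bins[:-1] + (bins[1] - bins[0])/2

# Trapezoidal integration over the bin centers
# (np.trapz is named np.trapezoid in NumPy 2.0 and later)
print(np.trapz(counts, bins))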

This gives about .999985 (the exact value changes from run to run because the sample is random), which is close enough to unity.
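
To connect this back to the bars in the question: with density=True each bar height is a density, and multiplying it by the bin width gives the probability of landing in that bin. Those products are what sum to 1. A small sketch of that relationship, where the seed and the variable names are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed only to make the sketch repeatable
X = rng.normal(size=10**5)

# density=True returns heights whose *area* is 1, so a single height can
# exceed 1 when the bins are narrow.
density, edges = np.histogram(X, bins=50, density=True)
widths = np.diff(edges)

# Height times width is the probability of landing in each bin.
probs = density * widths

print(density.sum())   # sum of heights: generally not 1
print(probs.sum())     # sum of areas: 1.0, up to floating-point rounding
```

Plotting probs instead of density gives bars that are each at most 1 and that add up to 1.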

EDIT: In response to the comment below:

If x=[.2, .2, .8] and I'm looking for a graph with two bars, one at .2 with height .66 because 66% of the values are at .2 and one bar at .8 with height .33, what would that graph be called and how do I generate it?

The following code:

from collections import Counter
x = [.2,.2,.8]
C = Counter(x)
total = float(sum(C.values()))
for key in C: C[key] /= total

Gives a "dictionary" C=Counter({0.2: 0.666666, 0.8: 0.333333}) (values truncated here). From here one could construct a bar graph, but this only works if the distribution is discrete, i.e. described by a probability mass function rather than a density, and takes only a finite set of values that are well separated from each other.
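
One way the bar graph could then be drawn, as a rough sketch: the function names empirical_pmf and plot_pmf and the bar width of 0.05 are assumptions picked for this example, not anything prescribed by matplotlib.

```python
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to the fraction of samples equal to it."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in sorted(counts.items())}


def plot_pmf(pmf, bar_width=0.05):
    """Draw one bar per distinct value (hypothetical helper, lazy import)."""
    import matplotlib.pyplot as plt

    plt.bar(list(pmf.keys()), list(pmf.values()), width=bar_width)
    plt.xlabel("value")
    plt.ylabel("probability")
    plt.show()


pmf = empirical_pmf([.2, .2, .8])
print(pmf)
```

Calling plot_pmf(pmf) shows two bars, one at .2 and one at .8. The bar width is purely cosmetic here: the heights are already probabilities, so it should simply be smaller than the gap between neighbouring values.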

Hooked answered Nov 03 '22 01:11