Python: matplotlib - probability mass function as histogram

Tags: python, matplotlib, plot, python-2.7, histogram

I want to draw a histogram and a line plot on the same graph. To do that, I need my histogram to be a probability mass function, with probability values on the y-axis. However, I don't know how to do that, because using the normed option didn't help. Below is my source code and a sneak peek of the data used. I would be very grateful for any suggestions.

    import matplotlib.pyplot as plt

    data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831, 2771, 4005, 1000, 1580, 7163, 866, 1732, 3361, 2599, 4006, 3583, 1222, 2676, 1401, 2598, 697, 4078, 5016, 1250, 7083, 3378, 600, 1221, 2511, 9244, 1732, 2295, 469, 4583, 1733, 1364, 2430, 540, 2599, 12254, 2500, 6056, 833, 1600, 5317, 8333, 2598, 950, 6086, 4000, 2840, 4851, 6150, 8917, 1108, 2234, 1383, 2174, 2376, 1729, 714, 3800, 1020, 3457, 1246, 7200, 4001, 1211, 1076, 1320, 2078, 4504, 600, 1905, 2765, 2635, 1426, 1430, 1387, 540, 800, 6500, 931, 3792, 2598, 5033, 1040, 1300, 1648, 2200, 2025, 2201, 2074, 8737, 324]
    plt.style.use('ggplot')
    plt.rc('xtick', labelsize=12)
    plt.rc('ytick', labelsize=12)
    plt.xlabel("Incomes")
    # normed=True was renamed density=True; newer matplotlib versions reject normed
    plt.hist(data, bins=50, color="blue", alpha=0.5, density=True)
    plt.show()


asked Jun 17 '15 10:06

Ziva
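A note on why normed=True (renamed density=True in newer matplotlib) does not give probabilities: it divides each count by N times the bin width, so the bar *areas* sum to 1, not the bar heights. With incomes spread over roughly 12,000 units and 50 bins, each bin is about 245 units wide, so the heights come out far below 1. The sketch below checks this with NumPy on the first ten values of the question's data (a subset chosen only for illustration) and 5 bins:

```python
import numpy as np

# first ten values from the question's data (a subset, for illustration only)
data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]

counts, edges = np.histogram(data, bins=5)
density, _ = np.histogram(data, bins=5, density=True)
widths = np.diff(edges)

# density=True divides by N * bin_width, so the bar AREAS sum to 1 ...
print(np.sum(density * widths))  # approximately 1.0
# ... while the bar heights sum to a tiny number
print(np.sum(density))
# dividing the raw counts by N instead gives heights that sum to 1 (a PMF)
pmf = counts / float(counts.sum())
print(np.sum(pmf))  # approximately 1.0
```

Dividing the raw counts by N, as both answers below do, is what turns the histogram into a probability mass function.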

2 Answers

As far as I know, matplotlib does not have this built in. However, it is easy enough to replicate:

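    import numpy as np
    import matplotlib.pyplot as plt
    heights, bins = np.histogram(data, bins=50)
    heights = heights / float(heights.sum())  # float() avoids integer division on Python 2
    # bins holds len(heights) + 1 edges, i.e. len(bins) - 1 bins; bins[:-1] are the left
    # edges (matplotlib < 2.0 aligned bars on their left edge by default)
    plt.bar(bins[:-1], heights, width=(max(bins) - min(bins)) / (len(bins) - 1), color="blue", alpha=0.5)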
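The same np.histogram approach can be packaged as a small helper. This is a sketch, not a library function: `pmf_bar` is a hypothetical name, and it passes `align="edge"` explicitly because matplotlib 2.0 changed the default bar alignment from "edge" to "center", so left edges such as `bins[:-1]` only line up with their bins under edge alignment. It also uses `np.diff` so each bar gets its own bin's width. The data is the first ten values of the question's list, for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def pmf_bar(ax, data, bins=50, **bar_kwargs):
    """Hypothetical helper: draw a histogram whose bar heights sum to 1."""
    counts, edges = np.histogram(data, bins=bins)
    heights = counts / float(counts.sum())  # float() avoids integer division on Python 2
    # left edges + align="edge" places each bar exactly over its bin
    bars = ax.bar(edges[:-1], heights, width=np.diff(edges), align="edge", **bar_kwargs)
    return heights, edges, bars


# first ten values from the question's data (a subset, for illustration only)
data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]
fig, ax = plt.subplots()
heights, edges, bars = pmf_bar(ax, data, bins=5, color="blue", alpha=0.5)
ax.set_xlabel("Incomes")
ax.set_ylabel("Probability")
```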

Edit: Here is another approach from a similar question:

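    # each sample contributes 1/N, so the bar heights sum to 1 (dtype=float avoids integer division on Python 2)
    weights = np.ones_like(data, dtype=float) / len(data)
    plt.hist(data, bins=50, weights=weights, color="blue", alpha=0.5)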
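A quick way to confirm that the weights approach yields the same probabilities as dividing the np.histogram counts by N: `plt.hist` returns the bar heights as its first value, which can be compared directly. Sketch on the first ten values of the question's data with 5 bins (both choices are only for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# first ten values from the question's data (a subset, for illustration only)
data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]

# each sample contributes 1/N to its bin
weights = np.ones_like(data, dtype=float) / len(data)
n, edges, patches = plt.hist(data, bins=5, weights=weights, color="blue", alpha=0.5)

# the returned heights equal count / N, the same PMF as the np.histogram approach
counts, _ = np.histogram(data, bins=5)
print(np.allclose(n, counts / float(len(data))))  # True
```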


answered Sep 20 '22 07:09

mmdanziger

This is old, but I found it and was about to use it before I noticed some mistakes, so I figured I'd add a couple of fixes. In the example, @mmdanziger passes the bin edges to plt.bar; however, plt.bar centers each bar on its x value by default (since matplotlib 2.0), so you need to pass the bin centers instead. They also assume that the bins are of equal width, which is fine "most" of the time, but you can also pass an array of widths, which keeps you from inadvertently forgetting and making a mistake. So here's a more complete example:

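    import numpy as np
    import matplotlib.pyplot as plt

    heights, bins = np.histogram(data, bins=50)
    heights = heights / float(heights.sum())  # fraction of samples in each bin
    bin_centers = 0.5 * (bins[1:] + bins[:-1])  # bar x positions (default align="center")
    bin_widths = np.diff(bins)  # one width per bin, so unequal bins work too
    plt.bar(bin_centers, heights, width=bin_widths, color="blue", alpha=0.5)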
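The array of widths matters once the bins are not uniform. In the sketch below the bin edges are hand-picked and unequal (an arbitrary illustration, not from the question), and the data is the first ten values of the question's list. `np.diff` gives each bar its own width, and centering each bar on its bin's midpoint puts the bar exactly over its bin:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# first ten values from the question's data (a subset, for illustration only)
data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]
# illustrative, unequal bin edges covering the whole range of the data
edges = [0, 1000, 2000, 5000, 13000]

heights, bins = np.histogram(data, bins=edges)
heights = heights / float(heights.sum())  # [0.3, 0.2, 0.2, 0.3]
bin_centers = 0.5 * (bins[1:] + bins[:-1])
bin_widths = np.diff(bins)  # [1000, 1000, 3000, 8000]
bars = plt.bar(bin_centers, heights, width=bin_widths, color="blue", alpha=0.5)
```

With a single scalar width, the wide upper bins would be drawn too narrow and the bars would no longer tile the x-axis.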

@mmdanziger's other option, passing weights = np.ones_like(data, dtype=float) / len(data) to plt.hist(), does the same thing and for many is an easier approach.
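Since the original goal was a histogram and a line plot on the same graph, here is a sketch combining the weights approach with `ax.plot` on the same axes. Once the bar heights are probabilities, any line in the same units can share the y-axis. The line below simply joins the bin probabilities at the bin centers as a placeholder; substitute whatever curve you actually want to compare, expressed as a per-bin probability. The data is again the first ten values of the question's list, for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# first ten values from the question's data (a subset, for illustration only)
data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831]

fig, ax = plt.subplots()
weights = np.ones_like(data, dtype=float) / len(data)
probs, edges, _ = ax.hist(data, bins=5, weights=weights, color="blue", alpha=0.5, label="histogram")
centers = 0.5 * (edges[1:] + edges[:-1])
# placeholder line: the bin probabilities joined at the bin centers
(line,) = ax.plot(centers, probs, color="red", marker="o", label="line")
ax.set_xlabel("Incomes")
ax.set_ylabel("Probability")
ax.legend()
```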

answered Sep 21 '22 07:09

Tyler Acorn
