
What is the difference between `np.histogram` and `plt.hist`? Why don't these commands plot the same graphics?

UPDATE: Sorry again, I've updated the code in response to the comments, which were correct. There is still a problem with the plots: one histogram is shifted relative to the other.

UPDATE: I'm sorry, these histograms have a different number of bins. Even so, setting 5 as the number of bins in plt.hist doesn't help.

The code below computes two histograms from the same data source, and plotting them shows that they don't coincide. A note on np.histogram: it returns a tuple of two arrays, the counts per bin and the bin edges (including the outermost edges). So I thought it would be reasonable to plot the counts at the centers of the bins.

import numpy as np
import matplotlib.pyplot as plt
s = [1,1,1,1,2,2,2,3,3,4,5,5,5,6,7,7,7,7,7,7,7]

xmin = 1
xmax = 7
step = 1.
print 'nbins=',(xmax-xmin)/step
print np.linspace(xmin, xmax, (xmax-xmin)/step)
h1 = np.histogram(s, bins=np.linspace(xmin, xmax, (xmax-xmin)/step))
print h1
def calc_centers_of_bins(x):
    return  list(x[i]+(x[i]-x[i+1])/2.0 for i in xrange(len(x)-1))

x = h1[1].tolist()
print x
y = h1[0].tolist()


plt.bar(calc_centers_of_bins(x),y, width=(x[-1]-x[0])/(len(y)), color='red', alpha=0.5)
plt.hist(s, bins=5,alpha=0.5)
plt.grid(True)
plt.show()

asked Dec 11 '13 22:12 by aestet


1 Answer

You're using different bins in the two cases. In your case, np.linspace(xmin, xmax, (xmax-xmin)/step) produces 6 edges, which define 5 bins, but you've told plt.hist to use 6 bins.

You can see this by looking at the output of each:

h1 = np.histogram(s, bins=np.linspace(xmin, xmax, (xmax-xmin)/step))
h_plt = plt.hist(s, bins=6,alpha=0.5)

Then:

>>> h1[1]
array([ 1. ,  2.2,  3.4,  4.6,  5.8,  7. ])
>>> h_plt[1]
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.])
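
The same difference can be checked without drawing anything. This is a minimal sketch, assuming that plt.hist bins the data the way np.histogram does when given an integer bins argument (it hands the binning to np.histogram internally); the variable names are only illustrative:

```python
import numpy as np

s = [1,1,1,1,2,2,2,3,3,4,5,5,5,6,7,7,7,7,7,7,7]

# Six edges from linspace define five bins.
edges_linspace = np.linspace(1, 7, 6)
counts_5, edges_5 = np.histogram(s, bins=edges_linspace)

# An integer bins argument asks for that many equal-width bins
# between min(s) and max(s), i.e. seven edges here.
counts_6, edges_6 = np.histogram(s, bins=6)

print(len(edges_5) - 1, counts_5)
print(len(edges_6) - 1, counts_6)
```

The two calls disagree on both the number of bins and the counts, which is why the plots cannot coincide.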

I would use:

y, x = np.histogram(s, bins=np.linspace(xmin, xmax, int((xmax-xmin)/step)))
nbins = y.size
# ...
plt.hist(s, bins=nbins, alpha=0.5)
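
Another way to keep the two in sync is to build the edge array once and pass that same array to both functions, so the bin count never has to be derived a second time. A sketch, assuming a non-interactive backend (Agg) so that it runs without a display; `edges` is an illustrative name:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import matplotlib.pyplot as plt
import numpy as np

s = [1,1,1,1,2,2,2,3,3,4,5,5,5,6,7,7,7,7,7,7,7]
xmin, xmax, step = 1, 7, 1

# One edge array shared by both calls; linspace needs an integer count.
edges = np.linspace(xmin, xmax, int((xmax - xmin) / step))

counts_np, edges_np = np.histogram(s, bins=edges)
counts_plt, edges_plt, _ = plt.hist(s, bins=edges, alpha=0.5)

# Both functions now report the same counts and the same edges.
print(np.array_equal(counts_np, counts_plt))
print(np.array_equal(edges_np, edges_plt))
```

Passing explicit edges also protects you if the data range changes, since an integer bins value is always spread between the minimum and maximum of the data.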

Then your histograms match, but your plot still won't, because you've plotted the output of np.histogram at the centers of the bins, whereas plt.bar (with edge alignment, which was the default before Matplotlib 2.0) expects an array of the left edges:

plt.bar(left, height, width=0.8, bottom=None, hold=None, **kwargs)

Parameters
----------
left : sequence of scalars
the x coordinates of the left sides of the bars

height : sequence of scalars
the heights of the bars

What you want is:

import numpy as np
import matplotlib.pyplot as plt
s = [1,1,1,1,2,2,2,3,3,4,5,5,5,6,7,7,7,7,7,7,7]

xmin = 1
xmax = 7
step = 1
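# linspace needs an integer number of points: 6 edges give 5 bins
y, x = np.histogram(s, bins=np.linspace(xmin, xmax, int((xmax-xmin)/step)))

nbins = y.size

# x[:-1] are the left edges; align='edge' anchors each bar there
plt.bar(x[:-1], y, width=x[1]-x[0], align='edge', color='red', alpha=0.5)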
plt.hist(s, bins=nbins, alpha=0.5)
plt.grid(True)
plt.show()
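
If you would rather position the bars at the bin centers, as the question attempted, the center of each bin is the midpoint of its two edges. Note that the formula in the question subtracts half a bin width from the left edge instead of adding it. A sketch using NumPy only; `centers` and `widths` are illustrative names:

```python
import numpy as np

s = [1,1,1,1,2,2,2,3,3,4,5,5,5,6,7,7,7,7,7,7,7]
counts, edges = np.histogram(s, bins=np.linspace(1, 7, 6))

# Midpoint of each pair of neighbouring edges:
# the left edge plus half the bin width.
centers = (edges[:-1] + edges[1:]) / 2.0
widths = np.diff(edges)

print(centers)
print(widths)
```

These could then be drawn with plt.bar(centers, counts, width=widths, align='center', color='red', alpha=0.5), which should overlay the plt.hist output in the same way as the left-edge version.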

answered Oct 06 '22 01:10 by askewchan