
Histogram calculation in julia-lang

According to the Julia documentation:

hist(v[, n]) → e, counts

Compute the histogram of v, optionally using approximately n bins. The return values are a range e, which correspond to the edges of the bins, and counts containing the number of elements of v in each bin. Note: Julia does not ignore NaN values in the computation.
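To make the relationship between e and counts concrete, here is a minimal sketch of counting values into bins once the edges are known. The helper name count_in_bins is hypothetical (it is not part of Julia Base), and the left-open, right-closed bin convention (e[i], e[i+1]] is an assumption. It is chosen because it reproduces the outputs shown further down in the question.

```julia
# Hypothetical helper (not part of Julia Base): count how many elements of `v`
# fall in each bin (e[i], e[i+1]] of the sorted edge vector `e`.
# The left-open, right-closed convention is an assumption that matches the
# outputs quoted in the question.
function count_in_bins(v::AbstractVector{<:Real}, e::AbstractVector{<:Real})
    counts = zeros(Int, length(e) - 1)
    for x in v
        # searchsortedfirst returns the index i of the first edge >= x,
        # so x lies in the bin (e[i-1], e[i]].
        i = searchsortedfirst(e, x)
        if 2 <= i <= length(e)      # skip values outside (e[1], e[end]]
            counts[i - 1] += 1
        end
    end
    return counts
end
```

For example, count_in_bins(0:10, -2.0:2.0:10.0) gives the same counts as the hist(testdata,5) output in the question. Note that the value 0 needs an edge strictly below it under this convention.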

I chose a sample range of data:

testdata=0:1:10;

and then used the hist function to compute histograms with 1 to 5 bins:

hist(testdata,1) # => (-10.0:10.0:10.0,[1,10])
hist(testdata,2) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,3) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,4) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,5) # => (-2.0:2.0:10.0,[1,2,2,2,2,2])

As you can see, when I ask for 1 bin it computes 2 bins, and when I ask for 2 bins it computes 3.

Why does this happen?

asked Sep 07 '15 at 10:09 by Reza Afzalan


2 Answers

As the person who wrote the underlying function: the aim is to get bin widths that are "nice" in terms of a base-10 counting system (i.e. 10^k, 2×10^k, 5×10^k). If you want more control, you can also specify the exact bin edges.
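One way to picture the "nice" width rule is to round the raw width (hi - lo)/n up to the nearest 1, 2 or 5 times a power of ten. The sketch below is a hypothetical illustration, not the actual Base source; its thresholds are an assumption chosen so that it reproduces the step sizes in the question's outputs (10, 5, 5, 5, 2 for n = 1 to 5).

```julia
# Hypothetical helper illustrating "nice" bin widths: round the raw width
# (hi - lo)/n up to the nearest 1, 2 or 5 times a power of ten.
# The thresholds are an assumption chosen to match the question's outputs.
function nice_step(lo::Real, hi::Real, n::Integer)
    n >= 1 || throw(ArgumentError("n must be ≥ 1"))
    hi > lo || return 1.0                  # degenerate data: fall back to width 1
    raw = (hi - lo) / n                    # width that would give exactly n bins
    base = 10.0^floor(Int, log10(raw))     # power of ten at or below raw
    r = raw / base                         # mantissa, roughly in [1, 10)
    mult = r <= 1 ? 1 : r <= 2 ? 2 : r <= 5 ? 5 : 10
    return mult * base
end
```

For testdata = 0:1:10, asking for 2, 3 or 4 bins gives raw widths of 5, 3.33 and 2.5, which all round up to 5. That is why those three calls return the same edges.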

answered Nov 09 '22 at 07:11 by Simon Byrne


The key word in the documentation is "approximately". You can check what hist is actually doing in Julia's Base module here.

When you call hist(testdata,3), you are actually calling

hist(v::AbstractVector, n::Integer) = hist(v,histrange(v,n))

That is, the n argument is first converted into a FloatRange of bin edges by the histrange function, whose code can be found here. As you can see, the calculation of the step size is not entirely straightforward, so it is worth experimenting with this function to see how it constructs the range that forms the basis of the histogram.
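As a starting point for that kind of experimentation, here is a hypothetical re-implementation named sketch_histrange. It is not the actual Base.histrange code, and its rounding thresholds and edge placement are assumptions chosen because they reproduce all five ranges in the question. It rounds the width to a "nice" step, then places the first edge one step below the first multiple of the step that is at or above the minimum.

```julia
# Hypothetical sketch, NOT the actual Base.histrange source.
function sketch_histrange(v::AbstractVector{<:Real}, n::Integer)
    isempty(v) && throw(ArgumentError("v must be non-empty"))
    n >= 1 || throw(ArgumentError("n must be ≥ 1"))
    lo, hi = extrema(v)
    if hi == lo
        width = 1.0
    else
        raw = (hi - lo) / n                       # width for exactly n bins
        base = 10.0^floor(Int, log10(raw))
        r = raw / base
        width = (r <= 1 ? 1 : r <= 2 ? 2 : r <= 5 ? 5 : 10) * base
    end
    # Assuming left-open bins (a, b], the minimum needs an edge strictly
    # below it, which adds an "extra" bin when lo is a multiple of width.
    start = width * (ceil(lo / width) - 1)
    nbins = ceil(Int, (hi - start) / width)
    return range(start; step = width, length = nbins + 1)
end
```

Under these assumptions, sketch_histrange(0:10, 1) yields the edges -10.0, 0.0, 10.0. The minimum value 0 sits exactly on an edge, so it needs its own bin (-10, 0], and one requested bin becomes two.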

answered Nov 09 '22 at 05:11 by Nils Gudat