Normalizing a histogram and having the y-axis in percentages in matlab

Edit: Alright, so I answered my own question, by reading older questions a bit more. I apologize for asking the question! Using the code

Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)

with the corresponding data instead of the random data worked fine. Just need to optimize the bin size now.

Good day. I know you must be thinking that this has been asked a thousand times. In a way you are probably right, but I could not find the answer to my specific question in the posts I found on here, so I figured I might as well just ask. I'll try to be as clear as possible, but please tell me if it is not evident what I want to do.

Alright, so I have a (row) vector with 5000 elements, all of which are integers. What I want to do is plot a histogram of these 5000 elements in such a way that the y-axis gives the chance of a value falling in each bin, while the x-axis stays as usual, as in it gives the values covered by that bin.

Now, what made sense to me was to normalize everything, but that doesn't seem to work, at least how I'm doing it.

My first attempt was

sums = sum(A);
hist(sums/trapz(sums),50)

I omitted the rest because it imports a lot of data from a file, which doesn't really matter here. sums = sum(A) works fine, and I can see the vector in my MATLAB thingy. (What should I call it, console?) However, dividing by the area with trapz just changes my x-axis, not my y-axis. The x values get super small, on the order of 10^-3, while they should be on the order of 10.

Now looking around, someone suggested to use

hist(sums,50)
ylabels = get(gca, 'YTickLabel');
ylabels = linspace(0,1,length(ylabels));
set(gca,'YTickLabel',ylabels); 

While this certainly makes the y-axis go from 0 to 1, it only relabels the ticks and does not normalize anything. I want the y-axis to actually reflect the chance of being in a certain bin. Combining the two does not work either. I apologize if the answer is very obvious, I just don't see it.

Edit: I realize this is a separate question (that has been asked a million times), but I just picked the bin size by hand until it looked good, as in no bars missing from the histogram. I've seen several different scripts that are supposed to optimize bin size, but none of them seem to make the 'best' looking histogram in every case, sadly :( Is there an easy way to pick the size if all the numbers are integers?

user129412 asked Jan 11 '14 13:01


1 Answer

(Just to close the question)

A histogram is an absolute frequency plot, so the sum of all bin frequencies (the sum of the output vector of the hist function) is always the number of elements in its input vector. So if you want a percentage output, all you need to do is divide each element of the output by that total and multiply by 100:

x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
figure;
bar(histXout, histFreq/sum(histFreq)*100);
xlabel('x');
ylabel('Frequency (percent)');
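
This is also why the first attempt in the question only changed the x-axis: hist counts how many values fall in each bin, so dividing the data by a constant before the call rescales the bin positions and leaves the counts untouched. A minimal sketch of that effect, where the matrix A, the vector sums and the scale factor k are made-up stand-ins for the asker's data rather than the real values:

```matlab
% Stand-in data (assumed): column sums of random digits give 5000 integers
A = randi([0 9], 30, 5000);
sums = sum(A);

% Arbitrary scale factor; a power of two keeps the rescaling exact in floating point
k = 1024;

[countsRaw, centersRaw] = hist(sums, 50);
[countsScaled, centersScaled] = hist(sums / k, 50);

% The bin heights are identical; only the bin centers shrink by the factor k
sameCounts = isequal(countsRaw, countsScaled);
centerRatio = centersRaw(end) / centersScaled(end);

% Normalizing has to happen after binning, on the counts themselves
prob = countsRaw / sum(countsRaw);
```

Here prob holds the chance of landing in each bin, and bar(centersRaw, prob) would plot it against the original data values.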


If you want to reconstruct the probability density function of your data, you also need to take the bin width of the histogram into account: divide the relative frequencies by the bin width, so that the areas of the bars sum to 1:

x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
binWidth = histXout(2)-histXout(1);
figure;
bar(histXout, histFreq/binWidth/sum(histFreq));       
xlabel('x');
ylabel('PDF: f(x)');
hold on
% fit a normal dist to check the pdf (fitdist and pdf need the Statistics Toolbox)
PD = fitdist(x, 'normal');
plot(histXout, pdf(PD, histXout), 'r');
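
A quick way to check either normalization is to add up the bars: the percentages should sum to 100, and the density bars multiplied by the bin width should sum to 1. A small sketch reusing the variable names from the snippets above (percentHeights, pdfHeights, totalPercent and totalArea are names introduced here for illustration):

```matlab
x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
binWidth = histXout(2) - histXout(1);

percentHeights = histFreq / sum(histFreq) * 100;    % bars of the percentage plot
pdfHeights = histFreq / binWidth / sum(histFreq);   % bars of the density plot

totalPercent = sum(percentHeights);        % should be 100 up to rounding
totalArea = sum(pdfHeights * binWidth);    % should be 1 up to rounding
fprintf('total percent = %.6f, total area = %.6f\n', totalPercent, totalArea);
```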



Update:

Since MATLAB R2014b, you can use the 'histogram' command to easily produce histograms with various normalizations. For example, the above becomes:

x = randn(10000, 1);
figure;
h = histogram(x, 'normalization', 'pdf');
xlabel('x');
ylabel('PDF: f(x)');
hold on
% fit a normal dist to check the pdf (fitdist and pdf need the Statistics Toolbox)
PD = fitdist(x, 'normal');
plot(h.BinEdges, pdf(PD, h.BinEdges), 'r');
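
For integer-valued data like that in the question, histogram can also take care of the bin choice. A minimal sketch, assuming R2014b or later: the 'BinMethod' value 'integers' creates one bin per integer, and the 'probability' normalization makes the bar heights sum to 1. The vector sums below is random stand-in data, not the asker's.

```matlab
% Stand-in data (assumed): 5000 integer column sums
sums = sum(randi([0 9], 30, 5000));

figure;
h = histogram(sums, 'BinMethod', 'integers', 'Normalization', 'probability');
xlabel('value');
ylabel('Probability');

% Each bin is centered on an integer and has width 1
binWidths = diff(h.BinEdges);
totalProbability = sum(h.Values);   % should be 1 up to rounding
```

With one bin per integer, no bars go missing because of bin edges falling between the possible values. Any remaining gaps are integers that do not occur in the data.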


OmidS answered Nov 03 '22 00:11