Edit: I answered my own question by reading some older questions more carefully. Sorry for asking! Using the code
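Y = rand(10,1);
[C, centers] = hist(Y);   % counts per bin and the bin centres
C = C ./ sum(C);          % fraction of samples in each bin (sums to 1)
bar(centers, C)           % plot against the bin centres, not the bin indices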
with my actual data in place of the random data worked fine. Now I just need to optimize the bin size.
Good day. I know you must be thinking that this has been asked a thousand times, and in a way you are probably right, but I could not find an answer to my specific question in the posts I found here, so I figured I might as well ask. I'll try to be as clear as possible, but please tell me if it is not evident what I want to do.
I have a row vector with 5000 elements, all of which are integers. I want to plot a histogram of these 5000 elements in such a way that the y-axis gives the probability of a value falling in each bin, while the x-axis stays as usual, i.e. it shows the value of that bin.
Normalizing everything seemed like the obvious approach, but it doesn't seem to work, at least the way I'm doing it.
My first attempt was
sums = sum(A);
hist(sums/trapz(sums),50)
I omitted the rest of the script because it just imports a lot of data from a file, which doesn't really matter here. sums = sum(A) works fine, and I can see the vector in the MATLAB Workspace. However, dividing by the area with trapz only changes my x-axis, not my y-axis: the values on the x-axis become very small, on the order of 10^-3, while they should be on the order of 10.
Looking around, I found someone suggesting using
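The trapz attempt divides the data values themselves, so hist simply places the same counts at rescaled x positions. The normalization has to be applied to the counts that hist returns. A minimal sketch, assuming hypothetical stand-in data (randi integers in place of the real sums vector) and hypothetical variable names counts, centers and prob:

```matlab
% Hypothetical stand-in for the real data: 5000 integers between 0 and 40 (assumption)
sums = randi([0 40], 1, 5000);

% Dividing the data, as in hist(sums/trapz(sums), 50), only moves the bins along x;
% the bar heights are still raw counts.

% Instead, get the counts and the bin centres first, then scale the counts.
[counts, centers] = hist(sums, 50);
prob = counts / sum(counts);   % fraction of the elements that fall in each bin

figure;
bar(centers, prob);            % x-axis keeps the original data values
xlabel('value');
ylabel('probability of bin');
```

Because only the heights are divided, the x-axis keeps its original scale and the bar heights add up to 1.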
hist(sums,50)
ylabels = get(gca, 'YTickLabel');
ylabels = linspace(0,1,length(ylabels));
set(gca,'YTickLabel',ylabels);
While this does make the y-axis labels run from 0 to 1, it only relabels the tick marks; the bars are not normalized at all. I want the y-axis to actually reflect the probability of being in a given bin. Combining the two approaches does not work either. I apologize if the answer is very obvious; I just don't see it.
Edit: I realize this is a separate question (and one that has been asked a million times), but I picked the bin size by hand until the histogram looked good, i.e. with no bars missing. I've seen several scripts that are supposed to optimize the bin size, but sadly none of them produces the 'best'-looking histogram in every case. Is there an easy way to pick the bin size when all the values are integers?
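For integer data, one natural choice is a bin of width 1 centred on each integer, so that no bar can go missing because of the binning itself. The sketch below relies on hist interpreting a vector second argument as bin centres; the data and the names centers, counts and prob are hypothetical. It assumes the data contain at least two distinct values, since a scalar second argument would be read as a number of bins instead. In R2014b and later, histogram(sums, 'BinMethod', 'integers') is another way to request integer-centred bins.

```matlab
% Hypothetical integer data (assumption): 5000 draws between 0 and 40
sums = randi([0 40], 1, 5000);

% One bin per integer value: a vector second argument is taken as the bin centres,
% so each bin has width 1 and is centred on an integer.
centers = min(sums):max(sums);
counts = hist(sums, centers);
prob = counts / numel(sums);   % probability of each integer value

figure;
bar(centers, prob, 1);         % bar width 1 so neighbouring bars touch
xlabel('value');
ylabel('probability');
```

With this choice each bar counts exactly the elements equal to that integer, so there is no bin size left to tune.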
(Just to close the question)
A histogram is an absolute frequency plot, so the sum of all bin frequencies (the sum of the output vector of the hist function) is always the number of elements in its input vector. So if you want a percentage output, all you need to do is divide each element of the output by that total number and multiply by 100:
x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
figure;
bar(histXout, histFreq/sum(histFreq)*100);
xlabel('x');
ylabel('Frequency (percent)');
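Written out, with n_i the count in bin i and N the total number of elements, the normalization used in the code above is:

```latex
p_i = \frac{n_i}{N}, \qquad N = \sum_{j} n_j, \qquad \sum_i p_i = 1, \qquad \text{percent}_i = 100\, p_i
% Example: n_i = 250 out of N = 10000 gives p_i = 0.025, i.e. 2.5 percent.
```

Dropping the factor of 100 gives the probability (0 to 1) scale asked for in the question.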
If you want to reconstruct the probability density function of your data, you also need to take the bin width of the histogram into account and divide the normalized frequencies by it:
x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
binWidth = histXout(2)-histXout(1);
figure;
bar(histXout, histFreq/binWidth/sum(histFreq));
xlabel('x');
ylabel('PDF: f(x)');
hold on
% fit a normal dist to check the pdf
PD = fitdist(x, 'normal');
plot(histXout, pdf(PD, histXout), 'r');
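A quick way to check the density normalization is that the bar areas (height times bin width) should add up to 1, just as a PDF integrates to 1. In formula form the heights are f_i = n_i / (N * binWidth). The sketch below repeats the setup of the code above; pdfEst, probEst and totalArea are hypothetical names introduced only for this check.

```matlab
x = randn(10000, 1);
numOfBins = 100;
[histFreq, histXout] = hist(x, numOfBins);
binWidth = histXout(2) - histXout(1);   % hist uses equally spaced bins

% Density estimate: count / (total count * bin width)
pdfEst = histFreq / binWidth / sum(histFreq);

% The bar areas should add up to 1, like the area under a true PDF.
totalArea = sum(pdfEst) * binWidth;
fprintf('total area under the histogram: %.6f\n', totalArea);

% The probability heights differ from the density heights only by the factor binWidth.
probEst = histFreq / sum(histFreq);
```

This also shows why the two scales look different: the probability heights depend on how many bins you choose, while the density heights stay comparable to the fitted PDF curve.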
Update:
Since MATLAB R2014b, you can use the 'histogram' function to easily produce histograms with various normalizations (for example 'count', 'probability', 'pdf' and 'cdf'). For example, the PDF plot above becomes:
x = randn(10000, 1);
figure;
h = histogram(x, 'normalization', 'pdf');
xlabel('x');
ylabel('PDF: f(x)');
hold on
% fit a normal dist to check the pdf
PD = fitdist(x, 'normal');
plot(h.BinEdges, pdf(PD, h.BinEdges), 'r');
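For what the question actually asks (the y-axis showing the probability of each bin), the 'probability' normalization is the direct counterpart. A minimal sketch for R2014b or later, assuming hypothetical integer data in place of the real 5000-element vector and combining it with integer-centred bins:

```matlab
% Hypothetical integer data (assumption), standing in for the 5000-element vector
sums = randi([0 40], 1, 5000);

figure;
% 'probability': each bar height is (count in bin) / (number of elements),
% so the heights sum to 1. 'integers': one bin of width 1 on each integer value.
h = histogram(sums, 'BinMethod', 'integers', 'Normalization', 'probability');
xlabel('value');
ylabel('probability');
```

The bar heights are stored in h.Values and the bin edges in h.BinEdges, so they can be reused for further calculations.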