
MATLAB: combining and normalizing histograms with different sample sizes

I have four datasets whose distributions I would like to show in one MATLAB figure. My current code is:

[n1,x1]=hist([dataset1{:}]);
[n2,x2]=hist([dataset2{:}]);
[n3,x3]=hist([dataset3{:}]);
[n4,x4]=hist([dataset4{:}]);
bar(x1,n1,'hist'); 
hold on; h1=bar(x1,n1,'hist'); set(h1,'facecolor','g')
hold on; h2=bar(x2,n2,'hist'); set(h2,'facecolor','g')
hold on; h3=bar(x3,n3,'hist'); set(h3,'facecolor','g')
hold on; h4=bar(x4,n4,'hist'); set(h4,'facecolor','g')
hold off 

My issue is that each group has a different sample size: dataset1 has n = 69, dataset2 has n = 23, and dataset3 and dataset4 each have n = 10. How do I normalize the distributions so that these four groups can be shown together?

Is there some way to, for example, divide the count in each bin by the sample size of that group?

asked Oct 30 '22 12:10 by user3470496


1 Answer

You can normalize your histograms by dividing by the total number of elements:

[n1,x1] = histcounts(randn(69,1));
[n2,x2] = histcounts(randn(23,1));
[n3,x3] = histcounts(randn(10,1));
[n4,x4] = histcounts(randn(10,1));
hold on
bar(x4(1:end-1),n4./sum(n4),'histc');
bar(x3(1:end-1),n3./sum(n3),'histc');
bar(x2(1:end-1),n2./sum(n2),'histc');
bar(x1(1:end-1),n1./sum(n1),'histc');
hold off 
ax = gca;
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))

As the code above shows, a few other changes make your code shorter and simpler:

  1. You only need to call hold on once.
  2. Instead of collecting all the bar handles, use the axes handle.
  3. Plot the bars in ascending order of dataset size, so that all histograms are clearly visible.
  4. Use the axes handle to set each property for all the bars in a single command.

As a side note, it's better to use histcounts than hist, which MATLAB no longer recommends.

Here is the result:

[Figure: only hist]
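
On the histcounts side note: histcounts can also do the normalization itself through its 'Normalization' option, and passing the same edges to every call keeps the bins aligned across datasets. Here is a minimal sketch, assuming random stand-in data with the sample sizes from the question and an arbitrary choice of 10 shared bins (the names data, edges and p are only for illustration):

```matlab
% Hypothetical stand-ins for the four datasets (sizes taken from the question).
data = {randn(69,1), randn(23,1), randn(10,1), randn(10,1)};

% Shared bin edges so bars from different datasets line up. This is an
% assumption of the sketch; the code above lets each dataset pick its own bins.
allValues = vertcat(data{:});
edges = linspace(min(allValues), max(allValues), 11);   % 10 equal-width bins

% One row of per-bin probabilities per dataset; each row sums to 1.
p = zeros(numel(data), numel(edges)-1);
for k = 1:numel(data)
    p(k,:) = histcounts(data{k}, edges, 'Normalization', 'probability');
end

% Grouped bars at the bin centers, one series (color) per dataset.
centers = edges(1:end-1) + diff(edges)/2;
bar(centers, p.', 'grouped');
legend('dataset1', 'dataset2', 'dataset3', 'dataset4');
```

Grouped bars sit side by side instead of overlapping, so no transparency is needed. Whether that reads better than overlaid histograms depends on how many bins you use.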


EDIT:

If you want to also plot the pdf line from histfit, then you can save it first, and then plot it normalized:

dataset = {randn(69,1),randn(23,1),randn(10,1),randn(10,1)};
fits = zeros(100,2,numel(dataset));
hold on
for k = numel(dataset):-1:1
    total = numel(dataset{k}); % for normalizing
    f = histfit(dataset{k}); % draw the histogram and fit
    % collect the curve data and normalize it:
    fits(:,:,k) = [f(2).XData; f(2).YData./total].';
    x = f(1).XData; % collect the bar positions
    n = f(1).YData; % collect the bar counts
    delete(f) % delete the histogram and the fit
    bar(x,n./total,'hist'); % plot the normalized bars; 'hist' centers each bar on x (bin centers here)
end
ax = gca; % get the axis handle
% set all color and transparency for the bars:
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))
% plot all the curves:
plot(squeeze(fits(:,1,:)),squeeze(fits(:,2,:)),'LineWidth',3)
hold off

Again, there are some other improvements you can make to your code:

  1. Put everything in a loop so that things are easier to change later.
  2. Collect the data for all the curves in one variable so you can plot them all together easily.

The new result is:

[Figure: hist & fit]
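
If you would rather not draw and then delete the histfit output, a similar picture can probably be built from histogram with 'Normalization','probability' plus a normal curve computed by hand. This is only a sketch under assumptions: it uses random stand-in data, it assumes a normal fit is what you want (the histfit default), and it scales the density by the bin width so that the curve is comparable to the per-bin probabilities:

```matlab
% Hypothetical stand-ins for the four datasets (sizes taken from the question).
data = {randn(69,1), randn(23,1), randn(10,1), randn(10,1)};
colors = lines(numel(data));

hold on
for k = numel(data):-1:1            % smallest datasets first, largest on top
    x = data{k};

    % Histogram with bar heights equal to count / total count.
    h = histogram(x, 'Normalization', 'probability', ...
        'FaceColor', colors(k,:), 'FaceAlpha', 0.7);

    % Normal density from the sample mean and standard deviation,
    % multiplied by the bin width to approximate the probability per bin.
    mu = mean(x);
    s  = std(x);
    xs = linspace(h.BinLimits(1), h.BinLimits(2), 100);
    ys = h.BinWidth * exp(-(xs - mu).^2 / (2*s^2)) / (s*sqrt(2*pi));

    plot(xs, ys, 'Color', colors(k,:), 'LineWidth', 3);
end
hold off
```

Because each histogram picks its own bin width here, the curves have different heights even for similar distributions. Passing common bin edges to histogram would make them directly comparable.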

answered Nov 15 '22 08:11 by EBH