Is there a way in MATLAB to check whether a histogram distribution is unimodal or bimodal?

EDIT: Do you think Hartigan's Dip Statistic would work? I tried passing an image to it and got the value 0. What does that mean? Also, when I pass an image, does it test the distribution of the image's gray-level histogram? Thanks.


Testing for Unimodal (Unimodality) or Bimodal (Bimodality) Distribution in MATLAB

2 Answers

Here is a script that uses Nic Price's implementation of Hartigan's Dip Test to identify unimodal distributions. The tricky part was computing xpdf: despite its name, it is not a probability density function but a sorted sample.

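p_value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. Here the null hypothesis is that the distribution is unimodal, so a small p_value is evidence against unimodality, while a large one only means the data are consistent with a single mode.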
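A p-value is usually acted on by comparing it with a significance level fixed in advance. In the sketch below, alpha = 0.05 is an assumption (the answer does not choose one), and p_value is a placeholder standing in for the second output of HartigansDipSignifTest.

```matlab
% Reading a dip-test p-value (null hypothesis: the distribution is unimodal).
% alpha = 0.05 is an assumed significance level; p_value is a placeholder
% for the second output of HartigansDipSignifTest.
alpha = 0.05;
p_value = 0.002;

reject_unimodal = p_value < alpha;
if reject_unimodal
    msg = sprintf('p = %.3f < %.2f: reject unimodality (evidence of more than one mode)', p_value, alpha);
else
    msg = sprintf('p = %.3f >= %.2f: no evidence against unimodality', p_value, alpha);
end
disp(msg)
```

A large p-value does not prove unimodality; it only says this sample gives no reason to reject it.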

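% Hartigan's dip test on a unimodal and a bimodal sample.
% Requires HartigansDipSignifTest.m (Nic Price's implementation) on the path.
close all; clear;

nboot = 500;              % number of bootstrap samples for the p-value
sample_size = [256 256];

% Unimodal: zero-mean Gaussian with standard deviation 10
% (same as normrnd(0.0, 10.0, sample_size), without the Statistics Toolbox)
sample2d = 10.0 * randn(sample_size);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);

figure;
subplot(1,2,1);
bar(b, n)                 % bin centers on x, counts on y
title(sprintf('Dip test p-value = %.2f', p_value))

% Bimodal: the signed square root pushes mass away from zero
sample2d = sign(sample2d) .* (abs(sample2d) .^ 0.5);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);

subplot(1,2,2);
bar(b, n)
title(sprintf('Dip test p-value = %.2f', p_value))

print -dpng modality.png

% Local functions in a script must be at the end of the file (MATLAB R2016b+).
% In Octave, define the function before its first use instead.
function [x2, n, b] = compute_xpdf(x)
  x2 = x(:).';                % flatten to a row vector
  [n, b] = hist(x2, 40);      % counts and bin centers, used only for plotting
  % The dip test takes a sorted sample, not a probability density function
  x2 = sort(x2);
  % Keep every 1000th order statistic to speed up the bootstrap
  x2 = x2(1:1000:end);
end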

Result of script execution
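On the image part of the question: since the test takes a sorted sample, an image would be passed as its pixel values rather than as histogram counts, so what gets tested is the distribution of gray levels. A minimal sketch, assuming a synthetic two-level uint8 image in place of a real one (a real image would be loaded with imread) and an arbitrary downsampling step of 16:

```matlab
% Synthetic 64x64 gray-level image with two intensity levels (assumption).
img = uint8([60 * ones(32, 64); 190 * ones(32, 64)]);

% Convert to double first: arithmetic on uint8 values saturates at 0 and 255.
x = double(img(:)).';      % row vector of gray levels, one entry per pixel
xpdf = sort(x);            % sorted sample, the form HartigansDipSignifTest takes

% Optional: keep every step-th order statistic to speed up the bootstrap.
step = 16;                 % arbitrary choice for this sketch
xpdf_small = xpdf(1:step:end);

% The gray-level histogram is for display only; it is not what gets passed.
[counts, centers] = hist(x, 0:255);
```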

answered Oct 05 '22 14:10

divanov

There are many different ways to do what you are asking. In the most literal sense, "bimodal" means there are two peaks. Usually, though, you want the two peaks to be separated by some reasonable distance, and you want each of them to contain a reasonable proportion of the total counts. Only you know what is "reasonable" for your situation, but the following approach might help:

1. Create a histogram of the intensities.
2. Form the cumulative distribution with cumsum.
3. For different values of the "cut" between the distributions (25%, 30%, 50%, …), compute the mean and standard deviation of the two distributions (below and above the cut).
4. Compute the distance between the two means divided by the sum of the two standard deviations.
5. That quantity will be a maximum at the "best cut".
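The five steps can be condensed into a short script. This is an illustration under assumptions, not the answer's own code: the data are exact normal quantiles (computed with erfinv, so the result is deterministic) for two Gaussians with sigma = 10 and means 25 and 65, the cuts span the 20th to 80th percentiles as in the script further down, and 41 cut positions is an arbitrary choice.

```matlab
% Deterministic bimodal sample: exact normal quantiles for two Gaussians
% (sigma = 10, means 25 and 65); these parameters are assumptions.
N = 2000;
u = ((1:N) - 0.5) / N;
z = sqrt(2) * erfinv(2 * u - 1);          % standard normal quantiles
x = [10 * z + 25, 10 * z + 65];

% Steps 1-2: histogram and normalized cumulative distribution
[h, centers] = hist(x, 0:100);
cdf = cumsum(h) / sum(h);

% Step 3: candidate cuts between the 20th and 80th percentiles
lo = centers(find(cdf >= 0.2, 1));
hi = centers(find(cdf >= 0.8, 1));
cuts = linspace(lo, hi, 41);

% Step 4: separation score (mean difference over summed standard deviations)
q = zeros(size(cuts));
for k = 1:numel(cuts)
    below = x(x < cuts(k));
    above = x(x >= cuts(k));
    q(k) = (mean(above) - mean(below)) / (std(below) + std(above));
end

% Step 5: the best cut is where q is largest
[qbest, kbest] = max(q);
best_cut = cuts(kbest);
fprintf('best cut = %.1f, q = %.2f\n', best_cut, qbest);
```

For this symmetric mixture the best cut should land near the midpoint between the two means, 45.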

You have to decide what value of that quantity represents "bimodal" for you. Here is some code that demonstrates what I am talking about. It generates bimodal distributions of increasing severity: two Gaussians with sigma = 10 whose means are separated by delta, in steps of one standard deviation. For each delta, I compute the quantity described above as a function of the cut position and plot it. I then fit a parabola through this curve over a range corresponding to ±1 sigma of the entire distribution. As the distribution becomes more bimodal, two things happen:

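1. The curvature of this curve flips: it goes from a valley to a peak, so the leading coefficient a changes from positive to negative.
2. The fitted value at the overall mean (printed as "peak") increases; it is about 1.32 for a single Gaussian.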
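The single-Gaussian value can be checked by hand. Assuming the cut sits exactly at the mean μ of one normal distribution with standard deviation σ, each side is a half-normal distribution, whose mean lies σ√(2/π) from μ and whose standard deviation is σ√(1 − 2/π). With m_1, m_2 and s_1, s_2 the means and standard deviations below and above the cut:

$$
\begin{aligned}
m_2 - m_1 &= 2\sigma\sqrt{2/\pi}, \qquad s_1 = s_2 = \sigma\sqrt{1 - 2/\pi}, \\
q &= \frac{m_2 - m_1}{s_1 + s_2} = \frac{\sqrt{2/\pi}}{\sqrt{1 - 2/\pi}} = \sqrt{\frac{2}{\pi - 2}} \approx 1.32.
\end{aligned}
$$

This agrees with the delta = 0 line of the output (peak = 1.32).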

You can look at these quantities for some of your own distributions and decide where you want to put the cutoff.
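One way to calibrate a cutoff is to compute the same two quantities on reference data whose answer is known. The sketch below replaces randn with exact normal quantiles (via erfinv) so it is deterministic; the 20th-to-80th-percentile cut range and the ±1 sigma fitting window follow the answer's code, while the sample size and the use of sorted-sample percentiles instead of the histogram are assumptions.

```matlab
% Curvature and center value of q(cut) for a single Gaussian (delta = 0)
% and a two-Gaussian mixture (delta = 40, i.e. 4 sigma), sigma = 10.
N = 5000;
u = ((1:N) - 0.5) / N;
z = sqrt(2) * erfinv(2 * u - 1);     % standard normal quantiles
deltas = [0 40];
a = zeros(size(deltas));             % leading parabola coefficient
center = zeros(size(deltas));        % fitted q at the overall mean
for k = 1:numel(deltas)
    x = [10 * z + 25, 10 * z + 25 + deltas(k)];
    xs = sort(x);
    % 20 cuts between the 20th and 80th percentiles of the pooled sample
    cuts = linspace(xs(round(0.2 * numel(xs))), xs(round(0.8 * numel(xs))), 20);
    q = zeros(size(cuts));
    for ci = 1:numel(cuts)
        below = x(x < cuts(ci));
        above = x(x >= cuts(ci));
        q(ci) = (mean(above) - mean(below)) / (std(below) + std(above));
    end
    % parabola through q within +/- 1 sigma of the overall mean
    xm = mean(x);
    xsd = std(x);
    inwin = cuts > xm - xsd & cuts < xm + xsd;
    pf = polyfit(cuts(inwin), q(inwin), 2);
    a(k) = pf(1);
    center(k) = polyval(pf, xm);
    fprintf('delta = %2d: a = %.2e, value at mean = %.2f\n', deltas(k), a(k), center(k));
end
fprintf('single-Gaussian reference sqrt(2/(pi-2)) = %.4f\n', sqrt(2 / (pi - 2)));
```

With these reference data, delta = 0 should give a positive a and a center value close to sqrt(2/(pi-2)), and delta = 40 a negative a with a larger center value, the same sign pattern as the randomized output.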

% test for bimodal distribution
close all
ncuts = 20;
for delta = 0:10:50
    % two Gaussians with sigma = 10, means separated by delta
    a1 = randn(100,100) * 10 + 25;
    a2 = randn(100,100) * 10 + 25 + delta;
    a3 = [a1(:); a2(:)];
    % cumulative histogram; try cuts between the 20th and 80th percentiles
    [h, hb] = hist(a3, 0:100);
    cs = cumsum(h);
    llimi = find(cs < 0.2 * max(cs(:)));
    ulimi = find(cs > 0.8 * max(cs(:)));
    llim = hb(llimi(end));
    ulim = hb(ulimi(1));
    cuts = linspace(llim, ulim, ncuts);
    dmean = mean(a3);
    dstd = std(a3);
    m = zeros(ncuts, 2);    % means below / above each cut
    s = zeros(ncuts, 2);    % standard deviations below / above each cut
    for ci = 1:ncuts
        d1 = a3(a3 < cuts(ci));
        d2 = a3(a3 >= cuts(ci));
        m(ci, 1) = mean(d1);
        m(ci, 2) = mean(d2);
        s(ci, 1) = std(d1);
        s(ci, 2) = std(d2);
    end
    % separation: distance between the means over the sum of the std devs
    q = (m(:, 2) - m(:, 1)) ./ sum(s, 2);
    figure;
    plot(cuts, q);
    title(sprintf('delta = %d', delta))
    % compute curvature of plot around mean (+/- 1 sigma window):
    xlims = dmean + [-1 1] * dstd;
    indx = find(cuts < xlims(2) & cuts > xlims(1));   % element-wise &, not &&
    pf = polyfit(cuts(indx), q(indx), 2);
    peak = polyval(pf, dmean);   % fitted value at the overall mean
    fprintf(1, 'coefficients: a = %.2e, peak = %.2f\n', pf(1), peak);
end

Output values:

coefficients: a = 1.37e-03, peak = 1.32
coefficients: a = 1.01e-03, peak = 1.34
coefficients: a = 2.85e-04, peak = 1.45
coefficients: a = -5.78e-04, peak = 1.70
coefficients: a = -1.29e-03, peak = 2.08
coefficients: a = -1.58e-03, peak = 2.48

Sample plots (images not reproduced): q as a function of the cut position for delta = 0 and for delta = 4 sigma, and the histogram for delta = 40.

answered Oct 05 '22 15:10

Floris
