Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding points along a plot in Matlab

Tags:

math

plot

matlab

I have the following plot and a file of the data which creates that plot. I would like to have Matlab find the following points for me:

  1. [y,x] for peak noted by the 100% line
  2. [x] for where the plot crosses the y=0 line
  3. [x] for where y is 50% and 20% of the peak found in part 1.

Are there any add-on tools or packages which people are aware of which can help me accomplish this? I need to do this for a collection of plots so something reasonably automated would be ideal.

I can certainly do the programming and calculation parts in Matlab, it's just a matter of being able to load in the data file, matching it to a curve or function, and find the various [x,y] co-ordinates.

Data plot

like image 286
Carl Avatar asked Feb 16 '23 21:02

Carl


1 Answers

OK, here goes. As far as I know, there is no baked-in routine in Matlab to do what you desire; you'll have to make one yourself. Couple of things to note:

  • As should be obvious, linearly interpolated data is easiest to do and should pose no problem

  • Using a single polynomial interpolation is also not too hard, although there's a few more details to take care of. Finding the peak should be first order of business, which involves finding roots of the derivative (using roots, for example). When the peak is found, find the roots of the polynomial at all desired levels (0%, 20%, 50%) by offsetting the polynomial by this amount.

  • Using cubic splines (spline) is most complicated of all. The routine outlined above for general polynomials should be repeated for all subintervals where you have a complete cubic, taking into account the possibility that the maxima can also lie on the boundaries of the subinterval, and that any roots and extrema found may lie outside the interval on which the cubic is valid (also don't forget the x-offsets used by spline).

Here is my implementation of all 3 methods:

%% Initialize
% ---------------------------

clc
clear all

% Create some bogus data
n = 25;

f = @(x) cos(x) .* sin(4*x/pi) + 0.5*rand(size(x));
x = sort( 2*pi * rand(n,1));
y = f(x);


%% Linear interpolation
% ---------------------------

% y peak
[y_peak, ind] = max(y);
x_peak = x(ind);

% y == 0%, 20%, 50%
lims = [0 20 50];
X = cell(size(lims));
Y = cell(size(lims));
for p = 1:numel(lims)

    % the current level line to solve for
    lim = y_peak*lims(p)/100;

    % points before and after passing through the current limit
    after    = (circshift(y<lim,1) & y>lim) | (circshift(y>lim,1) & y<lim);
    after(1) = false;
    before   = circshift(after,-1);

    xx = [x(before) x(after)];
    yy = [y(before) y(after)];

    % interpolate and insert new data
    new_X = x(before) - (y(before)-lim).*diff(xx,[],2)./diff(yy,[],2);
    X{p} = new_X;
    Y{p} = lim * ones(size(new_X));

end

% make a plot to verify
figure(1), clf, hold on
plot(x,y, 'r') % (this also plots the interpolation in this case)

plot(x_peak,y_peak, 'k.') % the peak

plot(X{1},Y{1}, 'r.') % the 0%  intersects
plot(X{2},Y{2}, 'g.') % the 20% intersects
plot(X{3},Y{3}, 'b.') % the 50% intersects

% finish plot
xlabel('X'), ylabel('Y'), title('Linear interpolation')
legend(...
    'Real data / interpolation',...
    'peak',...
    '0% intersects',...
    '20% intersects',...
    '50% intersects',...
    'location', 'southeast')



%% Cubic splines
% ---------------------------

% Find cubic splines interpolation
pp = spline(x,y);

% Finding the peak requires finding the maxima of all cubics in all
% intervals. This means evaluating the value of the interpolation on 
% the bounds of each interval, finding the roots of the derivative and
% evaluating the interpolation on those roots: 

coefs = pp.coefs;
derivCoefs = bsxfun(@times, [3 2 1], coefs(:,1:3));
LB = pp.breaks(1:end-1).'; % lower bounds of all intervals
UB = pp.breaks(2:end).';   % upper bounds of all intervals

% rename for clarity
a = derivCoefs(:,1);
b = derivCoefs(:,2);  
c = derivCoefs(:,3); 

% collect and limits x-data
x_extrema = [...
    LB, UB,...     
    LB + (-b + sqrt(b.*b - 4.*a.*c))./2./a,... % NOTE: data is offset by LB
    LB + (-b - sqrt(b.*b - 4.*a.*c))./2./a,... % NOTE: data is offset by LB
    ];

x_extrema = x_extrema(imag(x_extrema) == 0);
x_extrema = x_extrema( x_extrema >= min(x(:)) & x_extrema <= max(x(:)) );

% NOW find the peak
[y_peak, ind] = max(ppval(pp, x_extrema(:)));
x_peak = x_extrema(ind);

% y == 0%, 20% and 50%
lims = [0 20 50];
X = cell(size(lims));
Y = cell(size(lims));    
for p = 1:numel(lims)

    % the current level line to solve for
    lim = y_peak * lims(p)/100;

    % find all 3 roots of all cubics
    R = NaN(size(coefs,1), 3); 
    for ii = 1:size(coefs,1) 

        % offset coefficients to find the right intersects
        C = coefs(ii,:);
        C(end) = C(end)-lim;

        % NOTE: data is offset by LB
        Rr = roots(C) + LB(ii); 

        % prune roots
        Rr( imag(Rr)~=0 ) = NaN;
        Rr( Rr <= LB(ii) | Rr >= UB(ii) ) = NaN;
        % insert results
        R(ii,:) = Rr;
    end

    % now evaluate and save all valid points    
    X{p} = R(~isnan(R));
    Y{p} = ppval(pp, X{p});

end

% as a sanity check, plot everything 
xx = linspace(min(x(:)), max(x(:)), 20*numel(x));
yy = ppval(pp, xx);

figure(2), clf, hold on

plot(x,y, 'r') % the actual data
plot(xx,yy) % the cubic-splines interpolation 

plot(x_peak,y_peak, 'k.') % the peak

plot(X{1},Y{1}, 'r.') % the 0%  intersects
plot(X{2},Y{2}, 'g.') % the 20% intersects
plot(X{3},Y{3}, 'b.') % the 50% intersects

% finish plot
xlabel('X'), ylabel('Y'), title('Cubic splines interpolation')
legend(...
    'Real data',...
    'interpolation',...
    'peak',...
    '0% intersects',...
    '20% intersects',...
    '50% intersects',...
    'location', 'southeast')


%% (N-1)th degree polynomial
% ---------------------------

% Find best interpolating polynomial
coefs = bsxfun(@power, x, n-1:-1:0) \ y;
% (alternatively, you can use polyfit() to do this, but this is faster)

% To find the peak, we'll have to find the roots of the derivative: 
derivCoefs = (n-1:-1:1).' .* coefs(1:end-1);
Rderiv = roots(derivCoefs);
Rderiv = Rderiv(imag(Rderiv) == 0);
Rderiv = Rderiv(Rderiv >= min(x(:)) &  Rderiv <= max(x(:)));

[y_peak, ind] = max(polyval(coefs, Rderiv));
x_peak = Rderiv(ind);

% y == 0%, 20%, 50%
lims = [0 20 50];
X = cell(size(lims));
Y = cell(size(lims));
for p = 1:numel(lims)

    % the current level line to solve for
    lim = y_peak * lims(p)/100;

    % offset coefficients as to find the right intersects
    C = coefs;
    C(end) = C(end)-lim;

    % find and prune roots
    R = roots(C);
    R = R(imag(R) == 0);
    R = R(R>min(x(:)) & R<max(x(:)));

    % evaluate polynomial at these roots to get actual data
    X{p} = R;
    Y{p} = polyval(coefs, R);

end

% as a sanity check, plot everything 
xx = linspace(min(x(:)), max(x(:)), 20*numel(x));
yy = polyval(coefs, xx);

figure(3), clf, hold on

plot(x,y, 'r') % the actual data
plot(xx,yy) % the cubic-splines interpolation 

plot(x_peak,y_peak, 'k.') % the peak

plot(X{1},Y{1}, 'r.') % the 0%  intersects
plot(X{2},Y{2}, 'g.') % the 20% intersects
plot(X{3},Y{3}, 'b.') % the 50% intersects

% finish plot
xlabel('X'), ylabel('Y'), title('(N-1)th degree polynomial')
legend(...
    'Real data',...
    'interpolation',...
    'peak',...
    '0% intersects',...
    '20% intersects',...
    '50% intersects',...
    'location', 'southeast')

This results in these three plots:

Linear interpolationPolynomial interpolationCubic splines interpolation

(Note that something goes wrong in the (N-1)th degree polynomial; the 20% crossings are all wrong towards the end. So, before copy-pasting, check everything a bit more thoroughly :)

As I said before, and as you can plainly see, interpolating with a single polynomial will often introduce a lot of problems if the underlying data is not suited for it. Also, as you can clearly see from these plots, the interpolation method very strongly affects where the intersections will be -- it is of the utmost importance that you have at least some idea of what model underlies your data.

For the general cases, cubic splines is usually the best way to go. However, this is a generic method and will give you (and the readers of your publication) a false perception of accuracy in your data. Use cubic splines to get a first idea of what the intersects are and how they behave, but always come back and revisit your analysis once the real model becomes more clear. Certainly don't publish with cubic splines when that is only used to create smoother, more "visually appealing" curves through your data :)

like image 100
Rody Oldenhuis Avatar answered Feb 19 '23 11:02

Rody Oldenhuis