matlab: optimum amount of points for linear fit

Tags:

I want to make a linear fit to few data points, as shown on the image. Since I know the intercept (in this case say 0.05), I want to fit only points which are in the linear region with this particular intercept. In this case it will be lets say points 5:22 (but not 22:30). I'm looking for the simple algorithm to determine this optimal amount of points, based on... hmm, that's the question... R^2? Any Ideas how to do it? I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to enclose it into clear and simple function. For fits with fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m) . Thanks for any suggestions!

EDIT: sample data:

intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];

linear fit

506

asked Nov 12 '12 05:11

Art

1 Answers

What you have here is a rather difficult problem to find a general solution of.

One approach would be to compute all the slopes/intersects between all consecutive pairs of points, and then do cluster analysis on the intersepts:

slopes = diff(y)./diff(x);  
intersepts = y(1:end-1) - slopes.*x(1:end-1);

idx = kmeans(intersepts, 3);

x([idx; 3] == 2)  % the points with the intersepts closest to the linear one.

This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.

Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.

I tried a more systematic and robust approach of the above:

function test

    %% example data
    slope = 2;
    intercept = 1.5;

    x = linspace(0.1, 5, 100).';

    y         = slope*x + intercept;
    y(1:12)   = log(x(1:12)) + y(12)-log(x(12));
    y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;

    y = y + 0.2*randn(size(y));


    %% simple algorithm

    [X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])

    [~,inds] = P(X, y,x,intercept)

end

function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1. 

    N = numel(x);
    n = round(N/2);  

    ii = round(ii*n);

    inds = min(max(1, n+(-ii(1):ii(2))), N);

    % Solve linear system with fixed intercept
    A = x(inds);
    b = y(inds) - intercept;

    % and return the sum of squared errors, divided by 
    % the number of points included in the set. This 
    % last step is required to prevent fminsearch from
    % reducing the set to 1 point (= minimum possible 
    % squared error). 
    C = sum(((A\b)*A - b).^2)/numel(inds);    

end

which only finds a rough approximation to the desired indices (12 and 74 in this example).

When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouln't bet my life on it.

If you have the statistics toolbox, use the kmeans option.

answered Sep 25 '22 23:09

Rody Oldenhuis

Related questions
                            
                                Importance of Algorithms in context of Mobile Application Development? [closed]
                            
                                Algorithm to see if regex repeat is reducible
                            
                                How to merge two finite state automata?
                            
                                Algorithm to find rectangles
                            
                                Minimizing distance to a weighted grid
                            
                                Solve the word game Ghost (as seen on xkcd) - spelling letters without making a word
                            
                                Adjusting the threshold in Canny edge algorithm
                            
                                Merging sequence of symbols
                            
                                finding saddle points in 3d heightmap
                            
                                Finding the minimum unique number in an array
                            
                                what is meant by symmetric DDA?
                            
                                What is this pattern/algo called? Getting a random order of subscribers to an event that only one can react to at a time
                            
                                Longest Common Palindromic Subsequence
                            
                                Discrete fluid "filling" algorithm for a height map
                            
                                Algorithm to interleave array of characters and digits in-place
                            
                                Print all the files in a given folder and sub-folders without using recursion/stack
                            
                                Find a supplement to a subarray of ints in Java
                            
                                Sort in ascending or descending order (chosen arbitrarily; Prefer whichever is cheaper)
                            
                                Shortest path to transform one word into another
                            
                                What are strongly connected components used for?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

matlab: optimum amount of points for linear fit

Tags:

algorithm

matlab

linear-regression

Art

People also ask

1 Answers

Rody Oldenhuis

Recent Activity

Donate For Us