Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two linear regression models in MATLAB

I want to compare the performance of two models using the F statistic. Here is a reproducible example and the expected results:

load carbig
tbl = table(Acceleration,Cylinders,Horsepower,MPG);

% Testing separetly both models
mdl1 = fitlm(tbl,'MPG~1+Acceleration+Cylinders+Horsepower');
mdl2 = fitlm(tbl,'MPG~1+Acceleration');

% Comparing both models using the F-test and p-value
numerator = (mdl2.SSE-mdl1.SSE)/(mdl1.NumCoefficients-mdl2.NumCoefficients);
denominator = mdl1.SSE/mdl1.DFE;
F = numerator/denominator;
p = 1-fcdf(F,mdl1.NumCoefficients-mdl2.NumCoefficients,mdl1.DFE);

We end up with F = 298.75 and p = 0, indicating mdl1 is significantly better than mdl2, as assessed by the F statistic.

Is there anyway to obtain the F and p values without performing twice fitlm and doing all the computation?

I tried to run a coefTest, as suggested by @Glen_b, however the function is poorly documented and the results are not the ones I'm expecting.

[p,F] = coefTest(mdl1); % p = 0, F = 262.508  (this F test mdl1 vs constant mdl)
[p,F] = coefTest(mdl1,[0,0,1,1]); % p = 0, F = 57.662 (not sure what this is testing)
[p,F] = coefTest(mdl1,[1,1,0,0]); % p = 0, F = 486.810 (idem)

I believe I should carry the test with a different null hypothesis (C) using the function [p,F] = coeffTest(mdl1,H,C). But I don't really know how to do it and there's no example.

like image 389
mat Avatar asked Feb 21 '26 20:02

mat


1 Answers

This answer is in regards to comparing two linear regression models where one model is a restricted version of the other.

Short answer:

To do an F-test on the restriction that the 3rd and 4th elements of your estimated, coefficient vector b are zero:

[p, F] = coefTest(mdl1, [0, 0, 1, 0; 0, 0, 0, 1]);

Further explanation:

Let b be our estimated vector. Linear restrictions on b are typically written in a matrix form: R*b = r. The restriction that 3rd and 4th element of b are zero would be written:

[0, 0, 1, 0    *    b    = [0
 0, 0, 0, 1]                0];

The matrix [0, 0, 1, 0; 0, 0, 0, 1] is what coefTest calls the H matrix in the docs.

P = coefTest(M,H), with H a numeric matrix having one column for each
    coefficient, performs an F test that H*B=0, where B represents the
    coefficient vector.

Long version

Sometimes with this econometric routines, it's nice just to write it out yourself so you know what's really going on.

Remove rows with NaN because they just add unrelated complexity:

tbl_dirty = table(Acceleration,Cylinders,Horsepower,MPG);
tbl = tbl_dirty(~any(ismissing(tbl_dirty),2),:);

Do the estimation etc...

n = height(tbl);  % number of observations
y = tbl.MPG;
X = [ones(n, 1), tbl.Acceleration, tbl.Cylinders, tbl.Horsepower];
k = size(X,2);     % number of variables (including constant)

b = X \ y;                 % estimate b with least squares
u = y - X * b;             % calculates residuals 
s2 = u' * u / (n - k);     % estimate variance of error term (assuming homoskedasticity, independent observations)
BCOV = inv(X'*X) * s2;     % get covariance matrix of b assuming homoskedasticity of error term etc...
bse = diag(BCOV).^.5;      % standard errors

R = [0, 0, 1, 0;
     0, 0, 0, 1];

r = [0; 0];          % Testing restriction: R * b = r 

num_restrictions = size(R, 1);
F = (R*b - r)'*inv(R * BCOV * R')*(R*b - r) / num_restrictions;   % F-stat (see Hiyashi for reference)

Fp = 1 - fcdf(F, num_restrictions, n - k);  % F p-val

For reference, can look at p. 65 of Hiyashi's book Econometrics.

like image 107
Matthew Gunn Avatar answered Feb 25 '26 09:02

Matthew Gunn



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!