I want to compare the performance of two models using the F statistic. Here is a reproducible example and the expected results:
load carbig
tbl = table(Acceleration,Cylinders,Horsepower,MPG);
% Testing separetly both models
mdl1 = fitlm(tbl,'MPG~1+Acceleration+Cylinders+Horsepower');
mdl2 = fitlm(tbl,'MPG~1+Acceleration');
% Comparing both models using the F-test and p-value
numerator = (mdl2.SSE-mdl1.SSE)/(mdl1.NumCoefficients-mdl2.NumCoefficients);
denominator = mdl1.SSE/mdl1.DFE;
F = numerator/denominator;
p = 1-fcdf(F,mdl1.NumCoefficients-mdl2.NumCoefficients,mdl1.DFE);
We end up with F = 298.75 and p = 0, indicating mdl1 is significantly better than mdl2, as assessed by the F statistic.
Is there anyway to obtain the F and p values without performing twice fitlm and doing all the computation?
I tried to run a coefTest, as suggested by @Glen_b, however the function is poorly documented and the results are not the ones I'm expecting.
[p,F] = coefTest(mdl1); % p = 0, F = 262.508 (this F test mdl1 vs constant mdl)
[p,F] = coefTest(mdl1,[0,0,1,1]); % p = 0, F = 57.662 (not sure what this is testing)
[p,F] = coefTest(mdl1,[1,1,0,0]); % p = 0, F = 486.810 (idem)
I believe I should carry the test with a different null hypothesis (C) using the function [p,F] = coeffTest(mdl1,H,C). But I don't really know how to do it and there's no example.
This answer is in regards to comparing two linear regression models where one model is a restricted version of the other.
To do an F-test on the restriction that the 3rd and 4th elements of your estimated, coefficient vector b are zero:
[p, F] = coefTest(mdl1, [0, 0, 1, 0; 0, 0, 0, 1]);
Let b be our estimated vector. Linear restrictions on b are typically written in a matrix form: R*b = r. The restriction that 3rd and 4th element of b are zero would be written:
[0, 0, 1, 0 * b = [0
0, 0, 0, 1] 0];
The matrix [0, 0, 1, 0; 0, 0, 0, 1] is what coefTest calls the H matrix in the docs.
P = coefTest(M,H), with H a numeric matrix having one column for each
coefficient, performs an F test that H*B=0, where B represents the
coefficient vector.
Sometimes with this econometric routines, it's nice just to write it out yourself so you know what's really going on.
Remove rows with NaN because they just add unrelated complexity:
tbl_dirty = table(Acceleration,Cylinders,Horsepower,MPG);
tbl = tbl_dirty(~any(ismissing(tbl_dirty),2),:);
Do the estimation etc...
n = height(tbl); % number of observations
y = tbl.MPG;
X = [ones(n, 1), tbl.Acceleration, tbl.Cylinders, tbl.Horsepower];
k = size(X,2); % number of variables (including constant)
b = X \ y; % estimate b with least squares
u = y - X * b; % calculates residuals
s2 = u' * u / (n - k); % estimate variance of error term (assuming homoskedasticity, independent observations)
BCOV = inv(X'*X) * s2; % get covariance matrix of b assuming homoskedasticity of error term etc...
bse = diag(BCOV).^.5; % standard errors
R = [0, 0, 1, 0;
0, 0, 0, 1];
r = [0; 0]; % Testing restriction: R * b = r
num_restrictions = size(R, 1);
F = (R*b - r)'*inv(R * BCOV * R')*(R*b - r) / num_restrictions; % F-stat (see Hiyashi for reference)
Fp = 1 - fcdf(F, num_restrictions, n - k); % F p-val
For reference, can look at p. 65 of Hiyashi's book Econometrics.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With