
How do I do an F-test in Python?

How do I do an F-test to check if the variance is equivalent in two vectors in Python?

For example, if I have

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]

is there something similar to

scipy.stats.ttest_ind(a, b) 

I found

sp.stats.f(a, b) 

But it appears to be something different from an F-test.

DrewH asked Feb 01 '14 04:02




2 Answers

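The F-test statistic for equal variances is simply:

F = Var(X) / Var(Y)

where Var is the sample variance (dividing by n - 1) and, under the null hypothesis, F follows an F distribution with df1 = len(X) - 1 and df2 = len(Y) - 1 degrees of freedom.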

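scipy.stats.f, which you mentioned in your question, has a cdf method (and an sf method for the upper tail). This means you can compute a p-value for the given statistic and test whether that p-value is less than your chosen alpha level.

Thus:

import scipy.stats

alpha = 0.05  # Or whatever you want your alpha to be.
# Two-sided p-value: double the smaller of the two tail probabilities.
p_value = 2 * min(scipy.stats.f.cdf(F, df1, df2), scipy.stats.f.sf(F, df1, df2))
if p_value < alpha:
    # Reject the null hypothesis that Var(X) == Var(Y)
    print("Reject H0")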
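Putting the pieces above together on the vectors from the question, a minimal sketch might look like the following. The helper name f_test_equal_variance is made up for illustration (it is not a SciPy function), and the two-sided p-value convention used here (doubling the smaller tail) is one common choice, not the only one.

```python
import numpy as np
from scipy import stats


def f_test_equal_variance(x, y):
    """Two-sided F-test of H0: Var(x) == Var(y), assuming normal samples.

    Hypothetical helper for illustration; not part of SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample variances (ddof=1), matching the len - 1 degrees of freedom.
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
    df1, df2 = len(x) - 1, len(y) - 1
    # Two-sided p-value: double the smaller tail probability, capped at 1.
    lower = stats.f.cdf(f_stat, df1, df2)
    upper = stats.f.sf(f_stat, df1, df2)
    p_value = min(1.0, 2 * min(lower, upper))
    return f_stat, p_value


a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

f_stat, p_value = f_test_equal_variance(a, b)
print(f"F = {f_stat:.4f}, p = {p_value:.4g}")
```

Using both cdf and sf means it does not matter which sample ends up in the numerator: swapping a and b inverts F but gives the same p-value.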

Note that the F-test is extremely sensitive to non-normality of X and Y, so you're probably better off doing a more robust test such as Levene's test unless you're reasonably sure that X and Y are distributed normally. Bartlett's test is another option, though it is also sensitive to departures from normality. Both tests can be found in the SciPy API:

  • Bartlett's test (scipy.stats.bartlett)
  • Levene's test (scipy.stats.levene)
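
As a rough sketch of how the two tests listed above could be run on the vectors from the question: scipy.stats.levene and scipy.stats.bartlett each take the samples as separate arguments and return a statistic and a p-value. The alpha value is just an assumed threshold.

```python
from scipy import stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

alpha = 0.05  # assumed significance level

# Levene's test; SciPy's default centers each sample on its median.
levene_stat, levene_p = stats.levene(a, b)

# Bartlett's test; assumes the samples come from normal distributions.
bartlett_stat, bartlett_p = stats.bartlett(a, b)

for name, p in [("Levene", levene_p), ("Bartlett", bartlett_p)]:
    verdict = "reject" if p < alpha else "fail to reject"
    print(f"{name}: p = {p:.4g} -> {verdict} H0 of equal variances")
```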

Joel Cornett answered Oct 14 '22 10:10


For anyone who came here searching for an ANOVA F-test or an F-test to compare features for feature selection:

  • sklearn.feature_selection.f_classif computes the ANOVA F-value of each feature against a class label, and
  • sklearn.feature_selection.f_regression runs a univariate linear regression F-test for each feature against a continuous target
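
For the ANOVA case, a small sketch with f_classif might look like this. The toy arrays X and y are invented purely for illustration; f_classif returns one F-statistic and one p-value per feature column.

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Hypothetical toy data: 6 samples, 2 features, 2 classes.
# Feature 0 differs clearly between the classes; feature 1 barely does.
X = np.array([
    [1.0, 5.0],
    [1.2, 3.0],
    [0.9, 4.0],
    [3.0, 4.5],
    [3.1, 3.5],
    [2.9, 4.5],
])
y = np.array([0, 0, 0, 1, 1, 1])

# One-way ANOVA F-test of each feature against the class labels.
f_stats, p_values = f_classif(X, y)

for i, (f, p) in enumerate(zip(f_stats, p_values)):
    print(f"feature {i}: F = {f:.3f}, p = {p:.4g}")
```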

slushy answered Oct 14 '22 10:10