
What do these F scores mean? Using SelectKBest for feature selection

I am new to statistics. I am trying to select the best features for classification on my data set, and I chose to do so by running SelectKBest from scikit-learn.

Here is my code:

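 import pandas as pd
 import sklearn.feature_selection as fs

 # Keep the 10 highest-scoring features (the default score_func is f_classif)
 kb = fs.SelectKBest(k=10)
 kb.fit(X, y)
 # Boolean mask of the selected columns
 mask = kb.get_support()
 names = X.columns.values[mask]
 scores = kb.scores_[mask]
 names_scores = list(zip(names, scores))
 ns_df = pd.DataFrame(data=names_scores, columns=['Feat_names', 'F_Scores'])
 # Highest F-score first; ties broken alphabetically by feature name
 ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending=[False, True])
 print(ns_df_sorted)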

This gives output like this:

  Feat_names   F_Scores
4     go_out  29.870218
8     fun1_2  27.374212
6     fun1_1  26.470766
3       date  25.035227
7    shar1_1  17.629153
2    imprace  11.331197
0      order  11.290014
5    sinc1_1   8.309805
9    shar1_2   5.009775
1   field_cd   4.515538

I am not sure what the F score here signifies and what I can interpret from it.

Faliha Zikra asked Dec 17 '25 18:12


1 Answer

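You can read the F-scores as a measure of how informative each feature is, taken on its own, for predicting your target.

As explained in the method documentation, SelectKBest scores each feature with a univariate F-test (by default f_classif, the one-way ANOVA F-test). The F-score is the test statistic of that test: the ratio of the variance explained by the class labels (between-class variance) to the unexplained variance (within-class variance). A larger F-score means the feature's mean differs more across classes relative to its spread within each class. Because each feature is tested separately, the score says nothing about interactions between features.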
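To make the "explained vs. unexplained variance" ratio concrete, here is a minimal sketch that computes the one-way ANOVA F statistic by hand for each feature and compares it with scikit-learn's f_classif. The data is synthetic, and the names anova_f, X_demo and y_demo are made up for this illustration; they are not part of the question's data set.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def anova_f(x, y):
    """One-way ANOVA F statistic for one feature x grouped by class labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = x.size, classes.size
    grand_mean = x.mean()
    # Between-class ("explained") sum of squares
    ss_between = sum(
        x[y == c].size * (x[y == c].mean() - grand_mean) ** 2 for c in classes
    )
    # Within-class ("unexplained") sum of squares
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    # Ratio of the two mean squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 50)
# Synthetic data: column 0 shifts with the class, column 1 is pure noise
X_demo = np.column_stack([rng.normal(loc=y_demo * 1.5), rng.normal(size=100)])

F_sklearn, p_values = f_classif(X_demo, y_demo)
F_manual = np.array([anova_f(X_demo[:, j], y_demo) for j in range(X_demo.shape[1])])

print("manual :", F_manual)
print("sklearn:", F_sklearn)
```

The two rows should agree, and the class-dependent column should get a much larger F-score than the noise column, which is the pattern you would look for in your own table.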

So, in your example, you could either keep all k=10 top-scoring features or use the scores to refine your selection further (e.g. keeping only those whose F-score is above some threshold).
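The threshold idea could be sketched roughly as follows. This assumes a synthetic data set from make_classification with hypothetical column names f0..f7, and the cut-off of 10.0 is arbitrary and chosen only for illustration; a sensible value depends on your data, and the p-values in pvalues_ are often an easier quantity to threshold.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data (hypothetical column names f0..f7)
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# k="all" scores every feature without discarding any of them
kb = SelectKBest(score_func=f_classif, k="all").fit(X_demo, y_demo)

threshold = 10.0  # arbitrary cut-off, for illustration only
mask = kb.scores_ > threshold
selected = X_demo.columns[mask]

# Full report, highest F-score first, with the matching p-values
report = pd.DataFrame(
    {"Feat_names": X_demo.columns, "F_Scores": kb.scores_, "p_values": kb.pvalues_}
).sort_values("F_Scores", ascending=False)

print(report)
print("Selected:", list(selected))
```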

carrdelling answered Dec 19 '25 13:12


