How do we generally determine whether a given variable (feature) in a dataset is important for accurately performing a prediction task?
Which tests should be conducted to determine whether a variable is suitable for prediction?
Suppose I have 32 features and one of them is 'income'. How should I start analysing its importance? Is there any use in comparing this feature with other features? After all, it is the collection of variables together that drives the prediction, not just the two variables being compared ...
Start here (especially the section Feature Selection Tutorials and Recipes):
http://machinelearningmastery.com/an-introduction-to-feature-selection/
And here (it lists the available methods for further reading):
https://en.wikipedia.org/wiki/Feature_selection
There is also a good article with a more general discussion of the issue:
http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf
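As a concrete illustration of the univariate "filter" methods those links describe, here is a minimal sketch using scikit-learn's SelectKBest on a synthetic regression dataset (the dataset and parameter choices are assumptions for demonstration, not part of your data):

```python
# Univariate filter selection sketch: score each feature independently
# against the target, then keep the k best. Synthetic data is used here.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# 200 samples, 32 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=32, n_informative=5,
                       noise=0.1, random_state=0)

# Score each feature with a univariate F-test and keep the top 10
selector = SelectKBest(score_func=f_regression, k=10)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                      # (200, 10)
print(selector.get_support(indices=True))   # indices of the kept features
```

Note that such univariate scores judge each feature in isolation, which is exactly the limitation you raise: a feature that looks weak on its own can still matter in combination with others, which is why wrapper and embedded methods from the links above are also worth trying.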
The simplest approach is to fit a Random Forest or a Gradient Boosting Machine on your dataset. These algorithms evaluate the importance of each feature automatically during fitting; once the classifier or regressor is fit, you can access its feature_importances_ attribute (in scikit-learn) - http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
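For example, a minimal sketch of this, again on an assumed synthetic dataset with 32 features:

```python
# Fit a gradient boosting model and rank features by their learned
# importances. The dataset here is synthetic, standing in for your own.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=32, n_informative=5,
                       noise=0.1, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One non-negative score per feature; the scores sum to 1
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first

for idx in ranking[:5]:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
```

If your 'income' column ranks near the top, the model found it useful in combination with the other 31 features; if it ranks near zero, the model largely ignored it.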