Feature selection on a keras model

I was trying to find the features that most strongly influence the output of my regression model. Here is my code:

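# Imports assumed; the question did not show them (Keras 2.x wrapper path).
import numpy as np
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# baseline_model, X_set and Y_set are defined elsewhere (not shown).
seed = 7
np.random.seed(seed)
estimators = []
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=3,
                                         batch_size=20)))
pipeline = Pipeline(estimators)
rfe = RFE(estimator=pipeline, n_features_to_select=5)
fit = rfe.fit(X_set, Y_set)  # this call raises the RuntimeError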

But I get the following runtime error when I run it:

RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes

How can I overcome this issue and select the best features for my model? If that is not possible, can I use an estimator such as LogisticRegression(), which RFE in scikit-learn supports, to find the best features for my dataset?

asked May 06 '18 11:05

Klaus

2 Answers

I assume your Keras model is some kind of neural network. With neural networks in general, it is hard to tell which input features are relevant and which are not. The reason is that each input feature has multiple coefficients linked to it, one for each node of the first hidden layer. Adding more hidden layers makes it even harder to determine how much impact an input feature has on the final prediction.
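
To make this concrete, here is a minimal sketch that uses scikit-learn's MLPRegressor as a stand-in for the Keras network (that substitution, the synthetic data, and the layer size are my assumptions, not part of the question). It shows that the first layer holds a whole row of weights per input feature, and that the model has neither coef_ nor feature_importances_:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data (assumption): 6 input features, only the first two matter.
rng = np.random.RandomState(7)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Small network with one hidden layer of 8 nodes (stand-in for the Keras model).
# A ConvergenceWarning is possible with so few iterations; it does not matter here.
mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=50, random_state=7)
mlp.fit(X, y)

# First-layer weight matrix: one row per input feature, one column per hidden node.
first_layer = mlp.coefs_[0]
print(first_layer.shape)  # (6, 8): eight weights attached to every input feature

# The attributes that RFE looks for are not present on a neural network.
print(hasattr(mlp, "coef_"), hasattr(mlp, "feature_importances_"))
```

There is no single number per feature to rank here, which is why RFE cannot use such a model directly.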

On the other hand, for linear models it is very straightforward: each feature x_i has a corresponding weight/coefficient w_i, and its magnitude directly determines how much impact the feature has on the prediction (assuming the features are scaled, of course).
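
As a rough illustration of the linear case, the sketch below standardizes the features and ranks them by the absolute value of the fitted coefficients. The data, the feature scales, and the variable names are made up for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): four features on very different scales.
rng = np.random.RandomState(7)
X = rng.normal(size=(300, 4)) * np.array([1.0, 10.0, 100.0, 1.0])
y = (5.0 * X[:, 0] + 0.2 * X[:, 1] + 0.001 * X[:, 2]
     + rng.normal(scale=0.1, size=300))

# Scale first, so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# One coefficient per feature; sort by absolute value, largest first.
ranking = np.argsort(-np.abs(model.coef_))
print(model.coef_)
print(ranking)
```

Without the scaling step, the raw coefficients (5.0, 0.2, 0.001) would mostly reflect the units of each column rather than its influence on the prediction.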

RFE (recursive feature elimination) assumes that your prediction model has an attribute coef_ (linear models) or feature_importances_ (tree models) whose length equals the number of input features and whose values represent their relevance (in absolute terms).
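
For comparison with the failing call in the question, here is a small sketch of RFE wrapped around an estimator that does expose coef_. LinearRegression and the synthetic data are my choices for illustration:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 8 features, only columns 0, 3 and 5 drive y.
rng = np.random.RandomState(7)
X = rng.normal(size=(300, 8))
y = (4.0 * X[:, 0] - 3.0 * X[:, 3] + 2.0 * X[:, 5]
     + rng.normal(scale=0.1, size=300))

# RFE repeatedly fits the estimator and drops the feature with the
# smallest coefficient magnitude until n_features_to_select remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

selected = np.flatnonzero(rfe.support_)  # indices of the surviving features
print(selected)
print(rfe.ranking_)  # 1 marks a selected feature; larger values were dropped earlier
```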

My suggestion:

1. Feature selection:
   (a) Run RFE on any linear / tree model to reduce the number of features to some desired number n_features_to_select.
   (b) Use regularized linear models like lasso / elastic net that enforce sparsity. The problem here is that you cannot directly set the actual number of selected features.
   (c) Use any other feature selection technique from here.
2. Neural network: Use only the features from (1) for your neural network.
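
Option (b) could look roughly like the sketch below, which combines Lasso with SelectFromModel. The alpha value and the data are assumptions chosen for the example. Note that alpha only controls how strong the sparsity is; the number of surviving features falls out of the fit rather than being set directly:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 8 features, only columns 0 and 3 drive y.
rng = np.random.RandomState(7)
X = rng.normal(size=(300, 8))
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty pushes the coefficients of irrelevant features to exactly zero;
# SelectFromModel keeps the features whose coefficient is (practically) non-zero.
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X_scaled, y)

kept = np.flatnonzero(selector.get_support())
X_reduced = selector.transform(X_scaled)  # this is what step 2 would train on
print(kept, X_reduced.shape)
```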

answered Sep 20 '22 13:09

Jan K

Suggestion:

Run the RFE algorithm on a scikit-learn estimator to observe feature importance. Then use the features ranked most important to train your Keras-based model.
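
A minimal sketch of that two-step workflow, assuming a RandomForestRegressor as the scikit-learn estimator and synthetic data in place of X_set / Y_set (both are my assumptions). The Keras training step is left as a comment, since baseline_model is not shown in the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for X_set / Y_set: 8 features, columns 0, 3 and 5 matter.
rng = np.random.RandomState(7)
X = rng.normal(size=(300, 8))
y = (4.0 * X[:, 0] - 3.0 * X[:, 3] + 3.0 * X[:, 5]
     + rng.normal(scale=0.1, size=300))

# Step 1: RFE on a tree model, which exposes feature_importances_.
forest = RandomForestRegressor(n_estimators=50, random_state=7)
rfe = RFE(estimator=forest, n_features_to_select=3).fit(X, y)
selected = np.flatnonzero(rfe.support_)

# Step 2: keep only the selected columns for the neural network.
X_selected = rfe.transform(X)
print(selected, X_selected.shape)

# Hypothetical continuation with the pipeline from the question; baseline_model
# would need an input layer sized to X_selected.shape[1]:
# pipeline.fit(X_selected, y)
```

One caveat with this approach: the features a tree or linear model finds important are not guaranteed to be the ones a neural network would rely on, so it is worth comparing the network's validation score with and without the selection.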

To your question: standardization is not required for logistic regression.

answered Sep 22 '22 13:09

Hendouz

Tags: python-3.x, deep-learning, keras, scikit-learn, feature-selection
