I run an sk-learn classifier on a pandas dataframe (X). Since some data is missing, I use sk-learn's imputer like this:
imp=Imputer(strategy='mean',axis=0)
X=imp.fit_transform(X)
After doing that however, my number of features is decreased, presumably because the imputer just gets rids of the empty columns.
That's fine, except that the imputer transforms my dataframe into a numpy ndarray, and thus I lose the column/feature names. I need them later on to identify the important features (with clf.feature_importances_).
How can I know the names of the features in clf.feature_importances_, if some of the columns of my initial dataframe have been dropped by the imputer?
you can do this:
invalid_mask = np.isnan(imp.statistics_)
valid_mask = np.logical_not(invalid_mask)
valid_idx, = np.where(valid_mask)
Now you have old indexes (Indexes that these columns had in matrix X) for valid columns. You can get feature names by these indexes from list of feature names of old X.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With