feature names from sklearn pipeline: not fitted error

Tags:

python

scikit-learn

names

feature-selection

I'm working with scikit-learn on a text classification experiment. Now I would like to get the names of the best-performing selected features. I tried some of the answers to similar questions, but nothing works. The last lines of code are an example of what I tried. For example, when I print feature_names, I get this error:

sklearn.exceptions.NotFittedError: This SelectKBest instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Any solutions?

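import numpy as np
from sklearn import linear_model
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

scaler = StandardScaler(with_mean=False)

enc = LabelEncoder()
y = enc.fit_transform(labels)

feat_sel = SelectKBest(mutual_info_classif, k=200)
clf = linear_model.LogisticRegression()

pipe = Pipeline([('vectorizer', DictVectorizer()),
                 ('scaler', StandardScaler(with_mean=False)),
                 ('mutual_info', feat_sel),
                 ('logistregress', clf)])

feature_names = pipe.named_steps['mutual_info']
X.columns[features.transform(np.arange(len(X.columns)))]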


asked Jul 23 '17 16:07

Bambi

1 Answer

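You first have to fit the pipeline. Until pipe.fit is called, none of its steps (including the SelectKBest step you store in feature_names) are fitted, which is exactly what the NotFittedError says. Only after fitting can you ask the selector which features it kept.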

Solution

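import numpy as np
from sklearn import linear_model
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

enc = LabelEncoder()
y = enc.fit_transform(labels)

feat_sel = SelectKBest(mutual_info_classif, k=200)
clf = linear_model.LogisticRegression()

pipe = Pipeline([('vectorizer', DictVectorizer()),
                 ('scaler', StandardScaler(with_mean=False)),
                 ('mutual_info', feat_sel),
                 ('logistregress', clf)])

# Now fit the pipeline using your data
pipe.fit(X, y)

# Only now can the steps in pipe.named_steps be queried.
# X is a list of dicts, so it has no .columns: the column names live in the
# fitted DictVectorizer (get_feature_names_out needs scikit-learn >= 1.0),
# and get_support() is a boolean mask marking the k columns that were kept.
vectorizer = pipe.named_steps['vectorizer']
selector = pipe.named_steps['mutual_info']
feature_names = vectorizer.get_feature_names_out()[selector.get_support()]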

General information

The Pipeline example in the scikit-learn documentation follows the same pattern:

anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

This sets some initial parameters (k for the anova step and C for the svc step, each addressed as step__parameter) and then calls fit(X, y) to fit the pipeline.
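The same step__parameter syntax works with the step names from the question. A sketch with hypothetical toy data; k=2 and C=0.1 are arbitrary illustration values:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the question's X and y.
X = [{"good": 2, "movie": 1}, {"bad": 3, "movie": 1},
     {"good": 1, "plot": 1}, {"bad": 1, "plot": 2}]
y = [1, 0, 1, 0]

pipe = Pipeline([("vectorizer", DictVectorizer()),
                 ("mutual_info", SelectKBest(mutual_info_classif, k=200)),
                 ("logistregress", LogisticRegression())])

# "<step name>__<parameter>" targets one step's parameter, just as
# anova__k and svc__C do in the documentation example.
# set_params returns the pipeline itself, so fit can be chained.
pipe.set_params(mutual_info__k=2, logistregress__C=0.1).fit(X, y)

selector = pipe.named_steps["mutual_info"]
print(selector.k, pipe.named_steps["logistregress"].C, selector.get_support().sum())
```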

EDIT:

For the new error: your X is a list of dictionaries, so it has no columns attribute. One way to get the columns you want is to load X into a pandas DataFrame:

from pandas import DataFrame

X = [{'age': 10, 'name': 'Tom'}, {'age': 5, 'name': 'Mark'}]

df = DataFrame(X)
len(df.columns)

result:

2

Hope this helps

answered Oct 04 '22 22:10

seralouk
