
How can I get the feature names from sklearn TruncatedSVD object?

I have the following code:

import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
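dates = pd.date_range('2022-01-01', periods=1000)  # placeholder index; `dates` was not defined in the original post
df = pd.DataFrame(np.random.randn(1000, 25), index=dates, columns=list('ABCDEFGHIJKLMOPQRSTUVWXYZ'))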

def reduce(dim):
    svd = TruncatedSVD(n_components=dim, n_iter=7, random_state=42)
    return svd.fit(df)

fitted = reduce(5)

How do I get the column names from fitted?

m.awad asked Dec 19 '22 06:12


1 Answer

Continuing from Mikhail's post.

Assume that you already have feature_names from vectorizer.get_feature_names() (get_feature_names_out() in scikit-learn 1.0 and later) and that you have then called svd.fit(X).

You can then extract the feature names, sorted by their weight in the first component, using the following code:

best_features = [feature_names[i] for i in svd.components_[0].argsort()[::-1]]

The code above takes the indices that sort svd.components_[0] in descending order, looks up the corresponding entries in feature_names (the full list of features), and builds the best_features list. You can then view, for example, the 10 best features:

In[21]: best_features[:10]

Out[21]: 
['manag',
 'develop',
 'busi',
 'solut',
 'initi',
 'enterprise',
 'project',
 'program',
 'process',
 'plan']
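
Applied to the DataFrame in the question, the same idea might look like the sketch below. Note that TruncatedSVD does not select columns: each component is a weighted combination of all of them, so "getting the column names" really means ranking the columns by their weight in a component. The helper top_features is a hypothetical name introduced here, and it ranks by absolute weight, which is a variation on the signed sort shown above, since a large negative weight also indicates a strong contribution.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Stand-in for the DataFrame in the question (25 columns of random data).
rng = np.random.default_rng(0)
columns = list("ABCDEFGHIJKLMOPQRSTUVWXYZ")
df = pd.DataFrame(rng.standard_normal((1000, 25)), columns=columns)

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42).fit(df)

# The column labels play the role of feature_names.
feature_names = np.asarray(df.columns)


def top_features(component, k=10):
    """Return the k column names with the largest absolute weight in one component."""
    order = np.argsort(np.abs(component))[::-1]
    return feature_names[order][:k].tolist()


# svd.components_ has shape (n_components, n_columns): one row per component.
for i, component in enumerate(svd.components_):
    print(f"component {i}: {top_features(component, k=5)}")
```

With random data the ranking carries no real meaning; on real data it shows which original columns dominate each component.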

imanzabet answered Dec 21 '22 11:12