Retrieve intermediate features from a pipeline in scikit-learn (Python)

Tags: python, scikit-learn, pipeline

I am using a pipeline very similar to the one given in this example:

>>> text_clf = Pipeline([('vect', CountVectorizer()),
...                      ('tfidf', TfidfTransformer()),
...                      ('clf', MultinomialNB()),
... ])

over which I use GridSearchCV to find the best estimators over a parameter grid.

However, I would like to get the column names of my training set with the get_feature_names() method from CountVectorizer(). Is this possible without implementing CountVectorizer() outside the pipeline?


asked Oct 12 '15 16:10

Tanguy

2 Answers

Using the get_params() method, you can access the individual steps of the pipeline and their respective internal parameters. Here's an example of accessing 'vect':

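from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
# get_params() also returns the step objects themselves, keyed by step name
print(text_clf.get_params()['vect'])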

yields (for me)

CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
    dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
    lowercase=True, max_df=1.0, max_features=None, min_df=1,
    ngram_range=(1, 1), preprocessor=None, stop_words=None,
    strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
    tokenizer=None, vocabulary=None)

I haven't fitted the pipeline to any data in this example, so calling get_feature_names() at this point will raise an error.
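To show what this looks like once the pipeline is fitted, here is a minimal sketch. The toy corpus and labels are made up for illustration, and the hasattr check is my own assumption to cover both older releases (get_feature_names()) and newer ones, where the method is called get_feature_names_out().

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus and labels, invented purely for illustration.
docs = ["the cat sat", "the dog barked", "a cat meowed", "a dog sat"]
labels = [0, 1, 0, 1]

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf.fit(docs, labels)

# After fit(), the 'vect' step returned by get_params() is the fitted vectorizer.
vect = text_clf.get_params()['vect']

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names().
if hasattr(vect, 'get_feature_names_out'):
    feature_names = list(vect.get_feature_names_out())
else:
    feature_names = list(vect.get_feature_names())

print(feature_names)
```

The names come back in the column order of the count matrix, so they can be used directly as column labels for the transformed training set.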


answered Sep 27 '22 19:09

rabbit

Just for reference:

The estimators of a pipeline are stored as a list in the steps attribute:

>>> clf.steps[0]
('reduce_dim', PCA(copy=True, n_components=None, whiten=False))

and as a dict in named_steps:

>>> clf.named_steps['reduce_dim']
PCA(copy=True, n_components=None, whiten=False)

From http://scikit-learn.org/stable/modules/pipeline.html
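Applied to the question's setup, the same named_steps lookup should work on the refitted pipeline that GridSearchCV stores in best_estimator_. This is only a sketch: the corpus, labels and parameter grid are invented, and the import path assumes a scikit-learn version that provides sklearn.model_selection.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus and labels, invented purely for illustration.
docs = ["the cat sat", "the dog barked", "a cat meowed",
        "a dog sat", "the cat purred", "the dog growled"]
labels = [0, 1, 0, 1, 0, 1]

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])

# Hypothetical grid; step parameters are addressed as <step name>__<parameter>.
param_grid = {'clf__alpha': [0.1, 1.0]}

search = GridSearchCV(text_clf, param_grid, cv=2)
search.fit(docs, labels)

# With the default refit=True, best_estimator_ is a pipeline fitted on all the data,
# so its 'vect' step is a fitted CountVectorizer.
best_vect = search.best_estimator_.named_steps['vect']
vocab_size = len(best_vect.vocabulary_)
print(vocab_size)
```

From there, best_vect can be asked for its feature names like any fitted CountVectorizer, without building a separate vectorizer outside the pipeline.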

answered Sep 27 '22 20:09

AbtPst
