Sklearn: Is there any way to debug Pipelines?

Tags: python, python-2.7, scikit-learn

I have created some pipelines for a classification task, and I want to check what information is present/stored at each stage (e.g. text_stats, ngram_tfidf). How could I do this?

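from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# TextStats, tokenize_bigram_stem and stopwords are user-defined (not part of scikit-learn)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_stats', Pipeline([
            ('length', TextStats()),
            ('vect', DictVectorizer())
        ])),
        ('ngram_tfidf', Pipeline([
            ('count_vect', CountVectorizer(tokenizer=tokenize_bigram_stem, stop_words=stopwords)),
            ('tfidf', TfidfTransformer())
        ]))
    ])),
    ('classifier', MultinomialNB(alpha=0.1))
])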


asked Jan 15 '16 00:01

Aman Tandon


1 Answer

I sometimes find it useful to temporarily add a debugging step that prints out the information you are interested in. Building on the example from the sklearn Pipeline documentation, you could, for example, print the first 5 rows, the shape, or whatever else you need to look at just before the classifier is called (the step runs during both fit and predict):

import pandas as pd
from sklearn import svm
# sklearn.datasets.samples_generator was removed in scikit-learn 0.24; import from sklearn.datasets
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
from sklearn.base import TransformerMixin, BaseEstimator

class Debug(BaseEstimator, TransformerMixin):
    """Pass-through step: prints what it receives and returns X unchanged."""

    def transform(self, X):
        print(pd.DataFrame(X).head())  # first 5 rows reaching this step
        print(X.shape)
        return X

    def fit(self, X, y=None, **fit_params):
        return self  # nothing to learn

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')
anova_svm = Pipeline([('anova', anova_filter), ('dbg', Debug()), ('svc', clf)])
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

prediction = anova_svm.predict(X)
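In the question's pipeline, the FeatureUnion of DictVectorizer and TF-IDF output produces a scipy sparse matrix, and `pd.DataFrame(X)` may not handle that the way you want. The following is a minimal sketch of a sparse-aware variant of the same idea; the `label` and `n_rows` parameters are hypothetical names introduced here for illustration, and the toy documents are made up.

```python
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

class Debug(BaseEstimator, TransformerMixin):
    """Pass-through step that prints a short summary of whatever reaches it."""

    def __init__(self, label='debug', n_rows=5):
        self.label = label    # hypothetical: tag printed with each summary
        self.n_rows = n_rows  # hypothetical: how many leading rows to print

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        print(f'[{self.label}] type={type(X).__name__} shape={X.shape}')
        head = X[:self.n_rows]
        # Densify only the few printed rows of a sparse matrix (e.g. TF-IDF output)
        print(head.toarray() if sp.issparse(head) else head)
        return X  # data continues to the next step unchanged

docs = ["good movie", "bad movie", "really good film", "really bad film"]
labels = [1, 0, 1, 0]

pipe = Pipeline([
    ('count_vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('dbg', Debug(label='after tfidf', n_rows=2)),
    ('classifier', MultinomialNB(alpha=0.1)),
])
pipe.fit(docs, labels)        # Debug prints once here, via fit_transform
pipe.predict(["good film"])   # and again here, via transform
```

The same step can be dropped into either branch of a FeatureUnion (for example right after `('vect', DictVectorizer())`) to see what each branch produces. Note that it expects something with a `.shape`, so it belongs after a vectorizer rather than before one, where the data is still a list of strings.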


answered Sep 28 '22 06:09

Marcus V.
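As an alternative that does not modify the pipeline, a fitted pipeline can be inspected stage by stage: `named_steps` gives access to each step by name, a FeatureUnion keeps its fitted parts in `transformer_list`, and slicing a Pipeline (supported since scikit-learn 0.21) gives everything before the classifier. This is a sketch on made-up documents; `text_stats` is a hypothetical stand-in for the question's `TextStats`, wrapped in `FunctionTransformer`, and the question's custom tokenizer and stop words are omitted.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

def text_stats(docs):
    # Hypothetical stand-in for TextStats: one dict of features per document
    return [{'length': len(d), 'num_words': len(d.split())} for d in docs]

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_stats', Pipeline([
            ('length', FunctionTransformer(text_stats)),
            ('vect', DictVectorizer()),
        ])),
        ('ngram_tfidf', Pipeline([
            ('count_vect', CountVectorizer()),
            ('tfidf', TfidfTransformer()),
        ])),
    ])),
    ('classifier', MultinomialNB(alpha=0.1)),
])

docs = ["good movie", "bad movie", "really good film", "really bad film"]
labels = [1, 0, 1, 0]
pipeline.fit(docs, labels)

# Fitted sub-steps, reached by name
union = pipeline.named_steps['features']
parts = dict(union.transformer_list)  # {'text_stats': ..., 'ngram_tfidf': ...}
print(parts['text_stats'].named_steps['vect'].feature_names_)
print(parts['ngram_tfidf'].named_steps['count_vect'].vocabulary_)

# Output of a single branch, and of everything before the classifier
print(parts['text_stats'].transform(docs).toarray())
print(pipeline[:-1].transform(docs).shape)
```

Because the fitted objects are inspected after the fact, nothing needs to be removed from the pipeline once debugging is done.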

