
Sklearn: Is there any way to debug Pipelines?

I have created some pipelines for a classification task, and I want to check what information is present/stored at each stage (e.g. text_stats, ngram_tfidf). How can I do this?

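from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# TextStats, tokenize_bigram_stem and stopwords are user-defined (not shown).
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_stats', Pipeline([
            ('length', TextStats()),
            ('vect', DictVectorizer()),
        ])),
        ('ngram_tfidf', Pipeline([
            ('count_vect', CountVectorizer(tokenizer=tokenize_bigram_stem, stop_words=stopwords)),
            ('tfidf', TfidfTransformer()),
        ])),
    ])),
    ('classifier', MultinomialNB(alpha=0.1)),
])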
Aman Tandon asked Jan 15 '16 at 00:01



People also ask

Can you pickle a scikit-learn pipeline?

Yes. Pipeline objects are exported the same way as any other scikit-learn estimator, with pickle or with joblib (import joblib directly; the copy once bundled as sklearn.externals.joblib has been removed).
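A minimal round-trip sketch, assuming a small made-up pipeline (StandardScaler plus LogisticRegression on synthetic data, neither of which comes from the question). It saves the fitted pipeline with pickle and with joblib, loads it back, and checks that the predictions are unchanged.

```python
import os
import pickle
import tempfile

import joblib  # installed alongside scikit-learn as a required dependency
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fit a small illustrative pipeline on synthetic data.
X, y = make_classification(random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Round-trip through the standard library's pickle module (in memory).
restored_pickle = pickle.loads(pickle.dumps(pipe))

# Round-trip through joblib, using a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")
    joblib.dump(pipe, path)
    restored_joblib = joblib.load(path)

# Both restored copies should predict exactly like the original.
print((restored_pickle.predict(X) == pipe.predict(X)).all())
print((restored_joblib.predict(X) == pipe.predict(X)).all())
```

Keep in mind that a saved pipeline is generally only guaranteed to load correctly with the same scikit-learn version that saved it.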

What are two advantages of using scikit-learn pipelines?

First, they make your workflow easier to read and understand. Second, they enforce which steps are run and in what order. Together, these make your work more reproducible.

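What's the difference between Pipeline() and make_pipeline() in scikit-learn?

make_pipeline takes the estimators directly and names the steps automatically (from the lowercased class name), whereas Pipeline requires an explicit (name, estimator) pair for each step. Otherwise the resulting Pipeline objects behave the same.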
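A short sketch comparing the two, using the text-classification estimators from the question (the explicit step names in the Pipeline version are illustrative choices). The step names matter because set_params and named_steps refer to steps by name.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

# Pipeline: you choose each step's name.
manual = Pipeline([
    ('count_vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB()),
])

# make_pipeline: names are generated from the lowercased class names.
auto = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())

print([name for name, _ in manual.steps])  # ['count_vect', 'tfidf', 'classifier']
print([name for name, _ in auto.steps])    # ['countvectorizer', 'tfidftransformer', 'multinomialnb']

# The same hyperparameter is addressed through each pipeline's own step name.
manual.set_params(classifier__alpha=0.1)
auto.set_params(multinomialnb__alpha=0.1)
```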




1 Answer

I sometimes find it useful to temporarily add a debugging step that prints the information you are interested in. Building on the sklearn example [1], you could, for example, print the first 5 rows, the shape, or whatever else you need to inspect before the classifier is called:

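import pandas as pd
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.pipeline import Pipeline
from sklearn.base import TransformerMixin, BaseEstimator

class Debug(BaseEstimator, TransformerMixin):
    """Pass-through step that prints the data it receives."""

    def transform(self, X):
        # Show the first 5 rows and the shape of what reaches this step.
        # TransformerMixin.fit_transform calls fit() then transform(),
        # so this prints during fit as well as during predict.
        print(pd.DataFrame(X).head())
        print(X.shape)
        return X

    def fit(self, X, y=None, **fit_params):
        # Nothing to learn; return self so the pipeline can continue.
        return self

# sklearn.datasets.samples_generator was removed in scikit-learn 0.24;
# make_classification is imported from sklearn.datasets directly.
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
# f_classif (ANOVA F-test) suits a classification target; f_regression is for regression.
anova_filter = SelectKBest(f_classif, k=5)
clf = svm.SVC(kernel='linear')
anova_svm = Pipeline([('anova', anova_filter), ('dbg', Debug()), ('svc', clf)])
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

prediction = anova_svm.predict(X)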
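Applied to a pipeline shaped like the one in the question, a sketch along these lines puts a debug step at the end of each FeatureUnion branch, so the text_stats and ngram_tfidf outputs can be seen separately. Several parts are assumptions: TextStats below is a hypothetical stand-in (the asker's class is not shown), the four-document corpus is made up, and CountVectorizer's default tokenizer with ngram_range=(1, 2) replaces tokenize_bigram_stem. Both branches produce scipy sparse matrices, which pd.DataFrame(X).head() does not display usefully, so this Debug densifies only the first rows before printing. It also records the last shape it saw, so fitted steps can be inspected by name afterwards; the slicing syntax pipeline[:-1] needs scikit-learn 0.21 or later.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class TextStats(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's TextStats: one dict of counts per document."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [{"length": len(doc), "num_words": len(doc.split())} for doc in X]


class Debug(BaseEstimator, TransformerMixin):
    """Pass-through step that prints, and remembers, what reaches it."""

    def __init__(self, label="debug"):
        self.label = label

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        # DictVectorizer and TfidfTransformer return sparse matrices,
        # so densify only the first 5 rows for display.
        head = X[:5].toarray() if sp.issparse(X) else np.asarray(X)[:5]
        print(f"[{self.label}] shape={X.shape}")
        print(head)
        self.last_shape_ = X.shape  # kept on the step for inspection after fit
        return X


# Toy corpus and labels (hypothetical data for illustration only).
docs = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_stats', Pipeline([
            ('length', TextStats()),
            ('vect', DictVectorizer()),
            ('dbg', Debug('text_stats')),
        ])),
        ('ngram_tfidf', Pipeline([
            ('count_vect', CountVectorizer(ngram_range=(1, 2))),
            ('tfidf', TfidfTransformer()),
            ('dbg', Debug('ngram_tfidf')),
        ])),
    ])),
    ('classifier', MultinomialNB(alpha=0.1)),
])
pipeline.fit(docs, labels)  # prints one block per Debug step

# After fitting, each step (and what it learned) is reachable by name.
branches = dict(pipeline.named_steps['features'].transformer_list)
print(branches['text_stats'].named_steps['dbg'].last_shape_)   # (4, 2)
print(branches['ngram_tfidf'].named_steps['dbg'].last_shape_)  # (4, 10)
print(branches['ngram_tfidf'].named_steps['count_vect'].vocabulary_)

# Everything except the classifier: the combined features it receives.
print(pipeline[:-1].transform(docs).shape)  # (4, 12)
```

Because Debug returns its input unchanged, removing these steps again does not change the fitted model.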
Marcus V. answered Sep 28 '22 at 06:09
