
Using partial_fit with a scikit-learn Pipeline

How do you call partial_fit() on a scikit-learn classifier wrapped inside a Pipeline()?

I'm trying to build an incrementally trainable text classifier using SGDClassifier like:

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

classifier = Pipeline([
    # non_negative was removed in later scikit-learn releases; use alternate_sign=False there
    ('vectorizer', HashingVectorizer(ngram_range=(1, 4), non_negative=True)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SGDClassifier())),
])

but I get an AttributeError trying to call classifier.partial_fit(x, y).

It supports fit(), so I don't see why partial_fit() isn't available. Would it be possible to introspect the pipeline, call the data transformers, and then directly call partial_fit() on my classifier?

asked Jul 29 '13 by Cerin

People also ask

Does Pipeline support partial_fit?

Pipeline does not use partial_fit, hence does not expose it. We would probably need a dedicated pipelining scheme for out-of-core computation, but that also depends on the capabilities of the previous models.
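
You can check this directly: the Pipeline built in the question exposes fit() but simply has no partial_fit attribute:

hasattr(classifier, 'fit')          # True
hasattr(classifier, 'partial_fit')  # False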

What is partial_fit in scikit-learn?

To perform incremental learning, scikit-learn offers the partial_fit API, which can learn incrementally from batches of instances. partial_fit is useful when the whole dataset is too big to fit in memory at once.
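
As a minimal sketch of what that looks like (the data, batch count, and labels here are made up), each call to partial_fit updates the model with one batch:

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
classes = np.array([0, 1])  # the full label set is required on the first call
for _ in range(10):  # stand-in for batches streamed from disk
    X_batch = np.random.rand(100, 20)
    y_batch = np.random.randint(0, 2, size=100)
    clf.partial_fit(X_batch, y_batch, classes=classes)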

How does a scikit-learn Pipeline work?

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting the parameters of the various steps using their names and the parameter name separated by a '__', as in the example below.
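
For example, with the pipeline from the question (step names 'vectorizer', 'tfidf', and 'clf'), nested parameters are addressed by joining names with '__'; the values below are purely illustrative:

# the SGDClassifier sits inside OneVsRestClassifier, hence clf__estimator__...
classifier.set_params(vectorizer__ngram_range=(1, 2),
                      clf__estimator__alpha=1e-4)

# the same naming scheme drives grid search over the whole pipeline
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(classifier, {'clf__estimator__alpha': [1e-4, 1e-3]})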

What is the benefit of using the Scikit-learn pipeline utility for data preprocessing?

The Scikit-learn pipeline is a tool that chains all steps of the workflow together for a more streamlined procedure. The key benefit of building a pipeline is improved readability. Pipelines are able to execute a series of transformations with one call, allowing users to attain results with less code.


1 Answer

Here is what I'm doing, where 'mapper' and 'clf' are the two steps in my Pipeline object.

def partial_pipe_fit(pipeline_obj, df):
    # caution: fit_transform re-fits the mapper on every batch, so a vocabulary-based
    # step like CountVectorizer can yield inconsistent feature columns across batches
    X = pipeline_obj.named_steps['mapper'].fit_transform(df)
    Y = df['class']
    # the first partial_fit call on a classifier must receive the full label set via classes=
    pipeline_obj.named_steps['clf'].partial_fit(X, Y)

You probably want to keep track of performance as you keep adjusting/updating your classifier, but that is a secondary point.

More specifically, the original pipelines were constructed as follows:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn_pandas import DataFrameMapper  # pip install sklearn-pandas

to_vect = Pipeline([('vect', CountVectorizer(min_df=2, max_df=.9, ngram_range=(1, 1),
                                             max_features=100)),
                    ('tfidf', TfidfTransformer())])
full_mapper = DataFrameMapper([
    ('norm_text', to_vect),
    ('norm_fname', to_vect),
])
# n_iter is max_iter in later scikit-learn; self.random_state came from the author's class
full_pipe = Pipeline([('mapper', full_mapper),
                      ('clf', SGDClassifier(n_iter=15, warm_start=True, n_jobs=-1,
                                            random_state=self.random_state))])

Google DataFrameMapper (it comes from the sklearn-pandas package) to learn more about it; here it simply provides a transformation step that plays nicely with pandas DataFrames.
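
For illustration, here is one way the pieces above might be driven batch by batch. The CSV source, chunk size, and label set are assumptions, not part of the original answer; the chunks are assumed to contain the norm_text, norm_fname, and class columns used above:

import numpy as np
import pandas as pd

all_classes = np.array(['ham', 'spam'])  # hypothetical label set
for chunk in pd.read_csv('training_data.csv', chunksize=1000):
    X = full_pipe.named_steps['mapper'].fit_transform(chunk)
    # classes= is required on the first call and harmless on later ones
    full_pipe.named_steps['clf'].partial_fit(X, chunk['class'], classes=all_classes)

Note that because fit_transform re-fits the mapper on every chunk, a stateless HashingVectorizer (as in the question) is a safer choice than CountVectorizer for genuine out-of-core learning.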

answered Sep 19 '22 by meyerson