Sklearn: How to Save a Model Created From a Pipeline and GridSearchCV Using Joblib or Pickle?

Tags: python, scikit-learn, pipeline, grid-search

After identifying the best parameters using a Pipeline and GridSearchCV, how do I pickle/joblib the result to reuse it later? I see how to do this for a single classifier:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(clf, 'filename.pkl')

But how do I save the overall pipeline with the best parameters once the grid search has completed?

I tried:

- joblib.dump(grid, 'output.pkl'), but that dumped every grid search attempt (many files)
- joblib.dump(pipeline, 'output.pkl'), but I don't think that contains the best parameters

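from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# df is a pandas DataFrame loaded elsewhere
X_train = df['Keyword']
y_train = df['Ad Group']

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('sgd', SGDClassifier()),
])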

parameters = {'tfidf__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'tfidf__max_df': [0.25, 0.5, 0.75, 1.0],
              'tfidf__max_features': [10, 50, 100, 250, 500, 1000, None],
              'tfidf__stop_words': ('english', None),
              'tfidf__smooth_idf': (True, False),
              'tfidf__norm': ('l1', 'l2', None),
              }

grid = GridSearchCV(pipeline, parameters, cv=2, verbose=1)
grid.fit(X_train, y_train)

# This was the best combination of tuning parameters discovered:
##best_params = {'tfidf__max_features': None, 'tfidf__use_idf': False,
##               'tfidf__smooth_idf': False, 'tfidf__ngram_range': (1, 2),
##               'tfidf__max_df': 1.0, 'tfidf__stop_words': 'english',
##               'tfidf__norm': 'l2'}

291

asked Dec 07 '15 21:12

Jarad

1 Answer

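Save grid.best_estimator_. With refit=True (the default), it is the pipeline refit on the whole training set using the best parameters found:

import joblib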
joblib.dump(grid.best_estimator_, 'filename.pkl')
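
To see why grid.best_estimator_ is the object to save rather than pipeline, here is a minimal sketch. The keyword data is made up to stand in for df['Keyword'] and df['Ad Group'], the grid is cut down to two parameters, and random_state=0 is added only for repeatability. It relies on GridSearchCV fitting clones of the estimator it is given, so the original pipeline should stay unfitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up stand-in for df['Keyword'] / df['Ad Group'].
X_train = [
    "red running shoes", "blue running shoes",
    "trail running shoes", "cheap running shoes",
    "leather office chair", "ergonomic office chair",
    "mesh office chair", "cheap office chair",
]
y_train = ["shoes"] * 4 + ["chairs"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("sgd", SGDClassifier(random_state=0)),
])
# Reduced grid, for illustration only.
parameters = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": (True, False),
}

grid = GridSearchCV(pipeline, parameters, cv=2)
grid.fit(X_train, y_train)

best = grid.best_estimator_

# The refit pipeline carries every value listed in best_params_.
params_match = all(
    best.get_params()[name] == value
    for name, value in grid.best_params_.items()
)

# The pipeline passed to GridSearchCV was cloned, not fitted in place,
# so its vectorizer never learned a vocabulary.
original_is_fitted = hasattr(pipeline.named_steps["tfidf"], "vocabulary_")

print(type(best).__name__, params_match, original_is_fitted)
```

Dumping pipeline would therefore store an unfitted object with its default parameters, which matches the doubt raised in the question.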

To write the object to a single file, pass compress:

joblib.dump(grid.best_estimator_, 'filename.pkl', compress=1)
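
A round trip can be sketched as follows, to check that the saved file really reproduces the tuned pipeline. The training data, the file name best_pipeline.pkl, and the temporary directory are all placeholders for illustration; only joblib.dump, joblib.load, and best_estimator_ come from the answer above.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up training data standing in for the question's DataFrame columns.
X_train = [
    "red running shoes", "blue running shoes",
    "trail running shoes", "cheap running shoes",
    "leather office chair", "ergonomic office chair",
    "mesh office chair", "cheap office chair",
]
y_train = ["shoes"] * 4 + ["chairs"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("sgd", SGDClassifier(random_state=0)),
])
grid = GridSearchCV(pipeline, {"tfidf__use_idf": (True, False)}, cv=2)
grid.fit(X_train, y_train)

new_keywords = ["running shoes sale", "office chair with wheels"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_pipeline.pkl")  # hypothetical file name

    # joblib.dump returns the list of file paths it wrote.
    written = joblib.dump(grid.best_estimator_, path, compress=1)

    # Later, or in another process: load the fitted pipeline back.
    restored = joblib.load(path)

# The restored pipeline should predict exactly like the in-memory one.
same_predictions = (
    list(restored.predict(new_keywords))
    == list(grid.best_estimator_.predict(new_keywords))
)
print(len(written), same_predictions)
```

The restored object is a complete Pipeline, so raw text can be passed straight to predict without vectorizing it first. Pickled estimators are generally only reliable when loaded with the same scikit-learn version that saved them, and joblib.load should only be used on files from a trusted source.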

178

answered Oct 13 '22 02:10

Ibraim Ganiev
