Is there a better inbuilt way to do grid search and test multiple models in a single pipeline? Of course the parameters of the models would be different, which made it complicated for me to figure this out. Here is what I did:
```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed

def grid_search(X_train, y_train):
    # the transformer has to come before the classifier in the pipeline
    pipeline1 = Pipeline([
        ('vec2', TfidfTransformer()),
        ('clf', RandomForestClassifier()),
    ])
    pipeline2 = Pipeline([('clf', KNeighborsClassifier())])
    pipeline3 = Pipeline([('clf', SVC())])
    pipeline4 = Pipeline([('clf', MultinomialNB())])

    parameters1 = {
        'clf__n_estimators': [10, 20, 30],
        'clf__criterion': ['gini', 'entropy'],
        'clf__max_features': ['log2', 'sqrt', None],  # string options belong here...
        'clf__max_depth': [5, 10, 15],                # ...and integer depths here
    }
    parameters2 = {
        'clf__n_neighbors': [3, 7, 10],
        'clf__weights': ['uniform', 'distance'],
    }
    parameters3 = {
        'clf__C': [0.01, 0.1, 1.0],
        'clf__kernel': ['rbf', 'poly'],
        'clf__gamma': [0.01, 0.1, 1.0],
    }
    parameters4 = {
        'clf__alpha': [0.01, 0.1, 1.0],
    }

    pars = [parameters1, parameters2, parameters3, parameters4]
    pips = [pipeline1, pipeline2, pipeline3, pipeline4]

    print("starting Gridsearch")
    for i in range(len(pars)):
        gs = GridSearchCV(pips[i], pars[i], verbose=2, refit=False, n_jobs=-1)
        gs = gs.fit(X_train, y_train)
        print("finished Gridsearch")
        print(gs.best_score_)
```
However, this approach still gives the best model within each classifier; it does not compare across classifiers.
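One built-in way to get a single comparison across classifiers is to make the estimator itself a searchable parameter: `GridSearchCV` accepts a list of parameter grids, and each grid can set the pipeline's `clf` step to a different estimator. A minimal sketch (the step name `clf` and the iris data are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A single pipeline; the 'clf' step is a placeholder that each grid overrides.
pipe = Pipeline([('clf', SVC())])

# A list of grids: each dict swaps in a different estimator plus its own parameters.
param_grid = [
    {'clf': [SVC()], 'clf__C': [0.1, 1.0], 'clf__kernel': ['rbf', 'poly']},
    {'clf': [RandomForestClassifier()], 'clf__n_estimators': [10, 20, 30]},
    {'clf': [GaussianNB()]},
]

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)

# best_params_/best_score_ now reflect the winner across ALL grids,
# not the best model within each classifier separately
print(grid.best_params_)
print(grid.best_score_)
```

Because all candidates run inside one `GridSearchCV`, the final `best_estimator_` is chosen between classifiers, not merely within each.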
Random search is a technique where random combinations of the hyperparameters are tried in order to find the best model. It is similar to grid search, yet it has often been shown to yield comparable or better results at lower cost. Its drawback is higher variance: because the combinations are sampled, repeated runs can land on different settings.
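As a sketch of the idea (the random-forest parameters here are just illustrative choices), `RandomizedSearchCV` samples a fixed number of combinations from the search space instead of trying them all:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions to sample from; randint draws integers uniformly from [low, high)
param_dist = {
    'n_estimators': randint(10, 100),
    'max_depth': [3, 5, None],
}

# n_iter controls the budget: only 5 random combinations are evaluated
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Fixing `random_state` is what tames the variance mentioned above: without it, each run may pick a different winner.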
Cross-validation is a method for robustly estimating test-set performance (generalization) of a model. Grid-search is a way to select the best of a family of models, parametrized by a grid of parameters.
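To make the distinction concrete: `cross_val_score` estimates the generalization performance of one fixed model, while `GridSearchCV` runs that same cross-validation internally for every candidate in the grid and keeps the best. A minimal sketch of the first half:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validation alone: estimate test-set performance of ONE fixed model
scores = cross_val_score(SVC(C=1.0), X, y, cv=5)
print(scores.mean())  # average accuracy over the 5 folds
```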
The main difference between the two approaches is that in grid search we define the combinations and train a model for each, whereas RandomizedSearchCV samples the combinations randomly. Both are effective ways of tuning hyperparameters to improve model generalization.
One of the most important and generally-used methods for performing hyperparameter tuning is called the exhaustive grid search. This is a brute-force approach because it tries all of the combinations of hyperparameters from a grid of parameter values.
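The size of that brute-force search is simply the product of the lengths of the grid's axes; scikit-learn's `ParameterGrid` makes the enumeration explicit:

```python
from sklearn.model_selection import ParameterGrid

grid = {'C': [0.01, 0.1, 1.0], 'kernel': ['rbf', 'poly']}

# Exhaustive grid search evaluates every combination: 3 * 2 = 6 candidates
combos = list(ParameterGrid(grid))
print(len(combos))  # 6
```

This is why exhaustive search gets expensive quickly: adding one more axis multiplies, rather than adds to, the number of models trained.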
Although the solution from dubek is more straightforward, it does not help with interactions between parameters of pipeline elements that come before the classifier. Therefore, I have written a helper class to deal with it, which can be included in the default Pipeline setting of scikit. A minimal example:
```python
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from pipelinehelper import PipelineHelper

iris = datasets.load_iris()
X_iris = iris.data
y_iris = iris.target

pipe = Pipeline([
    ('scaler', PipelineHelper([
        ('std', StandardScaler()),
        ('max', MaxAbsScaler()),
    ])),
    ('classifier', PipelineHelper([
        ('svm', LinearSVC()),
        ('rf', RandomForestClassifier()),
    ])),
])

params = {
    'scaler__selected_model': pipe.named_steps['scaler'].generate({
        'std__with_mean': [True, False],
        'std__with_std': [True, False],
        'max__copy': [True],  # just for displaying
    }),
    'classifier__selected_model': pipe.named_steps['classifier'].generate({
        'svm__C': [0.1, 1.0],
        'rf__n_estimators': [100, 20],
    }),
}

grid = GridSearchCV(pipe, params, scoring='accuracy', verbose=1)
grid.fit(X_iris, y_iris)
print(grid.best_params_)
print(grid.best_score_)
```
It can also be used for other elements of the pipeline, not just the classifier. Code is on github if anyone wants to check it out.
Edit: I have published this on PyPI if anyone is interested; just install it using pip install pipelinehelper.