 

Getting progress updates from GridSearchCV with scikit-learn

I am currently implementing Support Vector Regression in Python, estimating the parameters C and gamma through GridSearchCV. I am initially searching over approximately 400 combinations of C and gamma. This is a very exhaustive search, which has now been running for over an hour on my computer.

What I would like is to receive status updates, e.g. how many combinations are left to test, since at the moment it is hard to tell whether the program is still working or has frozen.

From what I have read in the scikit-learn documentation, I cannot seem to find any help with this. Is there a workaround?

asked Apr 13 '16 by No_Socks

People also ask

How do I see GridSearchCV results?

After fitting the models (note that we call fit on the GridSearchCV object instead of on the estimator itself), we can get the results through the GridSearchCV.cv_results_ attribute (the grid search lived in the old sklearn.grid_search module; it is now in sklearn.model_selection).
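As a minimal sketch of the above, assuming a small SVC grid on the iris data (the estimator, dataset, and parameter values here are illustrative, not from the question):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
search.fit(X, y)

# cv_results_ is a dict of arrays, one entry per parameter combination
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
```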

What do I do after GridSearchCV?

Once GridSearchCV has found values for the hyperparameters, we build a new model on the training set using the tuned parameter values. With the test set, we can then evaluate this new model.
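A sketch of that workflow (the dataset, split, and grid are assumed for illustration; note that with the default refit=True, GridSearchCV already retrains on the full training set with the best parameters):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
search.fit(X_train, y_train)

# refit=True (the default) retrains on the whole training set with the
# best parameters, so best_estimator_ can score the test set directly
print(search.best_params_)
print(search.best_estimator_.score(X_test, y_test))
```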

How much time does Gridsearch CV take?

This may need extra memory; per the documentation, if the dataset is big you may have to use the pre_dispatch parameter. I have 3 parameters with 10 levels to scan, and the time for a run is about 19 seconds; hence 10 × 3 × 19 s = 570 s ≈ 10 minutes.
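That kind of back-of-the-envelope estimate can be computed directly; a quick sketch (the grid values are made up, and the 19-second fit time is assumed from the example above):

```python
from itertools import product

# hypothetical grid; per-fit time is an assumption, not a measurement
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
n_combinations = len(list(product(*param_grid.values())))
cv_folds = 5
seconds_per_fit = 19

# total fits = combinations x CV folds; multiply by the time per fit
total_minutes = n_combinations * cv_folds * seconds_per_fit / 60
print(n_combinations, "combinations, about", round(total_minutes, 1), "minutes")
```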

How does Sklearn GridSearchCV work?

GridSearchCV tries all the combinations of the values passed in the dictionary and evaluates the model for each combination using cross-validation. After running it we therefore get an accuracy/loss for every combination of hyperparameters and can choose the one with the best performance.
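The set of combinations GridSearchCV will try can be enumerated directly with ParameterGrid; a small illustrative grid:

```python
from sklearn.model_selection import ParameterGrid

# a 2 x 2 grid expands to 4 parameter combinations
param_grid = {"C": [1, 10], "gamma": [0.1, 1]}
combinations = list(ParameterGrid(param_grid))
for combo in combinations:
    print(combo)
```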


1 Answer

GridSearchCV has a verbose= keyword. Try setting it to e.g. 100.
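For instance, a sketch of the asker's SVR setup with progress output enabled (the data and parameter values here are made up; only the SVR-with-C/gamma setup is taken from the question):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = rng.rand(60)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
# verbose=2 prints one line per fit as the search progresses;
# with n_jobs=1 the lines appear in order
search = GridSearchCV(SVR(), param_grid, cv=3, verbose=2, n_jobs=1)
search.fit(X, y)
```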

If you are using sklearn.model_selection.cross_val_score (sklearn.cross_validation in older releases) to evaluate your model, you can also set its verbose= to a high level.

If you need more detail, there is also the possibility to "hack" the scoring object you want to use to make it print the score to the screen or to a file every time it is called, for example:

from sklearn.metrics import get_scorer

# the old sklearn.metrics.scorer module was removed in later releases;
# get_scorer retrieves the same built-in accuracy scorer object
accuracy_scorer = get_scorer('accuracy')

def my_accuracy_scorer(*args):
    score = accuracy_scorer(*args)
    print('score is {}'.format(score))
    return score

Use this function as the scoring keyword argument in cross_val_score or GridSearchCV by passing scoring=my_accuracy_scorer.
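Putting it together, an end-to-end sketch (the classifier, dataset, and grid are illustrative; the wrapper is repeated so the snippet is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import get_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

accuracy_scorer = get_scorer('accuracy')

def my_accuracy_scorer(*args):
    # wraps the built-in scorer and reports each score as it is computed
    score = accuracy_scorer(*args)
    print('score is {}'.format(score))
    return score

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [1, 10]},
                      scoring=my_accuracy_scorer, cv=3)
search.fit(X, y)  # prints one score line per fold per combination
```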

answered Sep 23 '22 by eickenberg