I'm optimizing some parameters for an SVC in sklearn, and the biggest issue here is having to wait 30 minutes before I try out any other parameter ranges. Worse, I'd like to try more values for C and gamma within the same range (so I can create a smoother surface plot), but I know that will just take longer and longer... When I ran it today I changed cache_size from 200 to 600 (without really knowing what it does) to see if it made a difference. The time decreased by about a minute.
Is there anything I can do to speed this up? Or am I just going to have to deal with a very long wait?
# Assumed imports, not shown in the original snippet:
from sklearn import svm
from sklearn.model_selection import GridSearchCV

clf = svm.SVC(kernel="rbf", probability=True, cache_size=600)
gamma_range = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1]
c_range = [1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5]
param_grid = dict(gamma=gamma_range, C=c_range)
grid = GridSearchCV(clf, param_grid, cv=10, scoring="accuracy")
%time grid.fit(X_norm, y)
returns:
Wall time: 32min 59s
GridSearchCV(cv=10, error_score='raise',
       estimator=SVC(C=1.0, cache_size=600, class_weight=None, coef0=0.0,
                     degree=3, gamma=0.0, kernel='rbf', max_iter=-1,
                     probability=True, random_state=None, shrinking=True,
                     tol=0.001, verbose=False),
       fit_params={}, iid=True, loss_func=None, n_jobs=1,
       param_grid={'C': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0,
                         10000.0, 100000.0],
                   'gamma': [1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01,
                             0.1, 1.0, 10.0]},
       pre_dispatch='2*n_jobs', refit=True, score_func=None,
       scoring='accuracy', verbose=0)
Only ~7.5k records were used for training with cv=3, and ~3k records for testing. Given those timings, for a parameter grid with 3,125 combinations, GridSearchCV took 10,856 seconds (~3 hours) whereas HalvingGridSearchCV took 465 seconds (~8 minutes), roughly 23x faster.
Grid search is exhaustive: every combination of hyperparameters is trained with cross-validation. If there are 100 candidate combinations and you are doing 5-fold cross-validation, the model will be trained 500 times. For heavy models this takes an excruciatingly long time.
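For reference, here is a minimal sketch of the halving approach applied to the question's estimator. It assumes scikit-learn >= 0.24, where HalvingGridSearchCV ships as an experimental feature that must be enabled explicitly; the thinned-out grid values are illustrative only.

from sklearn import svm
from sklearn.experimental import enable_halving_search_cv  # noqa: F401, enables the experimental API
from sklearn.model_selection import HalvingGridSearchCV

param_grid = dict(gamma=[1e-7, 1e-5, 1e-3, 1e-1, 1e1],
                  C=[1e-3, 1e-1, 1e1, 1e3, 1e5])
halving = HalvingGridSearchCV(
    svm.SVC(kernel="rbf", cache_size=600),
    param_grid,
    factor=3,           # each round keeps roughly the best 1/3 of candidates
    cv=3,
    scoring="accuracy",
    n_jobs=-1,
)
# halving.fit(X_norm, y)  # X_norm, y as in the question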
The only difference between the two approaches is that in grid search we define the combinations ourselves and train the model on every one of them, whereas RandomizedSearchCV samples a fixed number of combinations at random. Both are effective ways of tuning hyperparameters that improve the model's generalizability.
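A minimal sketch of the randomized variant follows; the log-uniform sampling of C and gamma is my assumption, chosen to match the log-spaced grid above, and scipy.stats.loguniform requires scipy >= 1.4.

from scipy.stats import loguniform
from sklearn import svm
from sklearn.model_selection import RandomizedSearchCV

# Sample C and gamma from continuous log-uniform ranges instead of a fixed grid
param_distributions = dict(C=loguniform(1e-3, 1e5),
                           gamma=loguniform(1e-7, 1e1))
search = RandomizedSearchCV(
    svm.SVC(kernel="rbf", cache_size=600),
    param_distributions,
    n_iter=30,          # number of sampled candidates; this caps the total cost
    cv=3,
    scoring="accuracy",
    n_jobs=-1,
    random_state=0,
)
# search.fit(X_norm, y)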
A few things:

- Decrease the number of folds (e.g. cv=3 in the GridSearchCV call) without any meaningful difference in performance estimation; going from cv=10 to cv=3 means 3 fits per candidate instead of 10.
- Set n_jobs to 2+ in your GridSearchCV call so you run multiple models at once. Depending on the size of your data, you may not be able to increase it too high, and you won't see an improvement increasing it past the number of cores you're running, but you can probably trim a bit of time that way.
- You could also set probability=False inside the SVC estimator to avoid applying expensive Platt calibration internally. (If the ability to run predict_proba is crucial, perform GridSearchCV with refit=False, and after picking the best parameter set in terms of the model's quality on the test set, just retrain the best estimator with probability=True on the whole training set; see the sketch after this list.)
- Another step would be to use RandomizedSearchCV instead of GridSearchCV, which would allow you to reach better model quality in roughly the same time (as controlled by the n_iter parameter).
- And, as already mentioned, use n_jobs=-1.
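Putting these tips together, here is a hedged sketch of a faster search (X_norm and y as in the question; the thinner grid is just an illustration, not a recommendation):

from sklearn import svm
from sklearn.model_selection import GridSearchCV

# probability=False skips the expensive Platt calibration during the search
clf = svm.SVC(kernel="rbf", probability=False, cache_size=600)
param_grid = dict(gamma=[1e-7, 1e-5, 1e-3, 1e-1, 1e1],
                  C=[1e-3, 1e-1, 1e1, 1e3, 1e5])
# cv=3 instead of 10, all cores, and no refit during the search
grid = GridSearchCV(clf, param_grid, cv=3, scoring="accuracy",
                    n_jobs=-1, refit=False)
grid.fit(X_norm, y)

# Retrain the winning parameter set once with probability=True so
# predict_proba is available on the final model
best = svm.SVC(kernel="rbf", probability=True, cache_size=600,
               **grid.best_params_)
best.fit(X_norm, y)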