Running the code of linear binary pattern for Adrian. This program runs but gives the following warning:
C:\Python27\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning
I am running python2.7 with opencv3.7, what should I do?
Normally when an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to a poor scaling of the decision variables. There are a few things you can try.
C
, is set appropriately. C
has to be > 0. Typically one would try various values of C
in a logarithmic scale (1e-5, 1e-4, 1e-3, ..., 1, 10, 100, ...) before finetuning it at finer granularity within a particular interval. These days, it probably make more sense to tune parameters using, for e.g., Bayesian Optimization using a package such as Scikit-Optimize.max_iter
to a larger value. The default is 1000. This should be your last resort. If the optimization process does not converge within the first 1000 iterations, having it converge by setting a larger max_iter
typically masks other problems such as those described in 1) and 2). It might even indicate that you have some in appropriate features or strong correlations in the features. Debug those first before taking this easy way out.dual = True
if number of features > number of examples and vice versa. This solves the SVM optimization problem using the dual formulation. Thanks @Nino van Hooff for pointing this out, and @JamesKo for spotting my mistake.Note: One should not ignore this warning.
This warning came about because
Solving the linear SVM is just solving a quadratic optimization problem. The solver is typically an iterative algorithm that keeps a running estimate of the solution (i.e., the weight and bias for the SVM). It stops running when the solution corresponds to an objective value that is optimal for this convex optimization problem, or when it hits the maximum number of iterations set.
If the algorithm does not converge, then the current estimate of the SVM's parameters are not guaranteed to be any good, hence the predictions can also be complete garbage.
Edit
In addition, consider the comment by @Nino van Hooff and @5ervant to use the dual formulation of the SVM. This is especially important if the number of features you have, D, is more than the number of training examples N. This is what the dual formulation of the SVM is particular designed for and helps with the conditioning of the optimization problem. Credit to @5ervant for noticing and pointing this out.
Furthermore, @5ervant also pointed out the possibility of changing the solver, in particular the use of the L-BFGS solver. Credit to him (i.e., upvote his answer, not mine).
I would like to provide a quick rough explanation for those who are interested (I am :)) why this matters in this case. Second-order methods, and in particular approximate second-order method like the L-BFGS solver, will help with ill-conditioned problems because it is approximating the Hessian at each iteration and using it to scale the gradient direction. This allows it to get better convergence rate but possibly at a higher compute cost per iteration. That is, it takes fewer iterations to finish but each iteration will be slower than a typical first-order method like gradient-descent or its variants.
For e.g., a typical first-order method might update the solution at each iteration like
x(k + 1) = x(k) - alpha(k) * gradient(f(x(k)))
where alpha(k), the step size at iteration k, depends on the particular choice of algorithm or learning rate schedule.
A second order method, for e.g., Newton, will have an update equation
x(k + 1) = x(k) - alpha(k) * Hessian(x(k))^(-1) * gradient(f(x(k)))
That is, it uses the information of the local curvature encoded in the Hessian to scale the gradient accordingly. If the problem is ill-conditioned, the gradient will be pointing in less than ideal directions and the inverse Hessian scaling will help correct this.
In particular, L-BFGS mentioned in @5ervant's answer is a way to approximate the inverse of the Hessian as computing it can be an expensive operation.
However, second-order methods might converge much faster (i.e., requires fewer iterations) than first-order methods like the usual gradient-descent based solvers, which as you guys know by now sometimes fail to even converge. This can compensate for the time spent at each iteration.
In summary, if you have a well-conditioned problem, or if you can make it well-conditioned through other means such as using regularization and/or feature scaling and/or making sure you have more examples than features, you probably don't have to use a second-order method. But these days with many models optimizing non-convex problems (e.g., those in DL models), second order methods such as L-BFGS methods plays a different role there and there are evidence to suggest they can sometimes find better solutions compared to first-order methods. But that is another story.
I reached the point that I set, up to max_iter=1200000
on my LinearSVC
classifier, but still the "ConvergenceWarning" was still present. I fix the issue by just setting dual=False
and leaving max_iter
to its default.
With LogisticRegression(solver='lbfgs')
classifier, you should increase max_iter
. Mine have reached max_iter=7600
before the "ConvergenceWarning" disappears when training with large dataset's features.
Explicitly specifying the max_iter
resolves the warning as the default max_iter
is 100. [For Logistic Regression].
logreg = LogisticRegression(max_iter=1000)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With