Has anyone succeeded in speeding up scikit-learn models using numba and jit compilaition. The specific models I am looking at are regression models such as Logistic Regressions.
I am able to use numba to optimize the functions I write using sklearn models, but the model functions themselves are not affected by this and are not optimized, thus not providing a notable increase in speed. Is there are way to optimize the sklearn functions?
Any info about this would be much appreciated.
Numba does not support Scikitlearn, PyData, and some other Python packages. 5. Works best with Numpy arrays than Python lists. But this is not much of a limitation, since most of our Data science computation is done with NumPy arrays.
Large dataFor larger input data, Numba version of function is must faster than Numpy version, even taking into account of the compiling time. In fact, the ratio of the Numpy and Numba run time will depends on both datasize, and the number of loops, or more general the nature of the function (to be compiled).
Yes, the numba function is fastest for small arrays, however the NumPy solution will be slightly faster for longer arrays. The Python solutions are slower but the "faster" alternative is already significantly faster than your original proposed solution.
Scikit-learn makes heavy use of numpy, most of which is written in C and already compiled (hence not eligible for JIT optimization).
Further, the LogisticRegression model is essentially LinearSVC with the appropriate loss function. I could be slightly wrong about that, but in any case, it uses LIBLINEAR to do the solving, which is again a compiled C library.
The makers of scikit-learn also make heavy use of one of the python-to-compiled systems, Pyrex I think, which again results in optimized machine compiled code ineligible for JIT compilation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With