Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will scikit-learn utilize GPU?

Reading implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm struggling to decide which implementation to use.

scikit-learn is installed as part of the tensorflow docker container so can use either implementation.

Reason to use scikit-learn :

scikit-learn contains less boilerplate than the tensorflow implementation.

Reason to use tensorflow :

If running on Nvidia GPU the algorithm will be run against in parallel , I'm not sure if scikit-learn will utilize all available GPUs?

Reading https://www.quora.com/What-are-the-main-differences-between-TensorFlow-and-SciKit-Learn

TensorFlow is more low-level; basically, the Lego bricks that help you to implement machine learning algorithms whereas scikit-learn offers you off-the-shelf algorithms, e.g., algorithms for classification such as SVMs, Random Forests, Logistic Regression, and many, many more. TensorFlow shines if you want to implement deep learning algorithms, since it allows you to take advantage of GPUs for more efficient training.

This statement re-enforces my assertion that "scikit-learn contains less boilerplate than the tensorflow implementation" but also suggests scikit-learn will not utilize all available GPUs?

like image 941
blue-sky Avatar asked Jan 10 '17 11:01

blue-sky


People also ask

Does Python utilize GPU?

NVIDIA's CUDA Python provides a driver and runtime API for existing toolkits and libraries to simplify GPU-based accelerated processing. Python is one of the most popular programming languages for science, engineering, data analytics, and deep learning applications.

Can random forest use GPU?

For a random forest with T trees and W workers, each worker builds T/W trees on its 1/Wᵗʰ fraction of locally available data. As little communication is required, random forests can scale efficiently to many GPUs.

Can NumPy use GPU?

NumPy doesn't natively support GPU s. However, there are tools and libraries to run NumPy on GPU s. Numba is a Python compiler that can compile Python code to run on multicore CPUs and CUDA-enabled GPU s. Numba also understands NumPy and generates optimized compiled code.


2 Answers

Tensorflow only uses GPU if it is built against Cuda and CuDNN. By default it does not use GPU, especially if it is running inside Docker, unless you use nvidia-docker and an image with a built-in support.

Scikit-learn is not intended to be used as a deep-learning framework and it does not provide any GPU support.

Why is there no support for deep or reinforcement learning / Will there be support for deep or reinforcement learning in scikit-learn?

Deep learning and reinforcement learning both require a rich vocabulary to define an architecture, with deep learning additionally requiring GPUs for efficient computing. However, neither of these fit within the design constraints of scikit-learn; as a result, deep learning and reinforcement learning are currently out of scope for what scikit-learn seeks to achieve.

Extracted from http://scikit-learn.org/stable/faq.html#why-is-there-no-support-for-deep-or-reinforcement-learning-will-there-be-support-for-deep-or-reinforcement-learning-in-scikit-learn

Will you add GPU support in scikit-learn?

No, or at least not in the near future. The main reason is that GPU support will introduce many software dependencies and introduce platform specific issues. scikit-learn is designed to be easy to install on a wide variety of platforms. Outside of neural networks, GPUs don’t play a large role in machine learning today, and much larger gains in speed can often be achieved by a careful choice of algorithms.

Extracted from http://scikit-learn.org/stable/faq.html#will-you-add-gpu-support

like image 166
Ivan De Paz Centeno Avatar answered Oct 20 '22 13:10

Ivan De Paz Centeno


I'm experimenting with a drop-in solution (h2o4gpu) to take advantage of GPU acceleration in particular for Kmeans:

try this:

from h2o4gpu.solvers import KMeans #from sklearn.cluster import KMeans 

as of now, version 0.3.2 still don't have .inertia_ but I think it's in their TODO list.

EDIT: Haven't tested yet, but scikit-cuda seems to be getting traction.

EDIT: RAPIDS is really the way to go here.

like image 25
martin Avatar answered Oct 20 '22 13:10

martin