I'm using Keras with TensorFlow as the backend. I have one compiled/trained model.
My prediction loop is slow, so I would like to find a way to parallelize the predict_proba
calls to speed things up.
I would like to take a list of batches (of data) and then, per available GPU, run model.predict_proba()
over a subset of those batches.
Essentially:
data = [ batch_0, batch_1, ... , batch_N ]
on gpu_0 => return predict_proba(batch_0)
on gpu_1 => return predict_proba(batch_1)
...
on gpu_N => return predict_proba(batch_N)
I know that it's possible in pure TensorFlow to assign ops to a given GPU (https://www.tensorflow.org/tutorials/using_gpu). However, I don't know how this translates to my situation, since I've built/compiled/trained my model using Keras' API.
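For reference, here is roughly what I think that device placement would look like combined with a Keras model: loading (or building) the model inside a tf.device scope should pin its ops to the chosen GPU. This is only a sketch; the model path and batch shape are placeholders.

import numpy as np
import tensorflow as tf
from keras.models import load_model

# Placeholder batch; the shape depends on the model's input.
batch_1 = np.random.rand(32, 100).astype('float32')

# Loading the model inside a device scope pins its ops to that GPU.
with tf.device('/gpu:1'):
    model = load_model('my_model.h5')  # placeholder path to a saved model
    probs = model.predict_proba(batch_1)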
I had thought that maybe I just needed to use Python's multiprocessing module and start a process per GPU that would run predict_proba(batch_n). I know this is theoretically possible given another SO post of mine: Keras + Tensorflow and Multiprocessing in Python. However, this still leaves me with the dilemma of not knowing how to actually "choose" a GPU to operate the process on.
My question boils down to: how does one parallelize prediction for one model in Keras across multiple GPUs when using TensorFlow as Keras' backend?
Additionally, I am curious whether similar parallelization for prediction is possible with only one GPU.
A high level description or code example would be greatly appreciated!
Thanks!
tf.distribute.Strategy is a TensorFlow API to distribute computation across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training (or prediction) code with minimal code changes.
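A minimal sketch of distributed prediction with tf.distribute.MirroredStrategy (TF 2.x); the model path and input shape below are placeholders, and note that modern tf.keras uses predict() rather than predict_proba():

import numpy as np
import tensorflow as tf

# By default, MirroredStrategy uses all GPUs visible to the process.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

# Loading (or building) the model inside the scope places a replica
# of its variables on each GPU. 'my_model.h5' is a placeholder path.
with strategy.scope():
    model = tf.keras.models.load_model('my_model.h5')

data = np.random.rand(1024, 100).astype('float32')  # placeholder input
# predict() splits each batch across the replicas.
probs = model.predict(data, batch_size=256)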
Keras is a deep learning API built on top of the TensorFlow platform. It was designed for fast experimentation and easy model building, including on multiple graphics processing units (GPUs).
Keras models will transparently run on a single GPU with no code changes required.
Once multiple GPUs are added to your system, you need to build parallelism into your deep learning workflow. There are two main methods to add parallelism: model parallelism and data parallelism. Model parallelism splits a single model across devices and is useful when its parameters are too large for one GPU's memory; data parallelism replicates the model on every device and splits the input batches between the replicas, which is the approach that fits multi-GPU prediction (see the sketch below).
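For data parallelism, older Keras 2.x also shipped a built-in helper, keras.utils.multi_gpu_model, which replicates a model across GPUs and splits each incoming batch between the replicas (it has since been removed in favor of tf.distribute). A minimal sketch, assuming two visible GPUs and a placeholder model path:

import numpy as np
from keras.models import load_model
from keras.utils import multi_gpu_model

# 'my_model.h5' is a placeholder for your saved, compiled model.
model = load_model('my_model.h5')

# Replicate the model on 2 GPUs; each batch is split between them.
parallel_model = multi_gpu_model(model, gpus=2)

data = np.random.rand(1024, 100).astype('float32')  # placeholder input
probs = parallel_model.predict(data, batch_size=256)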
I created a simple example to show how to run a Keras model across multiple GPUs. Basically, multiple processes are created, and each process owns one GPU. To pin a process to a specific GPU, setting the environment variable CUDA_VISIBLE_DEVICES (via os.environ["CUDA_VISIBLE_DEVICES"]) is a very straightforward way. I hope this git repo can help you.
https://github.com/yuanyuanli85/Keras-Multiple-Process-Prediction
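For completeness, here is a minimal sketch of that pattern (not taken verbatim from the repo): each worker process sets CUDA_VISIBLE_DEVICES before TensorFlow initializes, loads the model from a placeholder path, and runs predict_proba on its share of the batches.

import os
from multiprocessing import Process, Queue

import numpy as np

def gpu_worker(gpu_id, shard, out_queue):
    # Pin this process to one GPU *before* TensorFlow is initialized.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    from keras.models import load_model  # import after setting the env var
    model = load_model('my_model.h5')    # placeholder path to your saved model
    for idx, batch in shard:
        out_queue.put((idx, model.predict_proba(batch)))

if __name__ == '__main__':
    n_gpus = 2  # assumption: two GPUs available
    # Placeholder batches; shapes depend on your model's input.
    data = [np.random.rand(32, 100).astype('float32') for _ in range(8)]

    out_queue = Queue()
    workers = []
    for gpu_id in range(n_gpus):
        # Round-robin the batches across the GPUs.
        shard = [(i, b) for i, b in enumerate(data) if i % n_gpus == gpu_id]
        p = Process(target=gpu_worker, args=(gpu_id, shard, out_queue))
        p.start()
        workers.append(p)

    # Drain the queue before joining to avoid blocking on full pipes,
    # then restore the original batch order by index.
    results = sorted(out_queue.get() for _ in range(len(data)))
    probs = [r for _, r in results]

    for p in workers:
        p.join()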