Keras predict is very slow

I am working on a reinforcement learning task and decided to use a Keras neural network model for Q-value approximation. The approach is common: after each action, the reward is stored in a replay memory. Then I take a random sample from it and fit the model on the new data, state-action => reward + predicted_Q. To build the training targets, the Q value has to be predicted for each item in the training set.
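A minimal sketch of the target computation described above, assuming (hypothetically) that the model's input is a state concatenated with a one-hot action and that it returns a single Q value. The names `q_targets`, `predict_fn`, `dones` and `gamma` are illustrative, not taken from the question. The point is that all next-state/action pairs in the replay sample go through one predict call instead of one call per item:

```python
import numpy as np


def q_targets(predict_fn, rewards, next_states, dones, n_actions, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a') for a whole replay batch.

    Assumed model input: state concatenated with a one-hot action.
    predict_fn: any callable mapping an (N, d) array to an (N, 1) array.
    """
    batch = next_states.shape[0]
    # Pair every next state with every action: rows s0a0, s0a1, ..., s1a0, ...
    states_rep = np.repeat(next_states, n_actions, axis=0)
    actions_rep = np.tile(np.eye(n_actions), (batch, 1))
    inputs = np.hstack([states_rep, actions_rep])

    # One predict call for the whole sample instead of one call per transition.
    q_next = np.asarray(predict_fn(inputs)).reshape(batch, n_actions)

    # Terminal transitions (dones == 1) get no bootstrapped future value.
    return rewards + gamma * q_next.max(axis=1) * (1.0 - dones)
```

With Keras, `model.predict` could be passed as `predict_fn`; it also takes a `batch_size` argument, which may be worth raising when the stacked input is large.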

The script runs very slowly, so I started investigating. Profiling shows that 56.87% of the cumulative time is taken by the _predict_loop method. That looks strange, because prediction is just a single forward pass, essentially a one-time multiplication of a set of numbers. The model I am using is very simple: 8 inputs, 5 nodes in the hidden layer, and 1 output.
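To make "just a single forward pass" concrete, here is a sketch of what predicting with an 8-5-1 dense network amounts to in NumPy. The weights are random placeholders and the ReLU hidden activation with a linear output are assumptions, since the question does not state them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for an 8-5-1 dense network (not the asker's model).
W1, b1 = rng.standard_normal((8, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 1)), np.zeros(1)


def forward(x):
    """Forward pass for a batch x of shape (N, 8); returns shape (N, 1)."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # assumed ReLU hidden layer
    return hidden @ W2 + b2                 # assumed linear output for a Q value
```

For the whole batch this is two small matrix products, so the arithmetic per sample is tiny compared with the fixed cost of each predict call.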

I have installed and configured CUDA and run a few example tests; they show that the GPU is used, and I can see a high GPU load. When I run my code, there is a message "Using gpu device 0: GeForce GT 730", but the GPU load is really low (about 10%).

Is it normal for predict function to take so much time? Is there a way to use GPU for this computation?

asked Jun 25 '16 08:06 by Serhiy




1 Answer

It seems your network is much too small to fully utilize the GPU. Typically, a GPU is faster than a multi-core CPU only when the input/hidden/output layer sizes are larger than roughly 200 to 500 (depending on the implementation).
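One rough way to see why layer size matters is to count multiply-accumulate operations (MACs) per sample for a stack of dense layers. The 500/500/500 configuration below is only an illustrative point at the upper end of the range above, and MAC count is a simplification that ignores activations and memory traffic:

```python
def dense_macs(layer_sizes):
    """Multiply-accumulate operations for one sample through dense layers."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


small = dense_macs([8, 5, 1])          # 8*5 + 5*1
large = dense_macs([500, 500, 500])    # 500*500 + 500*500
```

The 8/5/1 network does 45 MACs per sample versus 500,000 for the larger one, so almost none of the GPU's parallel arithmetic is used while launch and transfer costs stay the same.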

However, your network is only 8/5/1, so most of the time is spent on GPU overhead such as CUDA kernel launches and PCIe data transfers. In this case, the number of calls is the main factor that determines the training time. To speed things up, you will probably need to train your model on the CPU, and with a programming language such as C/C++ that has much lower overhead.
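The claim that the number of calls dominates can be illustrated with a toy cost model: a fixed overhead per predict call plus a small cost per sample. The timing constants here are hypothetical placeholders, not measurements from a GT 730 or any other device:

```python
def estimated_time(n_samples, batch_size, per_call_overhead, per_sample_cost):
    """Toy cost model: fixed overhead per predict call plus work per sample."""
    n_calls = -(-n_samples // batch_size)  # ceiling division
    return n_calls * per_call_overhead + n_samples * per_sample_cost


# Hypothetical constants: 1 ms per call, 0.1 microseconds per sample.
one_at_a_time = estimated_time(1000, 1, 1e-3, 1e-7)
one_big_batch = estimated_time(1000, 1000, 1e-3, 1e-7)
```

Under these assumed numbers, predicting 1000 samples one by one costs about 1 s, almost all of it overhead, while a single batched call costs about 1.1 ms. Reducing the number of calls, whether by batching or by moving to a lower-overhead runtime, is what changes the total.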

answered Sep 24 '22 06:09 by kangshiyin