In the code example below, I can train the model only when NOT using multiprocessing.
My generator is straight from the tensorflow.keras.utils.Sequence description https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
Any idea how to fix the generator to allow multiprocessing?
Running on Win 10, tensorflow 1.13.1, python 3.6.8
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.utils import Sequence
# Generator
class DataGenerator(Sequence):
def __init__(self, dim, batch_size, n_channels):
self.dim = dim
self.batch_size = batch_size
self.n_channels = n_channels
def __len__(self):
return 100
def __getitem__(self, idx):
X = np.random.randn(self.batch_size, self.dim, self.n_channels)
Y = np.random.randn(self.batch_size, self.dim, 1)
return X, Y
dim= 32
batch_size= 64
n_channels= 3
# Generators
training_generator = DataGenerator(dim, batch_size, n_channels)
validation_generator = DataGenerator(dim, batch_size, n_channels)
# Model
model = Sequential()
model.add(layers.GRU(128, return_sequences=True,
batch_input_shape=[None, training_generator.dim, training_generator.n_channels]))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')
# This training procedure runs
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1)
# This training procedure fails (Only change is that I added the multiprocessing options)
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1,
use_multiprocessing=True,
workers=4)
I expected the second fit_generator() call to train the model like the first one. Instead, I get no output, not even an error message.
I tried your code on Ubuntu 18.04.2 LTS machine with python 3.6.8 and tensorflow 1.13.1. It works in both cases as log shown below:
2019-07-13 12:56:17.003119: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
100/100 [==============================] - 3s 27ms/step - loss: 0.9987
100/100 [==============================] - 10s 103ms/step - loss: 0.9973 - val_loss: 0.9987
Epoch 2/2
100/100 [==============================] - 3s 26ms/step - loss: 0.9955
100/100 [==============================] - 8s 83ms/step - loss: 1.0028 - val_loss: 0.9955
Multiprocessing=True ......
Epoch 1/2
100/100 [==============================] - 3s 32ms/step - loss: 0.9952
100/100 [==============================] - 9s 89ms/step - loss: 0.9962 - val_loss: 0.9952
Epoch 2/2
100/100 [==============================] - 3s 28ms/step - loss: 0.9967
100/100 [==============================] - 9s 86ms/step - loss: 0.9968 - val_loss: 0.9967"
My suggestion is to first try with CPU only mode, by putting BOTH the model and the fit_generator code under "with tf.device('/cpu:0'):". If it works, it would be GPU related issue, such as proper driver, tensorflow with GPU support etc. Most likely, the issue was caused by GPU hanging.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With