Based on tensorflow-char-rnn, I started a word-rnn project to predict the next word. But I found that training is far too slow on my data set. Here are my training details:
The machine details:
In my test, training one epoch takes 17 days! That is really too slow, so I changed seq2seq.rnn_decoder to tf.nn.dynamic_rnn, but the time is still 17 days.
I want to find out whether the slowness is caused by my code or whether it has always been this slow, because I have heard rumors that the TensorFlow RNN implementation is slower than other DL frameworks.
This is my model code:
import tensorflow as tf
# TF 1.x locations; under TF 0.x these live in tf.nn.rnn_cell and tf.nn.seq2seq
from tensorflow.contrib import rnn as rnn_cell
from tensorflow.contrib import legacy_seq2seq as seq2seq


class SeqModel():
    def __init__(self, config, infer=False):
        self.args = config
        if infer:
            config.batch_size = 1
            config.seq_length = 1

        # pick the recurrent cell type
        if config.model == 'rnn':
            cell_fn = rnn_cell.BasicRNNCell
        elif config.model == 'gru':
            cell_fn = rnn_cell.GRUCell
        elif config.model == 'lstm':
            cell_fn = rnn_cell.BasicLSTMCell
        else:
            raise Exception("model type not supported: {}".format(config.model))

        # build one cell per layer (reusing a single cell object can raise
        # variable-scope errors on newer TF versions)
        cells = [cell_fn(config.hidden_size) for _ in range(config.num_layers)]
        self.cell = cell = rnn_cell.MultiRNNCell(cells)

        self.input_data = tf.placeholder(tf.int32, [config.batch_size, config.seq_length])
        self.targets = tf.placeholder(tf.int32, [config.batch_size, config.seq_length])
        self.initial_state = cell.zero_state(config.batch_size, tf.float32)

        with tf.variable_scope('rnnlm'):
            softmax_w = tf.get_variable("softmax_w", [config.hidden_size, config.vocab_size])
            softmax_b = tf.get_variable("softmax_b", [config.vocab_size])
            embedding = tf.get_variable("embedding", [config.vocab_size, config.hidden_size])
            inputs = tf.nn.embedding_lookup(embedding, self.input_data)

        # outputs: [batch_size, seq_length, hidden_size]
        outputs, last_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=self.initial_state)

        # flatten to [batch_size * seq_length, hidden_size] for the softmax projection
        # (dynamic_rnn already returns a single tensor, so the tf.concat needed
        # with rnn_decoder's list of outputs is not required here)
        output = tf.reshape(outputs, [-1, config.hidden_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
        self.final_state = last_state

        loss = seq2seq.sequence_loss_by_example([self.logits],
                                                [tf.reshape(self.targets, [-1])],
                                                [tf.ones([config.batch_size * config.seq_length])],
                                                config.vocab_size)
        self.cost = tf.reduce_sum(loss) / config.batch_size / config.seq_length

        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                                          config.grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))
Here is the GPU load during training:
Thanks very much.
As you mentioned, batch_size is really important to tune; it can lead to an impressive speedup, but check that your perplexity stays reasonable.
Monitoring your GPU activity can give you hints about a potential I/O bottleneck.
Most importantly, using sampled softmax instead of the regular softmax is way faster. This would require you to use a [config.vocab_size, config.hidden_size]
weight matrix instead of your [config.hidden_size, config.vocab_size]
one. From my point of view, this is definitely the way to go.
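As a rough, untested sketch of what that change could look like (reusing output, self.targets and config from the model above; num_sampled=1024 is just an arbitrary example value), something along these lines:

# Hypothetical sketch: sampled softmax for the training loss, assuming the
# variables defined in SeqModel above. Note the transposed projection matrix:
# [vocab_size, hidden_size] instead of [hidden_size, vocab_size].
softmax_w_t = tf.get_variable("softmax_w_t", [config.vocab_size, config.hidden_size])
softmax_b = tf.get_variable("softmax_b", [config.vocab_size])

labels = tf.reshape(self.targets, [-1, 1])        # sampled_softmax_loss expects [N, 1] labels
train_loss = tf.nn.sampled_softmax_loss(
    weights=softmax_w_t,
    biases=softmax_b,
    labels=labels,
    inputs=output,                                # [batch_size * seq_length, hidden_size]
    num_sampled=1024,                             # number of sampled classes, a tunable guess
    num_classes=config.vocab_size)
self.cost = tf.reduce_sum(train_loss) / config.batch_size / config.seq_length

# At inference/evaluation time, fall back to the full softmax with the
# transposed matrix:
# self.logits = tf.matmul(output, tf.transpose(softmax_w_t)) + softmax_b

The sampled loss is only used during training; perplexity evaluation and sampling should still go through the full softmax.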
Hope this helps.
pltrdy
One other possible way to speed up training, and a possible reason for your low GPU utilisation, is that you are feeding data through placeholders. You should be using queues if you are on TensorFlow < 1.2, and the tf.contrib.data module otherwise.
https://www.tensorflow.org/programmers_guide/threading_and_queues
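For example, a rough sketch of an input pipeline with the Dataset API (tf.data on TF >= 1.4, tf.contrib.data on 1.2/1.3). The helper name make_dataset and the buffer/batch sizes are just placeholders, not part of your code:

import numpy as np
import tensorflow as tf

def make_dataset(token_ids, batch_size, seq_length):
    # token_ids: 1-D numpy array of word ids for the whole corpus
    dataset = tf.data.Dataset.from_tensor_slices(token_ids)
    dataset = dataset.batch(seq_length + 1)                       # one extra token for the target
    dataset = dataset.map(lambda chunk: (chunk[:-1], chunk[1:]))  # (input, target) pairs
    dataset = dataset.shuffle(10000).batch(batch_size)
    dataset = dataset.prefetch(1)                                 # overlap data loading with the GPU
    return dataset

# iterator = make_dataset(ids, config.batch_size, config.seq_length).make_one_shot_iterator()
# inputs, targets = iterator.get_next()
# Feed these tensors into the model in place of the placeholders
# (the last, smaller batch may need to be dropped or handled separately).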