Assume we generate our own training data (for example, by sampling from some diffusion process and computing quantities of interest on it) and that we have our own CUDA routine, generate_data, which generates labels in GPU memory for a given set of inputs.
Hence we are in a special setting where we can generate as many batches of training data as we want, in an "online" fashion: at each batch iteration we call generate_data to produce a new batch and discard the old one.
Since the data is generated on the GPU, is there a way to make TensorFlow (the Python API) use it directly during training (for example, to fill a placeholder)? That way, such a pipeline would be efficient.
My understanding is that, in such a setup, you currently need to copy the data from GPU to CPU and then let TensorFlow copy it back from CPU to GPU, which is rather wasteful since unnecessary copies are performed.
EDIT: if it helps, we can assume that the CUDA routine is implemented using Numba's CUDA JIT compiler.
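For concreteness, a minimal sketch of what such a routine might look like with Numba's CUDA JIT is shown below; the kernel, the quadratic "label" it computes, and the launch configuration are hypothetical placeholders for the real computation:
import numba
from numba import cuda

@cuda.jit
def _label_kernel(x, y):
    # Hypothetical label: y = x^2; the real quantity of interest would go here
    i = cuda.grid(1)
    if i < x.size:
        y[i] = x[i] * x[i]

def generate_data(x_device):
    # Inputs and labels both stay in GPU memory as Numba device arrays
    y_device = cuda.device_array_like(x_device)
    threads_per_block = 256
    blocks = (x_device.size + threads_per_block - 1) // threads_per_block
    _label_kernel[blocks, threads_per_block](x_device, y_device)
    return y_device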
This is definitely not a complete answer, but hopefully it can help.
You can integrate your CUDA routine into TensorFlow by writing a custom op; there is currently no other way for TensorFlow to interact with external CUDA routines.
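Assuming the CUDA routine has been wrapped in a custom op and compiled into a shared library (following TensorFlow's "Adding a New Op" guide), it could be loaded and called from Python roughly as follows; the library path, op name, and arguments are hypothetical:
import tensorflow as tf

# Hypothetical shared library containing the compiled GenerateData op
data_module = tf.load_op_library('./generate_data_op.so')

# The op's kernel runs on the GPU, so its outputs never leave device memory
x, ground_truth = data_module.generate_data(seed=42, batch_size=128)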
As for running the training loop entirely on the GPU, we can express the routine with tf.while_loop, in a very similar way to this SO question:
n = 10000  # total number of training iterations (example value)
i = tf.Variable(0, name='loop_i')

def cond(i):
    return i < n

def body(i):
    # Build the graph for the custom data-generating routine and our model
    x, ground_truth = CustomCUDARoutine(random_seed, ...)
    predictions = MyModel(x, ...)

    # Define the loss and the optimizer
    loss = loss_func(ground_truth, predictions)
    optim = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Loop body: advance the counter once the optimizer step has run
    return tf.tuple([tf.add(i, 1)], control_inputs=[optim])

loop = tf.while_loop(cond, body, [i])

# Run the loop (assumes a default session is already open)
sess = tf.get_default_session()
sess.run(tf.global_variables_initializer())
sess.run(loop)
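In this setup, CustomCUDARoutine would be the Python handle to your compiled custom op (e.g. the hypothetical generate_data op loaded above), so both the generated batches and the training step stay on the GPU with no host round-trip.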