I would like to perform the following simple experiment.
I am using Tensorflow. I have a large array (5000x5000 float32 elements). How do I measure how long it actually takes to move this array from RAM to GPU memory?
I understand that I could create some very simple computational graph, run it, and measure how long it took. There are two problems with this, though. First, I am worried that the measured time will be dominated by the computation rather than by moving the data from RAM to GPU. Second, if the computation doesn't involve the big array I mentioned, TensorFlow will prune the computational graph so that the big array isn't in it, and it won't get moved from RAM to the GPU at all.
If a TensorFlow operation has both CPU and GPU implementations, the GPU device is given priority when the operation is placed. If you have more than one GPU, the GPU with the lowest ID is selected by default. Note that TensorFlow does not automatically spread operations across multiple GPUs.
The solution is to construct a simple benchmark in which the memory transfer dominates. To make sure TensorFlow doesn't optimize the transfer away, you can add a tiny operation on the result. The overhead of a tiny op like fill is a couple of microseconds, which is insignificant compared to loading 100MB onto the GPU, which takes more than 5 milliseconds.
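As a back-of-the-envelope sanity check on that figure (my arithmetic sketch, assuming an effective PCIe 3.0 x16 bandwidth of roughly 12 GB/s, which is an assumption, not a measured value):

```python
# 5000 x 5000 float32 elements, 4 bytes each.
size_bytes = 5000 * 5000 * 4            # 100,000,000 bytes, i.e. ~100MB
pcie_bytes_per_sec = 12e9               # assumed effective PCIe 3.0 x16 bandwidth
transfer_ms = size_bytes / pcie_bytes_per_sec * 1e3
print('%d bytes, ~%.1f ms over PCIe' % (size_bytes, transfer_ms))
```

That lands in the same ballpark as the measurements below, which is a useful cross-check that the benchmark is actually bandwidth-bound.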
```python
def feed_gpu_tensor():
    # create_array, args, sess and timeit are defined in
    # tf_numpy_benchmark.py (linked below).
    params0 = create_array()
    with tf.device('/gpu:0'):
        params = tf.placeholder(tf.float32)
        # tiny concat keeps the transfer from being pruned away
        result = tf.concat([params, tf.fill([1], 1.0)], axis=0)
    for i in range(args.num_iters):
        with timeit('feed_gpu_tensor'):
            sess.run(result.op, feed_dict={params: params0})
```
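The `timeit` helper above comes from the benchmark script; a minimal sketch of such a context manager (my assumption of its behavior, not the script's exact implementation) looks like this:

```python
import time
from contextlib import contextmanager

@contextmanager
def timeit(tag):
    # Wall-clock timer that prints the elapsed time of the block it wraps.
    start = time.perf_counter()
    yield
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print('%s: %.3f ms' % (tag, elapsed_ms))
```

Usage: `with timeit('feed_gpu_tensor'): sess.run(...)` prints one timing line per iteration, so you can eyeball warm-up effects and steady-state numbers separately.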
To run this benchmark:
```shell
wget https://github.com/diux-dev/cluster/blob/master/yuxin_numpy/tf_numpy_benchmark.py
python tf_numpy_benchmark.py --benchmark=feed_gpu_tensor
```
I found that on a p3.16xlarge, with tcmalloc (via LD_PRELOAD), this 100MB copy takes about 8 milliseconds.
Also, as a sanity check you can look at timelines. A timeline contains a MEMCPYH2D op, which is the actual CPU->GPU copy; you can use it to confirm that the copy dominates your microbenchmark's step run-time.
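Timelines are dumped in Chrome trace (JSON) format. Here is a hedged sketch of that check against a hypothetical, heavily trimmed trace (real traces contain many more events and fields; the op names and durations below are illustrative, not captured output):

```python
import json

# Hypothetical, trimmed-down Chrome trace; "dur" is in microseconds.
trace_json = json.dumps({"traceEvents": [
    {"name": "MEMCPYH2D", "ph": "X", "dur": 8200},
    {"name": "Fill",      "ph": "X", "dur": 3},
    {"name": "ConcatV2",  "ph": "X", "dur": 12},
]})

durs = {e["name"]: e["dur"] for e in json.loads(trace_json)["traceEvents"]}
h2d_fraction = durs["MEMCPYH2D"] / sum(durs.values())
print("H2D copy is %.1f%% of step time" % (100 * h2d_fraction))
```

If the H2D fraction is close to 100%, the step time is a good proxy for the transfer time; if not, the benchmark is measuring something else too.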
Related issues:
benchmarking D2H and H2D: https://github.com/tensorflow/tensorflow/issues/17204
64-byte aligning input data: https://github.com/tensorflow/tensorflow/issues/17233