I want to run vectorization on images using multiple GPUs (for now my script uses only one GPU). I have a list of images, a graph, and a session. The script's output is a saved vector. My machine has 3 NVIDIA GPUs. Environment: Ubuntu, Python 3.7, TensorFlow 2.0 (with GPU support). Here is my code example (session initialization):
import os
import numpy as np
import tensorflow as tf
from PIL import Image

def load_graph(frozen_graph_filename):
    # We load the protobuf file from disk and parse it to retrieve the
    # unserialized graph_def
    with tf.io.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())
    # Then we import the graph_def into a new Graph and return it
    with tf.Graph().as_default() as graph:
        # The name var will prefix every op/node in your graph.
        # Since we load everything into a new graph, this is not needed
        tf.import_graph_def(graph_def, name="")
    return graph
GRAPH = load_graph(os.path.join(settings.IMAGENET_PATH['PATH'], 'classify_image_graph_def.pb'))
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.allow_growth = True
SESSION = tf.compat.v1.Session(graph=GRAPH, config=config)
After that, I run the vectorization as:
sess = SESSION
for image_index, image in enumerate(image_list):
    with Image.open(image) as f:
        image_data = f.convert('RGB')
        feature_tensor = POOL_TENSOR
        feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
        feature_vector = np.squeeze(feature_set)
        outfile_name = os.path.basename(image) + ".vc"
        this_is_path = settings.VECTORS_DIR_PATH['PATH']
        out_path = os.path.join(this_is_path, outfile_name)
        np.savetxt(out_path, feature_vector, delimiter=',')
This working example produces 100 vectors in 29 seconds on the first GPU. So, I tried the distributed training method from the TensorFlow docs to run on multiple GPUs:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    sess = SESSION
    # and here all the code from the previous example after the session:
    for image_index, image in enumerate(image_list):
        with Image.open(image) as f:
            image_data = f.convert('RGB')
            feature_tensor = POOL_TENSOR
            feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
            feature_vector = np.squeeze(feature_set)
            outfile_name = os.path.basename(image) + ".vc"
            this_is_path = settings.VECTORS_DIR_PATH['PATH']
            out_path = os.path.join(this_is_path, outfile_name)
            np.savetxt(out_path, feature_vector, delimiter=',')
After checking the logs, I can conclude that TensorFlow has access to all three GPUs. However, this changes nothing: when running, TensorFlow still uses only the first GPU (100 vectors in 29 seconds). Another method I tried was to manually assign each item to a concrete GPU instance:
sess = SESSION
for image_index, image in enumerate(image_list):
    if image_index % 2 == 0:
        device_name = '/gpu:1'
    elif image_index % 3 == 0:
        device_name = '/gpu:2'
    else:
        device_name = '/gpu:0'
    with tf.device(device_name):
        with Image.open(image) as f:
            image_data = f.convert('RGB')
            feature_tensor = POOL_TENSOR
            feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data})
            feature_vector = np.squeeze(feature_set)
            outfile_name = os.path.basename(image) + ".vc"
            this_is_path = settings.VECTORS_DIR_PATH['PATH']
            out_path = os.path.join(this_is_path, outfile_name)
            np.savetxt(out_path, feature_vector, delimiter=',')
Monitoring this method, I observe that every GPU gets used, but there is no performance speedup because TensorFlow just swaps from one GPU device to another: on the first item GPU:0 is used while GPU:1 and GPU:2 are just waiting, on the second item GPU:1 is working while GPU:0 and GPU:2 are waiting.
I also tried another TensorFlow strategy from the tf docs, with no change in behaviour. I also tried defining tf.Session() inside the for loop, without success. And I found this, but cannot make it work for my code.
My questions are:
1) Is there a way to modify tf.distribute.MirroredStrategy() to make TensorFlow use all three GPUs?
2) If the answer to (1) is no, how can I run the vectorization using the power of all GPUs (maybe there is an async way of doing this, or something similar)?
The reason why your mirrored_strategy (from the third code snippet) is not using all GPUs is that your model input is fed manually (through the TF1-style feature_tensor tensor), so TensorFlow does not know how to distribute the data evenly across your GPUs automatically; you may take a look at the docs here.
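Until then, you have to shard the work yourself. Here is a minimal sketch of one common workaround, not code from the question: start one worker process per GPU and pin each process to a single device via CUDA_VISIBLE_DEVICES, so each worker runs the unmodified single-GPU loop on its own third of the image list. It reuses load_graph, settings, image_list, and the top-level imports from above, and assumes the pooling tensor can be fetched by name ('pool_3:0' is the usual pooling output of the Inception classify_image_graph_def.pb graph):
from multiprocessing import Process

NUM_GPUS = 3

def vectorize_shard(gpu_id, images):
    # Pin this process to a single GPU. This must happen before TensorFlow
    # initializes CUDA in this process, i.e. the parent must not have
    # created a Session before forking.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    graph = load_graph(os.path.join(settings.IMAGENET_PATH['PATH'],
                                    'classify_image_graph_def.pb'))
    # Assumption: POOL_TENSOR was obtained by name; 'pool_3:0' is the
    # pooling output of classify_image_graph_def.pb.
    feature_tensor = graph.get_tensor_by_name('pool_3:0')
    with tf.compat.v1.Session(graph=graph) as sess:
        for image in images:
            with Image.open(image) as f:
                image_data = f.convert('RGB')
                feature_vector = np.squeeze(
                    sess.run(feature_tensor, {'DecodeJpeg:0': image_data}))
                out_path = os.path.join(settings.VECTORS_DIR_PATH['PATH'],
                                        os.path.basename(image) + '.vc')
                np.savetxt(out_path, feature_vector, delimiter=',')

if __name__ == '__main__':
    # Round-robin shards: worker i processes images i, i + 3, i + 6, ...
    workers = [Process(target=vectorize_shard, args=(i, image_list[i::NUM_GPUS]))
               for i in range(NUM_GPUS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
Because each worker is a separate process with its own CUDA context, the three sessions run truly in parallel.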
The fourth snippet (the last one) also fails because it is used incorrectly: with tf.device() only affects ops while the graph is being constructed, so wrapping sess.run() calls on an already-built graph has no useful effect. You can try to first construct your model graph (with its device placement) and then run that graph in a session, rather than putting the two together; for example, move the feature_set = sess.run(feature_tensor, {'DecodeJpeg:0': image_data}) call outside the per-image for loop and feed the images in batches. The guide here may illustrate this a bit better.
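To make the "construct first, then run" advice concrete for a frozen TF1 graph, you can bake the device placement into the graph itself: import the same graph_def once per GPU under a tf.device scope with a distinct name prefix, then feed each replica from its own thread through a single session (Session.run is safe to call from multiple threads). This is a sketch of one possible pattern, not the only fix, under the same assumptions as above ('pool_3:0' / 'DecodeJpeg:0' tensor names, globals from the question):
import threading

NUM_GPUS = 3
PB_PATH = os.path.join(settings.IMAGENET_PATH['PATH'],
                       'classify_image_graph_def.pb')

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PB_PATH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    # One replica per GPU; the name prefix keeps the three copies apart.
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):
            tf.import_graph_def(graph_def, name='gpu%d' % i)

config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
sess = tf.compat.v1.Session(graph=graph, config=config)

def run_replica(i, images):
    # Fetch this replica's copies of the input and output tensors.
    feature_tensor = graph.get_tensor_by_name('gpu%d/pool_3:0' % i)
    input_name = 'gpu%d/DecodeJpeg:0' % i
    for image in images:
        with Image.open(image) as f:
            image_data = f.convert('RGB')
            feature_vector = np.squeeze(
                sess.run(feature_tensor, {input_name: image_data}))
            out_path = os.path.join(settings.VECTORS_DIR_PATH['PATH'],
                                    os.path.basename(image) + '.vc')
            np.savetxt(out_path, feature_vector, delimiter=',')

threads = [threading.Thread(target=run_replica, args=(i, image_list[i::NUM_GPUS]))
           for i in range(NUM_GPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
allow_soft_placement is needed because ops like DecodeJpeg have no GPU kernel and must fall back to the CPU.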