
TPU slower than GPU?

I just tried using a TPU in Google Colab and wanted to see how much faster the TPU is than the GPU. Surprisingly, I got the opposite result.

The following is the network:

  random_image = tf.random_normal((100, 100, 100, 3))
  result = tf.layers.conv2d(random_image, 32, 7)
  result = tf.reduce_sum(result)

Performance results:

CPU: 8s
GPU: 0.18s
TPU: 0.50s
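
For reference, a minimal sketch of the kind of harness that would produce the CPU/GPU numbers above (the question does not show the one actually used; TF 1.x API as elsewhere in the post):

import time
import tensorflow as tf

# Same graph as above; runs on the GPU if one is visible, else the CPU.
random_image = tf.random_normal((100, 100, 100, 3))
result = tf.layers.conv2d(random_image, 32, 7)
result = tf.reduce_sum(result)

with tf.Session() as session:
  session.run(tf.global_variables_initializer())
  session.run(result)                    # warm-up run
  start = time.time()
  session.run(result)                    # timed run
  print('elapsed:', time.time() - start)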

I wonder why. The complete code for the TPU is as follows:

import os
import time
import tensorflow as tf

def calc():
  random_image = tf.random_normal((100, 100, 100, 3))
  result = tf.layers.conv2d(random_image, 32, 7)
  result = tf.reduce_sum(result)
  return result

# Replicate the computation across the 8 TPU cores.
tpu_ops = tf.contrib.tpu.batch_parallel(calc, [], num_shards=8)

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
session = tf.Session(tpu_address)
try:
  print('Initializing global variables...')
  session.run(tf.global_variables_initializer())
  print('Warming up...')
  session.run(tf.contrib.tpu.initialize_system())
  print('Profiling')
  start = time.time()
  session.run(tpu_ops)
  end = time.time()
  elapsed = end - start
  print(elapsed)
finally:
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()
asked Sep 30 '18 by fatdragon




1 Answer

Benchmarking devices properly is hard, so please take everything you learn from these examples with a grain of salt. It's better in general to compare specific models you are interested in (e.g. running an ImageNet network) to understand performance differences. That said, I understand it's fun to do this, so...

Larger models will illustrate TPU and GPU performance better. Your example is also including the compilation time in the cost of the TPU call: every call after the first for a given program and shape will use the cached compiled program, so you will want to run tpu_ops once before starting the timer unless you want to capture the compilation time.
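
A minimal sketch of that warm-up pattern, reusing session and tpu_ops from the question's code:

session.run(tpu_ops)                  # first call: compiles and caches the program

start = time.time()
session.run(tpu_ops)                  # subsequent calls run the cached program
print('steady-state time:', time.time() - start)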

Currently, each call to a TPU function copies the weights to the TPU before it can start running, which affects small operations more significantly. Here's an example that runs a loop on the TPU before returning to the CPU; it produces the following outputs:

  • 1 0.010800600051879883
  • 10 0.09931182861328125
  • 100 0.5581905841827393
  • 500 2.7688047885894775

So you can actually run 100 iterations of this function in 0.55s, roughly 5.6ms per iteration once the per-call overhead is amortized.

import os
import time
import tensorflow as tf

def calc(n):
  img = tf.random_normal((128, 100, 100, 3))
  def body(_):
    result = tf.layers.conv2d(img, 32, 7)
    result = tf.reduce_sum(result)
    return result

  # Run `body` n times on the TPU before returning control to the CPU,
  # amortizing the per-call transfer overhead across iterations.
  return tf.contrib.tpu.repeat(n[0], body, [0.0])


session = tf.Session('grpc://' + os.environ['COLAB_TPU_ADDR'])
try:
  print('Initializing TPU...')
  session.run(tf.contrib.tpu.initialize_system())

  for i in [1, 10, 100, 500]:
    tpu_ops = tf.contrib.tpu.batch_parallel(calc, [[i] * 8], num_shards=8)
    print('Warming up...')
    session.run(tf.global_variables_initializer())
    session.run(tpu_ops)  # first run compiles; excluded from the timing
    print('Profiling')
    start = time.time()
    session.run(tpu_ops)
    end = time.time()
    elapsed = end - start
    print(i, elapsed)
finally:
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()
answered Oct 20 '22 by Russell Power