 

Testing GPU with tensorflow matrix multiplication

Since many machine learning algorithms rely on matrix multiplication (or can at least be implemented using it), to test my GPU I plan to create two matrices a and b, multiply them, and record the time the computation takes.

Here is code that generates two matrices of shape (300000, 20000) and multiplies them element-wise:

import tensorflow as tf
import numpy as np

# generate two random matrices of shape (300000, 20000)
a = np.random.rand(300000, 20000)
b = np.random.rand(300000, 20000)

print("Init complete")

# tf.multiply (formerly tf.mul) is element-wise; these shapes are not
# compatible with tf.matmul
result = tf.multiply(a, b)

# no tf.Variables in this graph, so no initializer is needed
sess = tf.Session()
v = sess.run(result)

print(v)

Is this a sufficient test to compare the performance of GPUs? What other factors should I consider?

asked Jan 23 '17 by blue-sky

People also ask

Is matrix multiplication faster on GPU?

Matrix multiplication parallelizes well. A GPU has many more hardware threads than a CPU, and those threads are grouped into blocks that execute concurrently, so most of the computation runs in parallel, resulting in fast execution.

How much faster is GPU than CPU in matrix multiplication?

For matrices of size 800×800, the GPU implementation is more than 3.5 times faster than the CPU one. Furthermore, using shared memory in the GPU implementation speeds up execution dramatically: on our input data, that faster GPU implementation was up to 7.5 times faster than the CPU implementation.
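The claims above are easy to check empirically. Below is a minimal TF 1.x sketch (assuming a GPU-enabled TensorFlow install; the 800×800 size is taken from the figure quoted above) that times the same matmul on /cpu:0 and /gpu:0:

import tensorflow as tf
import time

n = 800

def benchmark(device, iters=50):
    # build an n x n matmul graph pinned to the given device
    tf.reset_default_graph()
    with tf.device(device):
        a = tf.Variable(tf.random_normal((n, n)))
        b = tf.Variable(tf.random_normal((n, n)))
        c = tf.matmul(a, b)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(c.op)  # warm-up run, excludes one-time setup cost
        start = time.time()
        for _ in range(iters):
            sess.run(c.op)
        return (time.time() - start) / iters

cpu = benchmark("/cpu:0")
gpu = benchmark("/gpu:0")
print("CPU: %.4f s  GPU: %.4f s  speedup: %.1fx" % (cpu, gpu, cpu / gpu))

The measured ratio will vary with hardware, dtype, and matrix size; for small matrices the GPU advantage can shrink or disappear because kernel launch and transfer overheads dominate.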


1 Answer

Here's an example of a matmul benchmark which avoids common pitfalls and matches the official 11 TFLOPS figure on a Titan X Pascal.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # pin to the second physical GPU
import tensorflow as tf
import time

n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)


# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)

sess.run(tf.global_variables_initializer())
iters = 10

# pre-warming
sess.run(product.op)

start = time.time()
for i in range(iters):
  sess.run(product.op)
end = time.time()
ops = n**3 + (n-1)*n**2 # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n,
                                                            elapsed/iters,
                                                            rate,))
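A few sentences on why the benchmark is structured this way: setting opt_level to L0 keeps the graph optimizer from simplifying away the work being measured (the "redundant nodes" noted in the comment); the pre-warming run keeps one-time CUDA initialization out of the timed loop; and fetching product.op rather than product means the 8192×8192 result stays on the GPU instead of being copied back to the host on every iteration. Since the script reports G ops/sec, dividing by 1000 gives TFLOPS, so a reading around 11000 G ops/sec corresponds to the 11 TFLOPS figure mentioned above.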
answered Sep 20 '22 by Yaroslav Bulatov