Many machine learning algorithms rely on matrix multiplication (or at least can be implemented using it), so to test my GPU I plan to create two matrices, a and b, multiply them, and record the time the computation takes.
Here is code that generates two matrices of dimensions 300000×20000 and multiplies them:
import tensorflow as tf
import numpy as np

sess = tf.Session()
sess.run(tf.global_variables_initializer())  # no-op here: the graph has no tf.Variables
#a = np.array([[1, 2, 3], [4, 5, 6]])
#b = np.array([1, 2, 3])
a = np.random.rand(300000, 20000)
b = np.random.rand(300000, 20000)
print("Init complete")
result = tf.multiply(a, b)  # element-wise multiply; tf.mul was renamed tf.multiply in TF 1.0
v = sess.run(result)
print(v)
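To record the time the computation takes, something like this (using Python's time module to wrap the run call) should work:
import time

start = time.time()
v = sess.run(result)   # executes the element-wise multiply built above
elapsed = time.time() - start
print("Computation took %.2f sec" % elapsed)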
Is this a sufficient test to compare the performance of GPUs? What other factors should I consider?
In your case of matrix multiplication, the computations can be parallelized: a GPU has many more threads than a CPU, and those threads are grouped into blocks that execute concurrently, so a large share of the computation runs in parallel, resulting in quick computation.
For matrices of size 800×800, the GPU implementation is more than 3.5 times faster than the CPU one. Furthermore, the use of shared memory in the GPU implementation speeds up execution dramatically: on our input data, this faster GPU implementation was up to 7.5 times faster than the CPU implementation.
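To see this kind of difference on your own hardware, one simple approach is to pin the same matmul to each device and time it. Here's a rough sketch, assuming a TF 1.x install with GPU support (the 800×800 size mirrors the figure quoted above):
import tensorflow as tf
import time

n = 800
for device in ["/cpu:0", "/gpu:0"]:
    with tf.Graph().as_default():
        with tf.device(device):
            m1 = tf.Variable(tf.random_normal((n, n)))
            m2 = tf.Variable(tf.random_normal((n, n)))
            prod = tf.matmul(m1, m2)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            sess.run(prod.op)  # warm-up: excludes one-time setup cost
            start = time.time()
            for _ in range(10):
                sess.run(prod.op)
            print("%s: %.4f sec/matmul" % (device, (time.time() - start) / 10))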
Here's an example of a matmul benchmark which avoids common pitfalls and matches the official 11 TFLOPS figure on a Titan X Pascal.
import os
import sys
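# make only one physical GPU visible to TensorFlow (must be set before importing tf)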
os.environ["CUDA_VISIBLE_DEVICES"]="1"
import tensorflow as tf
import time
n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)
# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
iters = 10
# pre-warming
sess.run(product.op)
start = time.time()
for i in range(iters):
    sess.run(product.op)
end = time.time()
ops = n**3 + (n-1)*n**2 # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' %
      (n, n, elapsed / iters, rate))
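As a sanity check on the number this prints: the op count for n = 8192 is 2n^3 - n^2 ≈ 1.1e12, so a card sustaining the quoted 11 TFLOPS should take roughly 0.1 sec per matmul:
n = 8192
ops = n**3 + (n - 1) * n**2   # same op count as the benchmark above
tflops = 11e12                # quoted Titan X Pascal throughput
print("expected time per matmul: %.3f sec" % (ops / tflops))   # ~0.100 sec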