
Basic multi GPU parallelization of matrix multiplication

I want to parallelize the following simple expression on 2 GPUs: C = A^n + B^n, by computing A^n on GPU 0 and B^n on GPU 1 and then summing the results.

In TensorFlow I would write something like:

with tf.device('/gpu:0'):
    An = matpow(A, n)
with tf.device('/gpu:1'):
    Bn = matpow(B, n)
with tf.Session() as sess:
    C = sess.run(An + Bn)

However, since PyTorch is dynamic, I'm having trouble doing the same thing. I tried the following, but it only takes more time.

with torch.cuda.device(0):
    A = A.cuda()       
with torch.cuda.device(1):
    B = B.cuda()
C = matpow(A, n) + matpow(B, n).cuda(0)

I know there is a module to parallelize models on the batch dimension using torch.nn.DataParallel but here I try to do something more basic.

BiBi asked Jun 05 '17 14:06

1 Answer

You can use CUDA streams for this. This will not necessarily distribute the computation over two devices, but the two power operations can execute in parallel.

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

with torch.cuda.stream(s1):
    A = torch.pow(A, n)
with torch.cuda.stream(s2):
    B = torch.pow(B, n)

C = A + B

I'm not sure, though, whether this will really speed up your computation if you only parallelize this one operation; your matrices would need to be really big.

If your requirement is to split it across devices, you can add this before the streams:

A = A.cuda(0)
B = B.cuda(1)

Then after the power operation, you need to get them on the same device again, e.g. B = B.cuda(0). After that you can do the addition.
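Putting the two suggestions together, here is a minimal sketch. It assumes matpow means repeated matrix multiplication (the question doesn't define it), and it falls back to the CPU when two GPUs aren't available so the snippet runs anywhere:

```python
import torch

def matpow(M, n):
    # Naive repeated matrix multiplication: M^n for n >= 1.
    result = M
    for _ in range(n - 1):
        result = result @ M
    return result

# Small matrices for illustration; real speedup needs much larger ones.
A = torch.randn(4, 4)
B = torch.randn(4, 4)
n = 3

if torch.cuda.device_count() >= 2:
    # Place each operand on its own device before launching the streams.
    A = A.cuda(0)
    B = B.cuda(1)
    s1 = torch.cuda.Stream(device=0)
    s2 = torch.cuda.Stream(device=1)
    with torch.cuda.stream(s1):
        An = matpow(A, n)
    with torch.cuda.stream(s2):
        Bn = matpow(B, n)
    # Wait for both streams before consuming the results.
    torch.cuda.synchronize()
    # Bring B^n back to GPU 0 so the addition happens on one device.
    C = An + Bn.cuda(0)
else:
    # CPU fallback, sequential.
    C = matpow(A, n) + matpow(B, n)
```

Note the torch.cuda.synchronize() call: the default stream does not automatically wait on s1 and s2, so the results must be synchronized before they are used.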

blckbird answered Oct 14 '22 09:10