
Basic multi GPU parallelization of matrix multiplication

I want to parallelize the following simple expression on 2 GPUs: C = A^n + B^n, by computing A^n on GPU 0 and B^n on GPU 1 and then summing the results.

In TensorFlow I would write something like:

with tf.device('/gpu:0'):
    An = matpow(A, n)
with tf.device('/gpu:1'):
    Bn = matpow(B, n)
with tf.Session() as sess:
    C = sess.run(An + Bn)

However, since PyTorch is dynamic, I'm having trouble doing the same thing. I tried the following, but it only takes more time.

with torch.cuda.device(0):
    A = A.cuda()       
with torch.cuda.device(1):
    B = B.cuda()
C = matpow(A, n) + matpow(B, n).cuda(0)

I know there is a module to parallelize models on the batch dimension using torch.nn.DataParallel but here I try to do something more basic.

BiBi asked Jun 05 '17 14:06

1 Answer

You can use CUDA streams for this. This will not necessarily distribute the computation over two devices, but the two power operations can execute in parallel.

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

with torch.cuda.stream(s1):
    A = torch.pow(A, n)
with torch.cuda.stream(s2):
    B = torch.pow(B, n)

C = A + B

I'm not sure, though, whether this will really speed up your computation if you only parallelize this one operation; your matrices would need to be really big.

If your requirement is to split it across devices, you can add this before the streams:

A = A.cuda(0)
B = B.cuda(1)

Then after the power operation, you need to get them on the same device again, e.g. B = B.cuda(0). After that you can do the addition.
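Putting the two suggestions together, here is a minimal sketch. It assumes matpow means repeated matrix multiplication (the question doesn't define it), and it falls back to the CPU when two GPUs aren't available so the snippet runs anywhere:

```python
import torch

def matpow(M, n):
    # Naive repeated matrix multiplication: M^n for n >= 1.
    result = M
    for _ in range(n - 1):
        result = result @ M
    return result

# Small matrices for illustration; real speedup needs much larger ones.
A = torch.randn(4, 4)
B = torch.randn(4, 4)
n = 3

if torch.cuda.device_count() >= 2:
    # Place each operand on its own device before launching the streams.
    A = A.cuda(0)
    B = B.cuda(1)
    s1 = torch.cuda.Stream(device=0)
    s2 = torch.cuda.Stream(device=1)
    with torch.cuda.stream(s1):
        An = matpow(A, n)
    with torch.cuda.stream(s2):
        Bn = matpow(B, n)
    # Wait for both streams before consuming the results.
    torch.cuda.synchronize()
    # Bring B^n back to GPU 0 so the addition happens on one device.
    C = An + Bn.cuda(0)
else:
    # CPU fallback, sequential.
    C = matpow(A, n) + matpow(B, n)
```

Note the torch.cuda.synchronize() call: the default stream does not automatically wait on s1 and s2, so the results must be synchronized before they are used.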

blckbird answered Oct 14 '22 09:10