I have the following code which I am trying to parallelize over multiple GPUs in PyTorch:
import numpy as np
import torch
from torch.multiprocessing import Pool

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X).cuda()

def X_power_func(j):
    X_power = X**j
    return X_power

if __name__ == '__main__':
    with Pool(processes = 2) as p:   # Parallelizing over 2 GPUs
        results = p.map(X_power_func, range(4))

results
But when I run the code, I get this error:
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-35-6529ab6dac60>", line 11, in X_power_func
X_power = X**j
RuntimeError: CUDA error: initialization error
"""
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-35-6529ab6dac60> in <module>()
14 if __name__ == '__main__':
15 with Pool(processes = 1) as p:
---> 16 results = p.map(X_power_func, range(8))
17
18 results
1 frames
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642 return self._value
643 else:
--> 644 raise self._value
645
646 def _set(self, i, obj):
RuntimeError: CUDA error: initialization error
Where have I gone wrong? Any help would really be appreciated.
I think the usual approach is to call model.share_memory() once before multiprocessing, assuming you have a model which subclasses nn.Module. For tensors, it should be X.share_memory_(). Unfortunately, I had trouble getting that to work with your code: it hangs (without errors) if X.share_memory_() is called before calling pool.map. I'm not sure if the reason is that X is a global variable which is not passed as one of the arguments to map.
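For reference, the model version of that usually looks something like this (a minimal sketch loosely following the PyTorch Hogwild example, using mp.Process rather than Pool; as said above, I couldn't get the equivalent tensor version to work with your code):

import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(model, rank):
    # every process sees the same shared parameters here
    x = torch.ones(1, 4)
    print(rank, model(x))

if __name__ == '__main__':
    model = nn.Linear(4, 2)
    model.share_memory()   # move the parameters into shared memory before forking
    processes = []
    for rank in range(2):
        p = mp.Process(target=worker, args=(model, rank))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()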
What does work is this:
X = torch.DoubleTensor(X)

def X_power_func(j):
    X_power = X.cuda()**j
    return X_power
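Putting that together with the rest of your script, the working version looks like this (a sketch; note that Pool(processes = 2) gives you 2 worker processes, but .cuda() with no argument uses the default GPU in both of them, so this by itself doesn't spread the work over 2 GPUs):

import numpy as np
import torch
from torch.multiprocessing import Pool

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X)              # X stays on the CPU in the parent process

def X_power_func(j):
    return X.cuda() ** j               # the first CUDA call happens inside the forked worker

if __name__ == '__main__':
    with Pool(processes = 2) as p:
        results = p.map(X_power_func, range(4))
    print(results)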
Btw: https://github.com/pytorch/pytorch/issues/15734 mentions that "CUDA API must not be initialized before you fork" (this is likely the issue you were seeing).
Also, https://github.com/pytorch/pytorch/issues/17680 notes that when using spawn in Jupyter notebooks, "the spawn method will run everything in your notebook top-level" (likely the issue I was seeing when my code was hanging in a notebook). In short, I couldn't get either fork or spawn to work, except with the sequence above (which doesn't touch CUDA until it's inside the forked process).
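If you do want the work spread over two GPUs, one pattern that should work from a plain .py script (not a notebook, because of the spawn issue above) is to use the spawn start method and pass a device to each task. A sketch, assuming two visible GPUs named cuda:0 and cuda:1:

import numpy as np
import torch
import torch.multiprocessing as mp

def X_power_func(args):
    j, device = args
    # build X inside the worker so CUDA is only ever initialized in the child process
    X = torch.DoubleTensor(np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])).to(device)
    return (X ** j).cpu()              # move the result back to the CPU before returning it

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)   # workers must not inherit an initialized CUDA context
    devices = ['cuda:0', 'cuda:1']             # assumption: two GPUs are visible
    tasks = [(j, devices[j % len(devices)]) for j in range(4)]
    with mp.Pool(processes=2) as p:
        results = p.map(X_power_func, tasks)
    print(results)

Moving each result to the CPU before returning it avoids depending on cross-process CUDA tensor sharing, which the docs only guarantee for the spawn/forkserver start methods.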