I am programming with PyTorch multiprocessing. I want all subprocesses to be able to read and write the same list of tensors (without resizing them). For example, the variable could be
m = [torch.randn(3), torch.randn(5)]
Because the tensors have different sizes, I cannot pack them into a single tensor.
A Python list has no share_memory_() method, and multiprocessing.Manager cannot handle a list of tensors. How can I share the variable m among multiple subprocesses?
torch.multiprocessing is a drop-in replacement for Python's multiprocessing module. It supports the exact same operations but extends them, so that all tensors sent through a multiprocessing.Queue will have their data moved into shared memory, and only a handle will be sent to the other process.
item() → number: returns the value of this tensor as a standard Python number. This only works for tensors with one element. For other cases, see tolist().
share_memory_() will move the tensor data to shared memory on the host so that it can be shared between multiple processes. It is a no-op for CUDA tensors as described in the docs.
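A quick sketch of that call: `share_memory_()` modifies the tensor in place, and `is_shared()` reports whether its storage is in shared memory.

```python
import torch

t = torch.randn(3)
print(t.is_shared())  # False: an ordinary CPU tensor
t.share_memory_()     # moves the storage into shared memory, in place
print(t.is_shared())  # True
```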
A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
I found the solution myself. It is pretty straightforward: just call share_memory_() on each list element. The list itself is not in shared memory, but the list elements are.
Demo code
import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    tl = [torch.randn(2), torch.randn(3)]
    for t in tl:
        t.share_memory_()
    print("before mp: tl=")
    print(tl)
    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    print("after mp: tl=")
    print(tl)
Output
before mp: tl=
[
1.5999
2.2733
[torch.FloatTensor of size 2]
,
0.0586
0.6377
-0.9631
[torch.FloatTensor of size 3]
]
after mp: tl=
[
1001.5999
1002.2733
[torch.FloatTensor of size 2]
,
2000.0586
2000.6377
1999.0370
[torch.FloatTensor of size 3]
]
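One caveat worth noting, since only the elements (not the list) are shared: in-place updates to a shared tensor, such as `+=` above, are visible in the parent, but rebinding a list slot to a freshly built tensor only changes the child's copy of the list. A sketch (the helper names `inplace` and `rebind` are my own):

```python
import torch
import torch.multiprocessing as mp

def inplace(tl):
    tl[0] += 1           # in-place op on shared storage: visible to the parent

def rebind(tl):
    tl[1] = tl[1] + 1    # builds a new, non-shared tensor and rebinds the
                         # child's copy of the list: invisible to the parent

if __name__ == '__main__':
    tl = [torch.zeros(2), torch.zeros(2)]
    for t in tl:
        t.share_memory_()
    for target in (inplace, rebind):
        p = mp.Process(target=target, args=(tl,))
        p.start()
        p.join()
    print(tl[0])  # tensor([1., 1.]) -- updated
    print(tl[1])  # tensor([0., 0.]) -- unchanged
```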
The original answer given by @rozyang does not work with GPUs. It raises an error like:
RuntimeError: CUDA error: initialization error. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To fix it, add mp.set_start_method('spawn', force=True) to the code. The following is a snippet:
import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    tl = [torch.randn(2, device='cuda:0'), torch.randn(3, device='cuda:0')]
    for t in tl:
        t.share_memory_()
    print("before mp: tl=")
    print(tl)
    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    print("after mp: tl=")
    print(tl)