PyTorch RuntimeError: DataLoader worker (pid(s) 15332) exited unexpectedly

Tags: python, python-3.x, pytorch

I am a beginner at PyTorch and am just trying out some examples on this webpage, but I can't get the 'super_resolution' program to run because of this error:

RuntimeError: DataLoader worker (pid(s) 15332) exited unexpectedly

I searched the Internet and found that some people suggest setting num_workers to 0. But if I do that, the program tells me that I am running out of memory (either with CPU or GPU):

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9663676416 bytes. Buy new RAM!

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 2.03 GiB already allocated; 0 bytes free; 2.03 GiB reserved in total by PyTorch)

How do I fix this?

I am using Python 3.8 on Windows 10 (64-bit) with PyTorch 1.4.0.

More complete error messages (--cuda means using the GPU, --threads x means passing x to the num_workers parameter):

with command line arguments --upscale_factor 1 --cuda

  File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "E:\Python38\lib\multiprocessing\queues.py", line 108, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Z:\super_resolution\main.py", line 81, in <module>
    train(epoch)
  File "Z:\super_resolution\main.py", line 48, in train
    for iteration, batch in enumerate(training_data_loader, 1):
  File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 808, in _get_data
    success, data = self._try_get_data()
  File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 16596, 9376, 12756, 9844) exited unexpectedly

with command line arguments --upscale_factor 1 --cuda --threads 0

  File "Z:\super_resolution\main.py", line 81, in <module>
    train(epoch)
  File "Z:\super_resolution\main.py", line 52, in train
    loss = criterion(model(input), target)
  File "E:\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "Z:\super_resolution\model.py", line 21, in forward
    x = self.relu(self.conv2(x))
  File "E:\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Python38\lib\site-packages\torch\nn\modules\conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "E:\Python38\lib\site-packages\torch\nn\modules\conv.py", line 341, in conv2d_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 2.03 GiB already allocated; 954.35 MiB free; 2.03 GiB reserved in total by PyTorch)


asked Feb 06 '20 18:02

ihdv

2 Answers

There is no complete fix for GPU out-of-memory errors, but there are quite a few things you can do to reduce the memory demand. Also, make sure that you are not moving the training set and the test set to the GPU at the same time!

Decrease the batch size (down to 1 if necessary)
Decrease the dimensionality of the fully-connected layers (they are often the most memory-intensive)
(Image data) Apply centre cropping
(Image data) Transform RGB data to greyscale
(Text data) Truncate the input at n characters (which probably won't help that much)
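
As a rough illustration of why the batch size and image size suggestions above matter, the size of a single activation tensor can be estimated as batch × channels × height × width × bytes per value. The sketch below is plain Python. The figures (a batch of 64, 64 channels, a 256×256 feature map, float32 values) are assumptions chosen for illustration and are not stated in the question, and `activation_bytes` is a hypothetical helper name. With those assumed figures, one tensor comes to 1024 MiB, the same size as the allocation reported in the CUDA error.

```python
def activation_bytes(batch, channels, height, width, bytes_per_value=4):
    """Size in bytes of one dense activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_value


MIB = 1024 ** 2

# Assumed, illustrative figures; none of them are stated in the question.
full = activation_bytes(batch=64, channels=64, height=256, width=256)
small_batch = activation_bytes(batch=4, channels=64, height=256, width=256)
cropped = activation_bytes(batch=64, channels=64, height=128, width=128)

print(full / MIB)         # 1024.0
print(small_batch / MIB)  # 64.0
print(cropped / MIB)      # 256.0
```

Actual memory use is typically higher than this, because gradients, weights, and temporary buffers also need space, so treat the number as a per-layer lower bound rather than a total.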

Alternatively, you can try running on Google Colaboratory (12-hour usage limit on a K80 GPU) or Next Journal, both of which provide up to 12 GB for use, free of charge. In the worst case, you might have to train on your CPU. Hope this helps!
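
A minimal sketch of keeping only the current batch on the device, with a fallback to the CPU when no GPU is available. The random tensors, the one-layer model, and the hyperparameters are stand-ins invented for this example; this is not the super_resolution code from the question.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in data: 8 single-channel 32x32 images with targets.
# The dataset itself stays in CPU memory.
inputs = torch.randn(8, 1, 32, 32)
targets = torch.randn(8, 1, 32, 32)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=2, shuffle=True)

model = nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for batch_input, batch_target in loader:
    # Move just this batch to the device, not the whole dataset.
    batch_input = batch_input.to(device)
    batch_target = batch_target.to(device)

    optimizer.zero_grad()
    loss = criterion(model(batch_input), batch_target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())  # .item() stores a float, not the graph

print(len(losses), "batches trained on", device)
```

Lowering `batch_size` here is the single-line change that the first suggestion in the list refers to.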


answered Oct 19 '22 16:10

ccl

This is the solution that worked for me; it may work for other Windows users too. Just remove or comment out the num_workers argument to disable parallel data loading.
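
A minimal sketch of what that looks like, assuming a toy dataset in place of the real one (`make_loader` and `main` are hypothetical names). With `num_workers=0`, which is also the default when the argument is omitted, batches are loaded in the main process and no worker processes are started. On Windows, worker processes are started by re-importing the script, so if you later raise `num_workers` again, the code that iterates over the loader should sit behind an `if __name__ == "__main__":` guard. Whether that guard is what was missing in the question is not something the post confirms.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=0):
    # Toy dataset of 10 samples, standing in for the real training set.
    dataset = TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1))
    # num_workers=0 means data is loaded in the main process.
    return DataLoader(dataset, batch_size=4, num_workers=num_workers)


def main():
    loader = make_loader(num_workers=0)
    # Each batch is a list with one tensor; collect the batch sizes.
    return [batch[0].shape[0] for batch in loader]


if __name__ == "__main__":
    # The guard matters on Windows once num_workers is greater than 0.
    print(main())  # [4, 4, 2]
```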

answered Oct 19 '22 17:10

Aneesh Cherian K
