I'm trying to run a PyTorch model in a Django app. Since running a model (or any long-running task) inside a view is not recommended, I decided to run it in a Celery task. My model is quite big: it takes about 12 seconds to load and about 3 seconds to run inference, so I can't afford to load it on every request. I therefore tried to load it in settings.py and keep it there for the app to use. So my final scheme is:
The problem is that the Celery task throws the following error when it tries to use the model:
[2020-08-29 09:03:04,015: ERROR/ForkPoolWorker-1] Task app.tasks.task[458934d4-ea03-4bc9-8dcd-77e4c3a9caec] raised unexpected: RuntimeError("Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method")
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
/*...*/
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 191, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Here's the code in my settings.py loading the model:
if sys.argv and sys.argv[0].endswith('celery') and 'worker' in sys.argv:  # In order to load only for the celery worker
    import torch
    torch.cuda.init()
    torch.backends.cudnn.benchmark = True
    load_model_file()
And the task code:
@task
def getResult(name):
    print("Executing on GPU:", torch.cuda.is_available())
    if os.path.isfile(name):
        try:
            outpath = model_inference(name)
            os.remove(name)
            return outpath
        except OSError as e:
            print("Error", name, "doesn't exist")
            return ""
The print in the task shows "Executing on GPU: True".
I've tried calling torch.multiprocessing.set_start_method('spawn') in settings.py, both before and after the torch.cuda.init() call, but it gives the same error.
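Setting the start method only works as long as you also create your processes with Process or Pool from the same library:
from torch.multiprocessing import Pool, Process
Celery does not go through torch.multiprocessing. Its default prefork pool forks the worker processes using billiard, Celery's own fork of the "regular" multiprocessing library, so the start method you set is never consulted. Hence this error.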
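To illustrate why the setting has no effect here, the following is a minimal sketch using only the standard multiprocessing module (it does not use torch or Celery, and the helper name describe_start_methods is made up for this example). It shows that a start method belongs to a specific multiprocessing context, so a library that builds its pool from its own context would not pick up a default set elsewhere.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return the start methods available on this platform and the
    method reported by an explicitly requested 'spawn' context."""
    available = mp.get_all_start_methods()
    # get_context() returns a context object bound to one start method;
    # it does not change the module-wide default.
    spawn_ctx = mp.get_context("spawn")
    return available, spawn_ctx.get_start_method()


available, spawn_method = describe_start_methods()
print("available start methods:", available)
print("explicit context uses:", spawn_method)
```

Processes started through spawn_ctx.Process or spawn_ctx.Pool would use spawn regardless of the default, and the reverse also holds: a pool created from a different context keeps that context's start method.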
If I were you I'd try either:
A quick fix is to avoid forking altogether and run the tasks in a single process. To do that, set Celery's worker pool type to solo when starting the worker:
celery -A your_proj worker -P solo -l info
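Whichever pool is used, the 12-second load can still be paid only once per process by loading the model lazily, inside the process that runs the task, rather than at settings import time. The following is only a sketch of that pattern under stated assumptions: load_model_file and model_inference here are dummy stand-ins for the question's functions (no torch involved), and get_model and LOAD_CALLS are hypothetical names introduced for illustration.

```python
import functools
import os

LOAD_CALLS = []  # PIDs in which a real load happened (illustration only)


def load_model_file():
    """Stand-in for the question's loader. A real version would build
    the PyTorch model and move it to the GPU at this point."""
    LOAD_CALLS.append(os.getpid())
    return {"name": "dummy-model", "loaded_in_pid": os.getpid()}


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on first use and cache it for this process."""
    return load_model_file()


def model_inference(name):
    """Stand-in for the question's inference function."""
    model = get_model()  # slow on the first call only, cached afterwards
    return f"{name}.out (model loaded in pid {model['loaded_in_pid']})"


first = model_inference("input_a")
second = model_inference("input_b")
print(first)
print(second)
print("number of loads:", len(LOAD_CALLS))
```

With the solo pool, the first task pays the loading cost and later tasks reuse the cached model. In a prefork worker the same get_model() could instead be called from a handler connected to Celery's worker_process_init signal, provided nothing initializes CUDA in the parent process before the fork.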