Using PyTorch with Celery

Question

I'm trying to run a PyTorch model in a Django app. Since running models (or any long-running task) inside views is not recommended, I decided to run it in a Celery task. My model is quite big: it takes about 12 seconds to load and about 3 seconds to run inference, so I can't afford to load it on every request. Instead, I tried loading it once in settings.py and keeping it there for the app to use. My final scheme is:

When the Django app starts, the PyTorch model is loaded in settings.py and is accessible from the app.
When views.py receives a request, it queues a Celery task.
The Celery task uses settings.model to run inference and return the result.

The problem is that the Celery task throws the following error when it tries to use the model:

[2020-08-29 09:03:04,015: ERROR/ForkPoolWorker-1] Task app.tasks.task[458934d4-ea03-4bc9-8dcd-77e4c3a9caec] raised unexpected: RuntimeError("Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method")
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  /*...*/
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 191, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Here's the code in my settings.py loading the model:

import sys

# Load the model only when this process is the Celery worker
if sys.argv and sys.argv[0].endswith('celery') and 'worker' in sys.argv:
    import torch
    torch.cuda.init()
    torch.backends.cudnn.benchmark = True
    load_model_file()

And the task code:

import os

import torch

@task
def getResult(name):
    print("Executing on GPU:", torch.cuda.is_available())
    if os.path.isfile(name):
        try:
            # model_inference is my own helper (not shown here)
            outpath = model_inference(name)
            os.remove(name)
            return outpath
        except OSError:
            print("Error", name, "doesn't exist")
    return ""

The print in the task shows "Executing on GPU: True".

I've tried calling torch.multiprocessing.set_start_method('spawn') in settings.py, both before and after torch.cuda.init(), but I get the same error.

Krzysztof Szularz · Accepted Answer

Setting the start method works only as long as you also use Process from the same library:

from torch.multiprocessing import Pool, Process

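Celery does not go through torch.multiprocessing. Its default prefork pool forks worker processes with its own multiprocessing implementation (billiard), so the start method you set never applies to them; hence this error.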

If I were you, I'd try one of the following:

run it single-threaded to see if that helps
run it with eventlet to see if that helps
read this

vishal babu · Answer

A quick fix is to make things single-threaded. To do that, set Celery's worker pool type to solo when starting the worker:

celery -A your_proj worker -P solo -l info

Tags: python, django, multiprocessing, pytorch, celery

JOSEMAFUEN