Using PyTorch with Celery

I'm trying to run a PyTorch model in a Django app. Since it is not recommended to execute the model (or any long-running task) in the views, I decided to run it in a Celery task. My model is quite big: it takes about 12 seconds to load and about 3 seconds to infer, so I can't afford to load it on every request. Instead, I load it in settings.py and keep it there for the app to use. So my final scheme is:

  • When the Django app starts, the PyTorch model is loaded in settings.py and is accessible from the app.
  • When views.py receives a request, it delays a Celery task (see the sketch after this list).
  • The Celery task uses settings.model to infer the result.
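
A minimal sketch of this flow (the view and field names are illustrative; getResult is the task shown below):

# views.py -- illustrative sketch, not the actual view
from django.http import JsonResponse

from .tasks import getResult  # the Celery task shown further down

def infer(request):
    name = request.POST.get("name")   # path to the input file (assumed)
    result = getResult.delay(name)    # enqueue instead of blocking the view
    return JsonResponse({"task_id": result.id})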

The problem is that the Celery task throws the following error when it tries to use the model:

[2020-08-29 09:03:04,015: ERROR/ForkPoolWorker-1] Task app.tasks.task[458934d4-ea03-4bc9-8dcd-77e4c3a9caec] raised unexpected: RuntimeError("Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method")
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  /*...*/
  File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 191, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Here's the code in my settings.py that loads the model:

import sys

# Load the model only when this process is the Celery worker,
# not the web server process
if sys.argv and sys.argv[0].endswith('celery') and 'worker' in sys.argv:
    import torch
    torch.cuda.init()
    torch.backends.cudnn.benchmark = True
    load_model_file()

And here's the task code:

import os

import torch

# @task comes from the project's Celery app (e.g. @app.task / shared_task)
@task
def getResult(name):
    print("Executing on GPU:", torch.cuda.is_available())
    if os.path.isfile(name):
        try:
            outpath = model_inference(name)  # uses the model loaded in settings.py
            os.remove(name)
            return outpath
        except OSError:
            print("Error", name, "doesn't exist")
    return ""

The print in the task shows "Executing on GPU: True".

I've tried calling torch.multiprocessing.set_start_method('spawn') in settings.py, both before and after torch.cuda.init(), but I get the same error.
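
For reference, the attempt looked roughly like this (a sketch; the exact placement relative to torch.cuda.init() varied between tries):

# settings.py -- attempted fix (sketch); the task still raises the same error
import torch.multiprocessing as mp

mp.set_start_method('spawn')  # tried both before and after torch.cuda.init()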

asked Aug 29 '20 by JOSEMAFUEN

2 Answers

Setting the 'spawn' start method works only if you're also using Process (or Pool) from the same library:

from torch.multiprocessing import Pool, Process

Celery uses the "regular" multiprocessing library (via its billiard fork), hence this error.
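
To illustrate, the spawn start method only takes effect when the child process itself is created through torch.multiprocessing; a standalone sketch (infer is a placeholder):

# Standalone sketch: CUDA work in a child created via torch.multiprocessing
import torch
import torch.multiprocessing as mp

def infer(path):
    # placeholder for the actual model inference
    print(path, "GPU available:", torch.cuda.is_available())

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required to use CUDA in child processes
    p = mp.Process(target=infer, args=("input.png",))
    p.start()
    p.join()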

If I were you I'd try one of the following:

  • run it single-threaded to see if that helps
  • run it with eventlet to see if that helps (see the command sketch after this list)
  • read this
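
For the eventlet option, the worker would be started with the eventlet pool, for example (assuming the eventlet package is installed; your_proj is a placeholder):

celery -A your_proj worker -P eventlet -l info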
answered Sep 17 '22 by Krzysztof Szularz

A quick fix is to make things single-threaded: set Celery's worker pool type to solo when starting the worker. With the solo pool, tasks run in the worker's main process, so nothing is forked and CUDA is initialized only once:

celery -A your_proj worker -P solo -l info
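
The pool can also be set in the Celery configuration instead of on the command line; a minimal sketch (app and module names are illustrative):

# celery.py -- illustrative app setup; "your_proj" is a placeholder
from celery import Celery

app = Celery("your_proj")
app.conf.worker_pool = "solo"  # same effect as passing -P solo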
answered Sep 16 '22 by vishal babu