
Celery vs. ProcessPoolExecutor / ThreadPoolExecutor

I am creating a Django webserver that allows the user to run some "executables" on a local machine and to analyse their output through a webpage.

I have previously used a Celery task queue to run "executables" in similar situations. However, after reading up on Python's concurrent.futures, I am beginning to wonder if I should use ThreadPoolExecutor or ProcessPoolExecutor (or a ThreadPoolExecutor inside a ProcessPoolExecutor :D) instead?

Googling, I could only find one relevant question, comparing Celery to Tornado, and it steered toward using Tornado alone.

So should I use Celery or a PoolExecutor for my simple webserver, and why?

ostrokach asked May 18 '16 05:05



People also ask

What is difference between ThreadPoolExecutor and ProcessPoolExecutor?

Perhaps the most important difference is the type of workers used by each class. As their names suggest, the ThreadPoolExecutor uses threads internally, whereas the ProcessPoolExecutor uses processes. A process has a main thread and may have additional threads. A thread belongs to a process.

Why do we use ThreadPoolExecutor?

Use the ThreadPoolExecutor class when you need to execute tasks that may or may not take arguments and may or may not return a result once the tasks are complete. Use the ThreadPoolExecutor class when you need to execute different types of ad hoc tasks, such as calling different target task functions.

Is ThreadPoolExecutor concurrent?

The ThreadPoolExecutor Python class is used to create and manage thread pools and is provided in the concurrent.futures module.

What is ThreadPoolExecutor in Python?

A thread pool is a pattern for managing multiple threads efficiently. Use ThreadPoolExecutor class to manage a thread pool in Python. Call the submit() method of the ThreadPoolExecutor to submit a task to the thread pool for execution. The submit() method returns a Future object.


1 Answer

You should use Celery if:

  1. You want to scale easily and independently of your webserver
  2. You want a way to monitor your tasks and retry them if they fail
  3. You want to create more advanced task execution patterns (e.g. chaining tasks)

In addition, Celery is a very mature library with side projects that also help on the UI presentation side; have a look at Jobtastic.

If you don't need any of the points listed above and you just need to execute the task, without caring too much about its status and without particular scalability needs, then just keep it simple.
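As a rough illustration of the "keep it simple" route, here is a minimal sketch that runs an external command on a ThreadPoolExecutor. The helper names (`run_executable`, `submit_run`), the pool size of 4, and the 60 second timeout are assumptions made for the example, and the current Python interpreter stands in for a real "executable". Threads are enough here because the actual work happens in the child process.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# One pool shared by the whole web process; the size of 4 is an arbitrary assumption.
executor = ThreadPoolExecutor(max_workers=4)


def run_executable(args, timeout=60):
    """Run a command and return (exit code, stdout, stderr) as text."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


def submit_run(args):
    """Queue the command on the pool and return a Future immediately."""
    return executor.submit(run_executable, args)


# Stand-in for a real "executable": the current Python interpreter.
future = submit_run([sys.executable, "-c", "print('hello from child')"])
returncode, stdout, stderr = future.result()  # blocks until the command finishes
```

In a view you would probably store the Future (for example in a dict keyed by a job id) and check `future.done()` on a later request instead of blocking on `result()`. Note that such jobs live only inside that one web process: they are lost if it restarts and are not visible to other worker processes, which is the kind of status tracking and scaling that Celery provides.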

As for ThreadPoolExecutor versus ProcessPoolExecutor, keep in mind that the second can only receive and return picklable objects, and that the first spawns child threads attached to your main process (probably your webserver, unless you run it inside another detached process). So mixing them can make sense, depending on the details of your implementation.
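To get a feel for the picklability constraint, the sketch below checks a few objects with `pickle.dumps`, which is roughly what happens to arguments and return values when they cross the process boundary in a ProcessPoolExecutor. The helper `is_picklable` and the sample objects are made up for the example.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling is refused."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


plain_result = {"returncode": 0, "stdout": "ok"}  # plain data
lock = threading.Lock()                           # tied to this process
callback = lambda line: line.upper()              # has no importable name

checks = {
    "dict of str/int": is_picklable(plain_result),
    "threading.Lock": is_picklable(lock),
    "lambda": is_picklable(callback),
}
```

Plain data such as strings, numbers, lists, and dicts passes the check, while the lock and the lambda do not. With a ThreadPoolExecutor none of this matters, because the threads share the memory of the main process.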

Mauro Rocco answered Sep 26 '22 20:09
