
Asynchronous task queues and asynchronous IO

Tags:

python

asynchronous

concurrency

celery

As I understand it, asynchronous networking frameworks and libraries like Twisted, Tornado, and asyncio provide asynchronous IO by combining nonblocking sockets with an event loop. Gevent achieves essentially the same thing by monkey patching the standard library, so explicit asynchronous programming via callbacks or coroutines is not required.

On the other hand, asynchronous task queues, like Celery, manage background tasks and distribute those tasks across multiple threads or machines. I do not fully understand this process but it involves message brokers, messages, and workers.

My questions:

1. Do asynchronous task queues require asynchronous IO? Are they in any way related? The two concepts seem similar, but the implementations at the application level are different. I would think that the only thing they have in common is the word "asynchronous", so perhaps that is throwing me off.
2. Can someone elaborate on how task queues work and on the relationship between the message broker (why is it required?), the workers, and the messages (what are messages? bytes?).

Oh, and I'm not trying to solve any specific problem; I'm just trying to understand the ideas behind asynchronous task queues and asynchronous IO.

325

asked Apr 09 '16 14:04

puketronic

1 Answer

Asynchronous IO is a way to use sockets (or more generally file descriptors) without blocking. This term is specific to one process or even one thread. You can even imagine mixing threads with asynchronous calls. It would be completely fine, yet somewhat complicated.
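To make "without blocking" concrete, here is a minimal sketch using asyncio. It is only an illustration: `fake_request` is a made-up name, and `asyncio.sleep` stands in for a real nonblocking socket read. The point is that three waits overlap inside a single thread.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for a nonblocking socket read (an assumption
    # made for this sketch): while this coroutine waits, the event loop is
    # free to run the other coroutines.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # All three "requests" wait concurrently inside one thread.
    return await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    # The elapsed time should be close to the longest single wait,
    # not the sum of all three.
    print(results, f"{elapsed:.2f}s")
```

With blocking calls the same three waits would run one after another, unless you spread them over several threads.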

Now, I have no idea what "asynchronous task queue" is supposed to mean. IMHO there is only a task queue, and it is a data structure. You can access it in an asynchronous or a synchronous way, and by "access" I mean push and pop calls. These calls can use the network internally.

So a task queue is a data structure, and (a)synchronous IO is a way to access it. That's all there is to it.
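As a rough illustration of that distinction, the sketch below pushes and pops the same item through two standard-library queues. Both are in-process queues, which is an assumption made to keep the example short; a real task queue would usually sit behind a network connection. The function names are made up for this example.

```python
import asyncio
import queue


def sync_roundtrip(task):
    q = queue.Queue()
    q.put(task)      # push
    return q.get()   # pop: blocks the calling thread until an item exists


async def async_roundtrip(task):
    q = asyncio.Queue()
    await q.put(task)      # push
    return await q.get()   # pop: suspends only this coroutine while waiting


if __name__ == "__main__":
    print(sync_roundtrip("process_image"))
    print(asyncio.run(async_roundtrip("process_image")))
```

The data structure is the same in both cases; only the way the caller waits for it differs.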

The term "asynchronous" is heavily overused nowadays. The hype is real.

As for your second question:

1. A message is just a piece of data, a sequence of bytes. It can be anything. Usually it is a structured string, such as JSON.
2. Task == message. The different word is used to signal the purpose of the data: to perform some task. For example, you would send a message {"task": "process_image"} and your consumer would fire the appropriate function.
3. A task queue Q is just a queue (the data structure).
4. A producer P is a process/thread/class/function/thing that pushes messages to Q.
5. A consumer (or worker) C is a process/thread/class/function/thing that pops messages from Q and does some processing on them.
6. A message broker B is a process that redistributes messages. In this case a producer P sends a message to B (rather than directly to a queue), and B can then, for example, duplicate the message and send it to two different queues Q1 and Q2, so that two different workers C1 and C2 both get it. Message brokers can also act as protocol translators, transform messages, aggregate them, and do many other things. Generally, a broker is just a black box between producers and consumers.
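The pieces listed above can be sketched in a few lines of Python. This is a toy, single-process model and not how Celery or any real broker is implemented: `Broker`, `produce`, `consume_one`, `HANDLERS`, and the `payload` field are all hypothetical names introduced for illustration, and the queues are plain in-memory `queue.Queue` objects. In a real deployment the broker is typically a separate server reached over the network.

```python
import json
import queue


def process_image(payload):
    # Hypothetical task body; a real worker would do the actual work here.
    return f"processed {payload['path']}"


# Maps the "task" field of a message to the function a worker should run.
HANDLERS = {"process_image": process_image}


class Broker:
    """Toy in-process broker B: copies every message to each bound queue."""

    def __init__(self):
        self.queues = []

    def bind(self, q):
        self.queues.append(q)

    def publish(self, message):
        for q in self.queues:
            q.put(message)


def produce(broker, task, **payload):
    # Producer P: the message is just bytes, here UTF-8 encoded JSON.
    message = json.dumps({"task": task, "payload": payload}).encode("utf-8")
    broker.publish(message)


def consume_one(q):
    # Worker C: pop one message, decode it, and fire the matching function.
    data = json.loads(q.get().decode("utf-8"))
    return HANDLERS[data["task"]](data["payload"])


if __name__ == "__main__":
    q1, q2 = queue.Queue(), queue.Queue()
    broker = Broker()
    broker.bind(q1)
    broker.bind(q2)
    produce(broker, "process_image", path="cat.png")
    print(consume_one(q1))  # worker C1
    print(consume_one(q2))  # worker C2
```

Here the broker duplicates one message into Q1 and Q2, so both workers receive it. Nothing in this sketch uses asynchronous IO, which is the point: how a worker waits on its queue is a separate choice from the producer/broker/worker layout.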

As you can see, there are no formal definitions of these things, and you have to use a bit of intuition to fully understand them.

194

answered Oct 23 '22 11:10

freakish
