
How to evenly balance processing many simultaneous tasks?

PROBLEM

Our processing service serves UI, API, and internal clients and listens for commands from Kafka. A few API clients might create a lot of generation tasks (one task is N messages) in a short time. With Kafka, we can't control how commands are distributed, because each command goes to a partition that is consumed by exactly one processing instance (aka worker). As a result, UI requests can wait too long while API requests are being processed.

In an ideal implementation, we would handle all tasks evenly, regardless of their size: the capacity of the processing service is distributed among all active tasks. Even if the cluster is heavily loaded, a newly arrived task should be able to start processing almost immediately, or at least before the processing of all other tasks ends.


SOLUTION

Instead, we want an architecture that looks more like the following diagram, with a separate queue per combination of customer and endpoint. This gives us much better isolation, as well as the ability to adjust throughput dynamically on a per-customer basis.

On the side of the producer:

  • a task arrives from a client
  • immediately create a queue for that task
  • send all of the task's messages to this queue
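The producer steps above can be sketched with an in-memory stand-in for the broker (with RabbitMQ this would be `queue_declare` plus `basic_publish`; the `QueueRegistry` name and structure here are illustrative assumptions, not part of any library):

```python
from collections import deque

# In-memory stand-in for a broker: one FIFO queue per task.
class QueueRegistry:
    def __init__(self):
        self.queues = {}  # task_id -> deque of messages

    def create_queue(self, task_id):
        # Idempotent, like a broker's queue declaration: reuse if it exists.
        self.queues.setdefault(task_id, deque())

    def publish(self, task_id, message):
        self.queues[task_id].append(message)

registry = QueueRegistry()

# A task arrives from a client: create its queue, then push all N messages.
def submit_task(task_id, messages):
    registry.create_queue(task_id)
    for m in messages:
        registry.publish(task_id, m)

submit_task("api-task-1", [f"msg-{i}" for i in range(1000)])  # large API task
submit_task("ui-task-1", ["render-page"])                     # small UI task
```

The key point is that the large API task and the small UI task land in separate queues instead of interleaving in one shared partition.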

On the side of the consumer:

  • in one process, constantly refresh the list of queues
  • in the other processes, iterate over this list and consume, for example, one message from each queue
  • scale the consumers

QUESTION

Is there any common solution to such a problem, using RabbitMQ or any other tooling? Historically, we use Kafka on the project, so an approach that keeps Kafka would be great, but we can use any technology for the solution.

asked Jun 04 '20 11:06 by Aleksandr Krivolap


1 Answer

Why not use Spark to execute the messages within the task? What I'm thinking is that each worker creates a Spark context that then parallelizes the messages. The function that is mapped can depend on which Kafka topic the user is consuming. I suspect, however, that your queues might contain tasks with a mixture of message types (UI, API calls, etc.), which would result in a more complex mapping function. If you're not using a standalone cluster and are on YARN or something similar, you can change the queueing method that the Spark master uses.
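As a rough local illustration of this idea, the pattern is `parallelize` the task's messages and `map` a processing function over them. The sketch below swaps Spark for Python's standard-library thread pool, since a real Spark context needs a cluster; `process_message` is a made-up placeholder for whatever the worker actually does per message:

```python
from concurrent.futures import ThreadPoolExecutor

def process_message(message):
    # Placeholder per-message work; in the real system this could
    # dispatch on the message type (UI, API call, etc.).
    return message.upper()

task_messages = ["ui-click", "api-call", "kafka-event"]

# Roughly analogous to sc.parallelize(task_messages)
#                        .map(process_message).collect() in Spark.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_message, task_messages))
```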

answered Nov 17 '22 00:11 by VanBantam