I have an application which currently has the following setup:
The job producer queries the database for new items that need to be added to its list of recurring jobs, and it pushes those recurring jobs onto the work queue every N minutes. This job producer is the only node in my whole architecture whose failure would bring the entire process down: I can have a DB server, a queue server, or several worker servers fail and the process will continue to operate.
How can I modify the job producer so that it isn't a single point of failure? I don't know how to distribute the work it does, which is querying the database every N minutes and enqueuing new jobs to be processed. It is a singular task.
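For reference, the producer today is essentially a single polling loop like the sketch below (the db and queue calls and the interval are hypothetical placeholders, not my real API):

import time

POLL_INTERVAL_SECONDS = 300  # "every N minutes"

def run_producer(db, queue):
    """Single producer: poll the database and enqueue due recurring jobs."""
    while True:
        items = db.fetch_items_due()   # hypothetical DB query
        for item in items:
            queue.enqueue(item)        # hypothetical queue client call
        time.sleep(POLL_INTERVAL_SECONDS)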
I considered having multiple producers, and each producer would use modulus to only process 1/P jobs where P is the number of producers.
Something like:
itemsToBeProcessed = db.FetchItems()
for (item in itemsToBeProcessed) {
    if (item.id % P == producerNumber) {
        queue.Enqueue(item) // this producer handles its 1/P share
    }
}
This would divide the producer's work across multiple servers. However, it still isn't ideal, because if a single producer goes down then 1/P of the jobs stop being processed. So it would still be a partial failure.
Can anyone give any guidance on how I can make this job producer not be a single point of failure in my application?
Is there any specific reason to query the DB every N minutes? Instead, I would query for N items at a time and change each item's state (e.g. "open" -> "in progress") using SELECT ... FOR UPDATE, so that an item is retrieved and its state updated by one and only one producer. That way you can scale out and provide failover without any problem.
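Here is a minimal sketch of that claim-by-state idea, assuming PostgreSQL and psycopg2; the items table and its id/state columns are made up, and SKIP LOCKED (PostgreSQL 9.5+) is an addition on top of the plain SELECT ... FOR UPDATE mentioned above, so several producers can poll concurrently without blocking on or double-claiming each other's rows:

import psycopg2

BATCH_SIZE = 100  # "query for N items" instead of waking up every N minutes

def claim_items(conn):
    """Atomically claim up to BATCH_SIZE open items for this producer."""
    with conn:  # transaction: commit on success, roll back on error
        with conn.cursor() as cur:
            cur.execute(
                """
                UPDATE items
                SET    state = 'in progress'
                WHERE  id IN (
                    SELECT id
                    FROM   items
                    WHERE  state = 'open'
                    ORDER  BY id
                    LIMIT  %s
                    FOR UPDATE SKIP LOCKED
                )
                RETURNING id
                """,
                (BATCH_SIZE,),
            )
            return [row[0] for row in cur.fetchall()]

Because a producer only ever claims rows it successfully locked, you can run as many producers as you like; if one dies before committing, its transaction rolls back and the rows stay "open" for another producer to pick up.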