Message broker vs. database and monitoring

Tags: database, rabbitmq, message-queue

My question has some similarities to this question: Why do we need message brokers like RabbitMQ over a database like PostgreSQL?

In my current (semi-professional) project I'm also at the point of deciding whether to go with a database-based solution, a message-broker-based one (e.g. with RabbitMQ), or something entirely different.

Let's imagine two tools, Tool A and Tool B. Whenever Tool A has run and finished, there might be something for Tool B to do. Tool A takes quite some time to execute (> 60 sec), and often there will be nothing for Tool B to do. Tool A provides some metadata to Tool B so that Tool B knows what to do.

Message-based solution: Set up a message queue that Tool B consumes. If Tool A has finished and Tool B should run, Tool A publishes a message (including the metadata) to the queue. Tool B receives it and runs using the metadata from the message.
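To make the message-based flow concrete, here is a minimal Python sketch. It assumes an in-process `queue.Queue` as a stand-in for a RabbitMQ queue; `tool_a`, `tool_b` and the `STOP` sentinel are hypothetical names, and a real setup would use a broker client library instead.

```python
import queue
import threading

# An in-process, thread-safe FIFO stands in for the RabbitMQ queue here
# (an assumption for illustration; a real setup would use a broker client).
tool_b_queue = queue.Queue()

STOP = None  # sentinel used only to end this demo


def tool_a(job_id, needs_tool_b):
    """Hypothetical Tool A: does its long-running work, then maybe publishes."""
    # ... the long-running (> 60 sec) work would happen here ...
    if needs_tool_b:
        # The message carries the metadata Tool B needs.
        tool_b_queue.put({"job_id": job_id, "metadata": {"produced_by": "tool_a"}})


def tool_b(handled):
    """Hypothetical Tool B: blocks on the queue instead of polling a table."""
    while True:
        message = tool_b_queue.get()  # sleeps until a message arrives
        if message is STOP:
            break
        handled.append(message["job_id"])


handled = []
consumer = threading.Thread(target=tool_b, args=(handled,))
consumer.start()
tool_a(1, needs_tool_b=True)
tool_a(2, needs_tool_b=False)  # often there is nothing for Tool B to do
tool_a(3, needs_tool_b=True)
tool_b_queue.put(STOP)
consumer.join()
print(handled)  # [1, 3]
```

Tool B's `get()` simply waits until a message arrives, so there is no polling. It also shows the gap the question is about: while Tool A is still working, nothing has been published yet, so there is nothing an outside observer could inspect.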

Database solution: Whenever Tool A starts, it adds a database record with e.g. a timestamp, the metadata and the status "RUNNING". If Tool A has finished and Tool B should run, Tool A updates the record's status to "NEXT_TOOL_B". Tool B constantly queries the DB for records with status "NEXT_TOOL_B"; when it finds one, it runs using the metadata from that record.
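A minimal sketch of the database flow, assuming an in-memory SQLite table in place of PostgreSQL. The table layout, the function names and the extra statuses `DONE` and `TOOL_B_RUNNING` are hypothetical additions for illustration; `RUNNING` and `NEXT_TOOL_B` come from the description above.

```python
from __future__ import annotations

import json
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite stands in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, started_at TEXT NOT NULL,"
    " metadata TEXT, status TEXT NOT NULL)"
)


def tool_a_start() -> int:
    """Tool A adds a RUNNING record as soon as it starts."""
    cur = conn.execute(
        "INSERT INTO jobs (started_at, status) VALUES (?, 'RUNNING')",
        (datetime.now(timezone.utc).isoformat(),),
    )
    conn.commit()
    return cur.lastrowid


def tool_a_finish(job_id: int, metadata: dict | None) -> None:
    """Hand the job to Tool B, or mark it DONE (a hypothetical status)."""
    status = "NEXT_TOOL_B" if metadata is not None else "DONE"
    payload = json.dumps(metadata) if metadata is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, metadata = ? WHERE id = ?",
        (status, payload, job_id),
    )
    conn.commit()


def tool_b_poll_once() -> list[dict]:
    """One polling pass: claim every job waiting for Tool B."""
    rows = conn.execute(
        "SELECT id, metadata FROM jobs WHERE status = 'NEXT_TOOL_B' ORDER BY id"
    ).fetchall()
    claimed = []
    for job_id, metadata in rows:
        # Conditional update, so a job is claimed at most once even if
        # another Tool B instance got there first.
        cur = conn.execute(
            "UPDATE jobs SET status = 'TOOL_B_RUNNING'"
            " WHERE id = ? AND status = 'NEXT_TOOL_B'",
            (job_id,),
        )
        if cur.rowcount == 1:
            claimed.append(json.loads(metadata))
    conn.commit()
    return claimed


job_a, job_b = tool_a_start(), tool_a_start()
tool_a_finish(job_a, {"input": "file1"})
tool_a_finish(job_b, None)  # nothing for Tool B this time
first_pass = tool_b_poll_once()
second_pass = tool_b_poll_once()
print(first_pass, second_pass)  # [{'input': 'file1'}] []
```

In practice Tool B would call `tool_b_poll_once()` in a loop with a `time.sleep()` between passes; that loop is the constant polling the question would like to avoid.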

While I'm aware of the disadvantages of the database solution, e.g. the constant polling by Tool B, there is one feature of it that I miss in the message-based solution:

Whenever a third tool, say Tool C (e.g. a control panel UI), wants to know the current status, it can query the DB at any time and will find a "RUNNING" status if Tool A is still at work. In the message-based solution, I don't really see a way to "monitor" the status until the finish message is on the queue.
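Tool C's side of the database solution might look like the following sketch. It assumes a SQLite stand-in for the real database; the function names and the three sample rows are made up for illustration. Note that this is a one-off read whenever the control panel wants it, not a continuous poll.

```python
from __future__ import annotations

import sqlite3

# Minimal stand-in for the question's jobs table (SQLite instead of the real DB;
# the rows below are made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO jobs (status) VALUES (?)",
    [("RUNNING",), ("NEXT_TOOL_B",), ("RUNNING",)],
)
conn.commit()


def tool_a_is_running(db: sqlite3.Connection) -> bool:
    """The check Tool C can make at any moment: is any Tool A job still running?"""
    (running,) = db.execute(
        "SELECT EXISTS (SELECT 1 FROM jobs WHERE status = 'RUNNING')"
    ).fetchone()
    return bool(running)


def status_overview(db: sqlite3.Connection) -> dict[str, int]:
    """A control-panel view: number of jobs per status."""
    rows = db.execute(
        "SELECT status, COUNT(*) FROM jobs GROUP BY status ORDER BY status"
    ).fetchall()
    return dict(rows)


print(tool_a_is_running(conn))  # True
print(status_overview(conn))    # {'NEXT_TOOL_B': 1, 'RUNNING': 2}
```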

So my question is: can you think of a way to achieve this using messages, or any other method that works without polling?

asked Jan 04 '18 16:01

Clemens

1 Answer

The scenario described in the question is that of a system composed of multiple different pieces that work together to achieve a function. In this case, you have three different processes {A,B,C}, together with a database and an optional message queue. All systems, by their nature, accept one or more inputs, execute some process, and produce one or more outputs. In your case, one of the desired outputs is the state of the system and its processing, which is not an altogether unreasonable thing to want.

Queue or Database?

Now, down to your question. Why use a message queue instead of a database? Both are similar components of a system in that they provide some storage capacity. You might well ask the same question in a refrigerator manufacturing plant: when does it make more sense to use a shelf on the assembly line as opposed to a warehouse?

Databases are like warehouses - they are designed to hold a lot of different things and keep them all relatively straight. A good warehouse allows users to find things in the warehouse quickly, and avoids losing parts and materials. If it goes in, it can easily come back out, but not instantly.

Message queues, on the other hand, are like the shelves located near the operator stations in an assembly line. Parts accumulate there from the previous operation waiting to be consumed by the person running the station. The shelves are designed to hold a small number of the same thing - just like a message queue in a software system. They are close to the worker, so when the next part is ready to be worked, it can be retrieved very quickly (as opposed to a trip to the warehouse, which can take several minutes or more). In addition, the worker has immediate visibility to what's on the shelf - if the shelf is empty, the worker might take a break and wait for it to accumulate a part or two again.

Finally, if one part of the factory grossly over-produces (we don't like it when this happens, because it indicates waste), then the shelves are going to be overwhelmed, and the overage is going to need to be put into the warehouse. Believe it or not, this happens all the time in factories - sometimes stations go down for brief periods of time and the warehouse acts as a longer-term buffer.

When to use one or the other?

So - back to the question. You use a message queue when you expect that your production of messages will usually match the consumption of messages, and you need speed in retrieval. You don't expect things to stay in the queue very long. Software queue systems, such as RabbitMQ, also perform some very specific functions, like ensuring that a job only gets handled by one processor, and that it can be picked up by a different processor if the first goes down.
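The "handled by one processor, picked up by another if the first goes down" behavior can be modeled with a toy in-process broker. This is a sketch of the idea only, not RabbitMQ's implementation or API; as I understand it, the corresponding RabbitMQ feature is manual consumer acknowledgements, where messages a consumer has not acknowledged are requeued when its channel or connection closes. `ToyBroker` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import deque


class ToyBroker:
    """Toy model of acknowledged delivery (illustrative, not RabbitMQ's code).

    A delivered message is held as "unacked" until its consumer acks it, so no
    other consumer receives it meanwhile. If the consumer dies first, the
    message goes back on the queue for someone else.
    """

    def __init__(self) -> None:
        self._ready: deque[str] = deque()
        self._unacked: dict[int, tuple[str, str]] = {}  # tag -> (consumer, body)
        self._next_tag = 0

    def publish(self, body: str) -> None:
        self._ready.append(body)

    def deliver(self, consumer: str) -> tuple[int, str] | None:
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = (consumer, body)
        return self._next_tag, body

    def ack(self, tag: int) -> None:
        del self._unacked[tag]  # the job is finished and gone for good

    def consumer_died(self, consumer: str) -> None:
        # Requeue everything the dead consumer had not acknowledged.
        for tag in [t for t, (c, _) in self._unacked.items() if c == consumer]:
            _, body = self._unacked.pop(tag)
            self._ready.append(body)


broker = ToyBroker()
broker.publish("job-1")
broker.deliver("worker-1")                # worker-1 takes job-1 ...
meanwhile = broker.deliver("worker-2")    # ... so worker-2 gets nothing: None
broker.consumer_died("worker-1")          # worker-1 crashes before acking
tag, body = broker.deliver("worker-2")    # job-1 is redelivered to worker-2
broker.ack(tag)
print(meanwhile, body)  # None job-1
```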

On the other hand, you would use a database for things which require the persistence of state across multiple processing steps. Your job status is a perfect example of something that should be stored in the database. To continue the factory analogy - think of that as a report that gets sent back to the production planner when each step is completed. The production planner is going to keep it in a database.
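Applied to the question, this points to using both: the job status lives in the database, and the hand-off to Tool B goes through a queue. A minimal sketch, assuming SQLite and `queue.Queue` as stand-ins for the real database and broker; all function names and the `DONE`/`TOOL_B_RUNNING` statuses are hypothetical.

```python
from __future__ import annotations

import queue
import sqlite3

# Stand-ins (assumptions): SQLite for the database, queue.Queue for the broker.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
work_queue: queue.Queue = queue.Queue()


def set_status(job_id: int, status: str) -> None:
    """The 'report to the production planner': persisted, readable by anyone."""
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    db.commit()


def tool_a_start() -> int:
    job_id = db.execute("INSERT INTO jobs (status) VALUES ('RUNNING')").lastrowid
    db.commit()
    return job_id


def tool_a_finish(job_id: int, metadata: dict | None) -> None:
    if metadata is None:
        set_status(job_id, "DONE")  # hypothetical terminal status
        return
    set_status(job_id, "NEXT_TOOL_B")
    # The hand-off goes through the queue, so Tool B is woken up without polling.
    work_queue.put({"job_id": job_id, "metadata": metadata})


def tool_b_handle_one() -> None:
    message = work_queue.get()  # blocks until Tool A publishes something
    set_status(message["job_id"], "TOOL_B_RUNNING")
    # ... Tool B's actual work with message["metadata"] would happen here ...
    set_status(message["job_id"], "DONE")


def tool_c_status(job_id: int) -> str:
    """The control panel reads the database on demand; no queue involved."""
    (status,) = db.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return status


job = tool_a_start()
seen = [tool_c_status(job)]
tool_a_finish(job, {"input": "file1"})
seen.append(tool_c_status(job))
tool_b_handle_one()
seen.append(tool_c_status(job))
print(seen)  # ['RUNNING', 'NEXT_TOOL_B', 'DONE']
```

One caveat with this split: the database write and the publish are not atomic, so a crash between them can leave a job marked `NEXT_TOOL_B` with no message on the queue; handling that is beyond this sketch.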

You would also want to use a database when there is a likelihood that the queue will get full, or when it's critical that data not get lost between one job step and another. For example, a manufacturing plant will often store its finished products in the warehouse pending shipment to the customer. Use a database for all long-term (more than a few seconds) storage needs in your application.
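The "shelf overflows into the warehouse" idea can be sketched as a bounded queue that spills into a database table when it is full. Assumptions: a `queue.Queue` with `maxsize=2` plays the shelf and an in-memory SQLite table named `backlog` plays the warehouse; the names and the size are hypothetical, and this is not how RabbitMQ itself handles a full queue.

```python
from __future__ import annotations

import json
import queue
import sqlite3

# Assumptions for illustration: a small bounded in-process queue is the "shelf"
# and an in-memory SQLite table is the "warehouse".
shelf: queue.Queue = queue.Queue(maxsize=2)
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE backlog (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")


def backlog_size() -> int:
    (count,) = warehouse.execute("SELECT COUNT(*) FROM backlog").fetchone()
    return count


def produce(item: dict) -> str:
    """Put work on the shelf if there is room; otherwise store it durably."""
    if backlog_size() == 0:  # keep FIFO order: never jump ahead of the backlog
        try:
            shelf.put_nowait(item)
            return "shelf"
        except queue.Full:
            pass
    warehouse.execute("INSERT INTO backlog (payload) VALUES (?)", (json.dumps(item),))
    warehouse.commit()
    return "warehouse"


def restock() -> None:
    """Move backlog items onto the shelf, oldest first, while there is room."""
    rows = warehouse.execute("SELECT id, payload FROM backlog ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            shelf.put_nowait(json.loads(payload))
        except queue.Full:
            break
        warehouse.execute("DELETE FROM backlog WHERE id = ?", (row_id,))
    warehouse.commit()


placements = [produce({"job": n}) for n in range(4)]
print(placements)  # ['shelf', 'shelf', 'warehouse', 'warehouse']
shelf.get_nowait()  # the consumer takes one item, freeing a slot
restock()
print(shelf.qsize(), backlog_size())  # 2 1
```

`produce()` sends new work to the warehouse while a backlog exists, so newer items are not handed out ahead of older ones.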

Bottom Line

Most scalable software systems will have a need for both queues and databases, and the key is knowing when to use each.

Hopefully this makes some degree of sense.

answered Sep 22 '22 19:09

theMayer
