
RabbitMQ queue messages before writing to MongoDb

An application sends logs from many machines to the Amazon cloud and stores them in some database.

> Let's assume: one machine's log size is 1 kB every 10 seconds; number of
machines: from 1000 to 5000
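
Those assumptions translate into a modest aggregate rate, which quick arithmetic makes concrete (the figures come straight from the numbers above):

```python
# Each machine emits one 1 kB log message every 10 seconds.
msg_size_kb = 1
interval_s = 10
machines_min, machines_max = 1000, 5000

# Messages per second across the whole fleet.
rate_min = machines_min / interval_s   # 100 msg/s
rate_max = machines_max / interval_s   # 500 msg/s

# Aggregate bandwidth in kB/s.
bw_min = rate_min * msg_size_kb        # 100 kB/s
bw_max = rate_max * msg_size_kb        # 500 kB/s

print(rate_min, rate_max, bw_min, bw_max)
```

So even at the high end this is roughly 500 small messages per second, well within what a single RabbitMQ node or MongoDB instance normally handles.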

My first approach was to queue logs in RabbitMQ and then have a RabbitMQ consumer store them in an SQL database.

  1. Do I really need RabbitMQ when the consumer only does a basic storage operation?

My second approach was to queue logs in RabbitMQ but store them in MongoDB.

  1. Does it make sense to queue messages before writing to MongoDB?
Asked Apr 11 '15 by userbb

2 Answers

Since you already have multiple producer systems creating logs, you already have a distributed architecture.

There are many benefits to decoupling a utility / cross-cutting concern like logging from each of the systems, and instead using a queue:

  • By using an asynchronous approach, you will be able to buffer spikes of high message volumes in Rabbit without impacting the throughput of the producer systems. Also, the centralized log-writing system may be able to batch the log inserts - bulk log writes require fewer database connections and can optimize IO beyond what is possible when a large number of servers each write small numbers of logs directly.
  • It centralizes the concern of log writing. This way, you do not need to maintain the code to write logs on each producer, e.g. if the log format or the log storage changes (it already seems you have doubts on whether to store logs in NoSql like Mongo or Sql). This will be especially useful if the producer machines use different tech stacks (e.g. Java, Node, .Net) or different versions of the JVM etc. (You do however need to write to the queue from each system)
  • It decouples the availability of the producing system from the logging service (e.g. if the service writing the log data to MongoDb is down, logs can be queued in Rabbit until the system becomes available again). Remember to stamp the message creation time on the originating server, however.
  • It frees up IO and CPU resources on the producer systems.
  • Rabbit can form the basis of a bus architecture. This will allow you to extend the number of consumers of log messages, e.g. for redundancy, or e.g. to implement metrics, without impacting the existing implementation at all.
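
The batch-insert point above can be sketched with a small consumer-side buffer that flushes once it reaches a size or age threshold. This is an illustration, not code from the answer: `write_batch` is a hypothetical callback that, in a real consumer, would perform the bulk insert (e.g. MongoDB's `insert_many`).

```python
import time

class BatchBuffer:
    """Accumulates log messages and flushes them in bulk."""

    def __init__(self, write_batch, max_size=100, max_age_s=5.0):
        self.write_batch = write_batch  # callback doing the bulk insert
        self.max_size = max_size        # flush after this many messages...
        self.max_age_s = max_age_s      # ...or after this many seconds
        self.messages = []
        self.first_at = None

    def add(self, message):
        if not self.messages:
            self.first_at = time.monotonic()
        self.messages.append(message)
        if (len(self.messages) >= self.max_size
                or time.monotonic() - self.first_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.messages:
            self.write_batch(self.messages)
            self.messages = []
            self.first_at = None

# Usage: collect batches instead of hitting the database per message.
batches = []
buf = BatchBuffer(batches.append, max_size=3)
for i in range(7):
    buf.add({"host": "web-1", "line": i})
buf.flush()  # drain the remainder
print([len(b) for b in batches])  # [3, 3, 1]
```

Seven individual messages become three database round-trips instead of seven; with the 100-500 msg/s rates in the question, even modest batch sizes cut connection and IO overhead substantially.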
Answered Nov 03 '22 by StuartLC


As stated by StuartLC, you need buffering and you need to decouple the availability of the producing systems from the logging service.

Here are the cons of RabbitMQ:

  • RabbitMQ will be another point of failure to manage. If your logs are significant and/or the throughput is high, you will have to run a RabbitMQ cluster.
  • You will have to manage local buffering anyway, because RabbitMQ can be unavailable or your producers can be put under flow control.
  • RabbitMQ does buffer, but a healthy RabbitMQ is an empty one.
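
The local-buffering point can be sketched as a publish wrapper that spools messages when the broker is unavailable and replays them once it is back. All names here are illustrative: `publish` stands in for a real broker call (e.g. pika's `basic_publish`), and a production spool would live on disk rather than in memory.

```python
class BufferedPublisher:
    """Publishes messages, spooling them locally when the broker is down."""

    def __init__(self, publish):
        self.publish = publish   # callable that raises on broker failure
        self.spool = []          # local buffer; a real one would be on disk

    def send(self, message):
        try:
            self.publish(message)
        except ConnectionError:
            self.spool.append(message)  # keep the message for later replay

    def replay(self):
        """Retry spooled messages once the broker is back."""
        pending, self.spool = self.spool, []
        for message in pending:
            self.send(message)  # failures go back into the spool

# Usage: simulate a broker outage and recovery.
delivered = []
broker_up = False

def publish(msg):
    if not broker_up:
        raise ConnectionError("broker unavailable")
    delivered.append(msg)

p = BufferedPublisher(publish)
p.send("log-1")   # broker down: message is spooled, producer keeps running
broker_up = True
p.send("log-2")   # delivered directly
p.replay()        # spooled message delivered on recovery
print(delivered)  # ['log-2', 'log-1']
```

Note the ordering consequence: replayed messages arrive after later ones, which is why the first answer recommends stamping the creation time on the originating server.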

You do not define what you mean by "log". Since you state 1 kB every 10 seconds, these sound like metrics. Please correct me if I'm wrong.

Regarding log handling, I tend to favor local buffering with a stack dedicated to log handling (syslog, Flume, Logstash...), backed by a datastore with high throughput. MongoDB should fit the need; I'm a bit skeptical about an RDBMS.

In any case, you may be able to implement local buffering with a local RabbitMQ and federated queues.

Answered Nov 03 '22 by Nicolas Labrot