
Messaging bus + event storage + PubSub

I'm looking at building an application which has many data sources, each of which puts events into my system. Events have a well-defined data structure and could be encoded using JSON or XML.

I would like to be able to guarantee that events are saved persistently, and that the events are used as a part of a publish/subscribe bus with multiple subscribers possible per event.
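The requirement above (every event stored durably, then delivered to any number of subscribers) can be sketched in Java. This is a minimal in-memory illustration under stated assumptions, not a recommendation of either architecture: the names `Event` and `PersistentEventBus` are hypothetical, and the `ArrayList` log merely stands in for whatever durable, eventually consistent store is eventually chosen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical event: a sequence number, the data source, and an encoded payload (JSON or XML). */
record Event(long seq, String source, String payload) {}

/**
 * Hypothetical in-memory stand-in for "persistent storage + pub/sub bus".
 * A real system would replace the log below with durable storage.
 */
class PersistentEventBus {
    private final List<Event> log = new ArrayList<>();                       // append-only event log
    private final List<Consumer<Event>> subscribers = new CopyOnWriteArrayList<>();

    /** Register a subscriber; every published event is delivered to all subscribers. */
    public void subscribe(Consumer<Event> subscriber) {
        subscribers.add(subscriber);
    }

    /** Store first, then fan out, so no subscriber ever sees an event that was not stored. */
    public synchronized Event publish(String source, String payload) {
        Event event = new Event(log.size() + 1, source, payload);
        log.add(event);                                  // step 1: persist
        for (Consumer<Event> s : subscribers) {          // step 2: deliver to every subscriber
            s.accept(event);
        }
        return event;
    }

    /** Return stored events with a sequence number greater than seq (e.g. for a late subscriber). */
    public synchronized List<Event> replayAfter(long seq) {
        List<Event> out = new ArrayList<>();
        for (Event e : log) {
            if (e.seq() > seq) {
                out.add(e);
            }
        }
        return Collections.unmodifiableList(out);
    }
}
```

Storing before delivering means a subscriber never receives an event the store does not have, and `replayAfter` lets a subscriber that was offline catch up by sequence number. Calling subscribers while holding the lock keeps the sketch simple, but it would become a throughput bottleneck in a real bus.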

For the database, availability is very important even as it scales to multiple nodes, and partition tolerance is important so that I can scale the number of places which can store my events. Eventual consistency is good enough for me.

I was thinking of using a JMS enterprise messaging bus (e.g. Mule), an AMQP enterprise messaging bus (such as RabbitMQ), or a messaging library such as ZeroMQ.

But for my application, it seems that if I could set up a publish/subscribe system with CouchDB or something similar, it would solve my problem without having to integrate an enterprise messaging bus and a persistent storage system.

Which would work better: CouchDB + scaling + load balancing + some kind of pub/sub mechanism, or an explicit pub/sub messaging system with attached eventually consistent, available, partition-tolerant storage? Which one is easier to set up, administer, and operate? Which solution will have higher throughput for a given cost? Why?

Also, are there any more questions I should ask before selecting my technologies? (BTW, Java is the server-side and client-side language).

asked Dec 28 '22 18:12 by Jay Godse



1 Answer

I am using a CouchDB message queue in production. (It is not pub/sub, so I do not consider this answer complete.)

Currently (June 2011), CouchDB has huge potential as a messaging substrate:

  1. Good data persistence
  2. Well-poised for clustering (on a LAN, using BigCouch or Lounge)
  3. Well-poised for distribution (between data centers, world-wide)
  4. Good platform. Despite the shortcomings listed below, I love CQS (the CouchDB queue I use) because I can re-use my database and it works from Erlang, NodeJS, and every web browser.
  5. The _changes query
    1. Continuous feeds, instant delivery without polling
    2. If the network goes down, that is no problem: just retry later from the previous position
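The resume-from-previous-position behavior of the `_changes` feed can be sketched as a small checkpoint helper. It builds the request URI with CouchDB's `feed=continuous`, `heartbeat` (in milliseconds) and `since` query parameters. The class name `ChangesCheckpoint`, the base URL and the database name are assumptions for illustration; the checkpoint is only held in memory, and parsing each line of the feed is left to whatever JSON library the application already uses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper that remembers the last processed sequence of a CouchDB _changes feed,
 * so a consumer can reconnect after a network failure and resume where it stopped.
 */
class ChangesCheckpoint {
    private final String baseUrl;   // assumed server URL, e.g. "http://localhost:5984"
    private final String db;        // assumed database name, e.g. "events"
    private String lastSeq;         // null means "nothing processed yet"

    ChangesCheckpoint(String baseUrl, String db, String lastSeq) {
        this.baseUrl = baseUrl;
        this.db = db;
        this.lastSeq = lastSeq;
    }

    /** Call only after a change is fully handled, so a crash re-delivers it instead of losing it. */
    void markProcessed(String seq) {
        this.lastSeq = seq;
    }

    String lastSeq() {
        return lastSeq;
    }

    /** URI for a continuous feed starting after the last processed change (from the start if none). */
    URI resumeUri() {
        StringBuilder query = new StringBuilder("feed=continuous&heartbeat=10000");
        if (lastSeq != null) {
            // seq is treated as opaque text and URL-encoded
            query.append("&since=").append(URLEncoder.encode(lastSeq, StandardCharsets.UTF_8));
        }
        return URI.create(baseUrl + "/" + db + "/_changes?" + query);
    }
}
```

A consumer would open `resumeUri()` with an HTTP client (for example `java.net.http.HttpClient` with `BodyHandlers.ofLines()`), handle each change, and then call `markProcessed` with that change's `seq` value. Treating `seq` as an opaque string is deliberate, since its format differs between CouchDB versions. Persisting `lastSeq` somewhere durable is what would let the resume survive a process restart, not just a dropped connection.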

CouchDB is potentially a great messaging server (it is inspired by Lotus Notes, which handles high email volume), but even a low-volume message system in CouchDB requires careful planning and maintenance.

However, these are the challenges with CouchDB:

  1. Append-only database files grow fast
    1. Be mindful about disk capacity
    2. Be mindful about disk I/O. Compaction will read and rewrite all live documents
  2. Deleted documents are not really deleted. They are marked with _deleted: true and kept as tombstones forever, even after compaction! This is actually a distinctive strength of CouchDB, because the delete propagates through the cluster, even if the network goes down for a time.
  3. Propagating (replicating) deletes is great, but what about the buildup of deleted documents? Eventually they will outgrow everything else. The solution is to purge them, which actually removes them from disk. Unfortunately, if you do two or more purges before querying a map/reduce view, the view will rebuild itself completely. That may take too long, depending on your needs.
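The cost model behind these challenges can be illustrated with a toy append-only store. This is not CouchDB's real file format or API, and `AppendOnlyModel` is a hypothetical name: every write, including a delete, appends a new entry; compaction copies only the newest entry per document, so its I/O scales with the live data (tombstones included); and only a purge removes a tombstone. In a real CouchDB these maintenance operations are triggered with `POST /{db}/_compact` and `POST /{db}/_purge`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not CouchDB's on-disk format) of an append-only database file:
 * writes and deletes append, compaction keeps the newest entry per id,
 * tombstones survive compaction, and only purge removes them.
 */
class AppendOnlyModel {
    record Entry(String id, int rev, boolean deleted) {}

    private List<Entry> file = new ArrayList<>();   // the simulated "database file"

    private Entry latest(String id) {
        Entry found = null;
        for (Entry e : file) {
            if (e.id().equals(id)) {
                found = e;
            }
        }
        return found;
    }

    /** Create or update a document: always appends a new revision. */
    void put(String id) {
        Entry prev = latest(id);
        file.add(new Entry(id, prev == null ? 1 : prev.rev() + 1, false));
    }

    /** A delete is just another append: a tombstone revision. No-op for unknown or already-deleted ids. */
    void delete(String id) {
        Entry prev = latest(id);
        if (prev == null || prev.deleted()) {
            return;
        }
        file.add(new Entry(id, prev.rev() + 1, true));
    }

    /** Rewrite the newest entry of every id (tombstones included); returns entries copied, i.e. the I/O cost. */
    int compact() {
        Map<String, Entry> newest = new LinkedHashMap<>();
        for (Entry e : file) {
            newest.put(e.id(), e);
        }
        file = new ArrayList<>(newest.values());
        return file.size();
    }

    /** Physically remove every entry for an id, including its tombstone. */
    void purge(String id) {
        file.removeIf(e -> e.id().equals(id));
    }

    int fileSize() {
        return file.size();
    }

    /** Number of ids whose newest entry is a tombstone. */
    long tombstones() {
        Map<String, Entry> newest = new HashMap<>();
        for (Entry e : file) {
            newest.put(e.id(), e);
        }
        return newest.values().stream().filter(Entry::deleted).count();
    }
}
```

For example, two writes to one document, one write to another, and a delete of the second leave four entries in the file; compaction shrinks that to two, one of which is still the tombstone, and only purging the deleted id brings the tombstone count to zero.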

As usual, we hear NoSQL databases shouting "free lunch!", "free lunch!" while CouchDB says "you are going to have to work for this."

Unfortunately, unless you have a compelling reason to re-use CouchDB, I would use a dedicated messaging platform. I had a good experience with ejabberd as a messaging platform, including for communicating to and from Google App Engine.

answered Feb 06 '23 04:02 by JasonSmith
