
MongoDB: Streaming Out Inserted Data in Real-time (or near real-time)

I have a number of MongoDB collections that receive JSON documents from various streaming sources. In other words, there are a number of processes continually inserting data into a set of MongoDB collections.

I need a way to stream the data out of MongoDB into downstream applications. So I want a system that conceptually looks like this:

App Stream1 --> 
App Stream2 -->     MONGODB     --->  Aggregated Stream
App Stream3 -->

OR this:

App Stream1 -->                 --->  MongoD Stream1
App Stream2 -->     MONGODB     --->  MongoD Stream2
App Stream3 -->                 --->  MongoD Stream3

The question is how do I stream data out of Mongo without having to continually poll/query the database?

The obvious answer would be: "Why don't you change those app streaming processes to send messages to a queue such as RabbitMQ, ZeroMQ or ActiveMQ, which then delivers them to your Mongo streaming processes and to Mongo at once, like this":

                 MONGODB
                   /|\  
                    |
App Stream1 -->     |          --->  MongoD Stream1
App Stream2 -->  SomeMQqueue   --->  MongoD Stream2
App Stream3 -->                --->  MongoD Stream3

In an ideal world, yes, that would be good, but we need Mongo to ensure that messages are saved first, to avoid duplicates, and to ensure that IDs are all generated, etc. Mongo has to sit in the middle as the persistence layer.

So how do I stream messages out of a Mongo collection (not using GridFS etc.) into these downstream apps? The basic school of thought has been to just poll for new documents and, for each document collected, update it by adding another field to the stored JSON document, much like a process flag in a SQL table that stores a processed timestamp. I.e. every 1 second: poll for documents where processed == null, set processed = now(), update the document.
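To make that polling idea concrete, here is a minimal sketch of the poll-and-mark loop. It is only an illustration: the class `PollAndMark`, its methods, and the `payload` field are hypothetical names, and an in-memory list stands in for the Mongo collection so the block runs without a driver or a server. Against a real collection, the select-and-stamp step would presumably map to a query on `processed == null` followed by an update (or a single atomic find-and-modify per document).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one collection; not a MongoDB API.
class PollAndMark {
    private final List<Map<String, Object>> collection = new ArrayList<>();

    // Simulates an upstream app inserting a document without a "processed" field.
    synchronized void insert(String id, String payload) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", id);
        doc.put("payload", payload);
        collection.add(doc);
    }

    // One polling pass: pick documents where processed == null,
    // stamp them with processed = now, and hand them downstream.
    synchronized List<Map<String, Object>> pollOnce() {
        List<Map<String, Object>> batch = new ArrayList<>();
        Instant now = Instant.now();
        for (Map<String, Object> doc : collection) {
            if (doc.get("processed") == null) {
                doc.put("processed", now);
                batch.add(doc);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        PollAndMark store = new PollAndMark();
        store.insert("1", "a");
        store.insert("2", "b");
        System.out.println(store.pollOnce().size()); // 2 new documents
        store.insert("3", "c");
        System.out.println(store.pollOnce().size()); // 1 new document
        System.out.println(store.pollOnce().size()); // 0, nothing new
    }
}
```

The cost this makes visible is that every pass scans for unprocessed documents and then writes each one back, which is the overhead I would like to avoid.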

Is there a neater/more computationally efficient method?

FYI - These are all Java processes.

Cheers!

NightWolf asked Aug 24 '11 04:08



1 Answer

If you are writing to a capped collection (or collections), you can use a tailable cursor to push new data onto the stream, or onto a message queue from which it can be streamed out. However, this will not work for a non-capped collection.
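A rough sketch of the forwarding loop that would sit behind such a cursor is below. To keep it runnable without a driver or a server, the cursor is represented by a plain `Iterator`, and the class `TailForwarder` and its `forward` method are hypothetical names. The commented line shows how a tailable cursor could be obtained, assuming the current synchronous Java driver; older driver versions expose this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; not part of any MongoDB driver.
class TailForwarder {

    // Pushes every document the cursor yields to the sink (a stream writer,
    // a message-queue producer, etc.) and returns how many were forwarded.
    // With a tailable cursor on a capped collection the loop is expected to
    // wait for new inserts instead of ending; with the list-backed iterator
    // used in main() it ends once the list is exhausted.
    static <T> int forward(Iterator<T> cursor, Consumer<T> sink) {
        int forwarded = 0;
        while (cursor.hasNext()) {
            sink.accept(cursor.next());
            forwarded++;
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // Assumption: with the sync Java driver the iterator would come from
        //   collection.find().cursorType(CursorType.TailableAwait).iterator();
        // where `collection` is a capped MongoCollection<Document>.
        Iterator<String> cursor = Arrays.asList("doc1", "doc2", "doc3").iterator();

        List<String> downstream = new ArrayList<>();
        int count = forward(cursor, downstream::add);
        System.out.println(count + " forwarded: " + downstream);
    }
}
```

Because the sink is just a `Consumer`, the same loop can feed either a direct stream or a queue producer, which covers both options mentioned above.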

lobster1234 answered Oct 12 '22 23:10