
What is the difference between a change stream and a tailable cursor in MongoDB?

I am trying to determine what the difference is between a change stream: https://docs.mongodb.com/manual/changeStreams https://docs.mongodb.com/manual/reference/method/db.collection.watch/

which looks like so:

const changeStream = collection.watch();
changeStream.next(function(err, next) {
  expect(err).to.equal(null);
  client.close();
  done();
});

and a tailable cursor: https://docs.mongodb.com/manual/core/tailable-cursors/

which looks like so:

const cursor = coll.find(self.query || query)
  .addCursorFlag('tailable', true)
  .addCursorFlag('awaitData', true)  // true or false?
  .addCursorFlag('noCursorTimeout', true)
  .addCursorFlag('oplogReplay', true)
  .setCursorOption('numberOfRetries', Number.MAX_VALUE)
  .setCursorOption('tailableRetryInterval', 200);

const strm = cursor.stream();   // Node.js readable stream

Do they have different use cases? When would it be good to use one over the other?

Alexander Mills asked Mar 18 '18 00:03



2 Answers

Change Streams (available in MongoDB v3.6+) is a feature that allows you to access real-time data changes without the complexity and risk of tailing the oplog. Key benefits of change streams over tailing the oplog are:

  1. Utilise the built-in MongoDB Role-Based Access Control. Applications can only open change streams against collections they have read access to, which gives refined and specific authorisation.

  2. Provide a well-defined, reliable API. The change events returned by change streams are well documented. Also, all of the official MongoDB drivers follow the same specification when implementing the change streams interface.

  3. Change events that are returned as part of change streams are at least committed to the majority of the replica set. This means the change events that are sent to the client are durable. Applications don't need to handle data rollback in the event of failover.

  4. Provide a total ordering of changes across shards by utilising a global logical clock. MongoDB guarantees that the order of changes is preserved and that change events can be safely interpreted in the order received. For example, a change stream cursor opened against a 3-shard sharded cluster returns change events respecting the total order of those changes across all three shards.

  5. Due to the ordering characteristic, change streams are also inherently resumable. The _id of a change event is a resume token. Official MongoDB drivers automatically cache this resume token, and in the case of a transient network error the driver will retry once. Additionally, applications can resume manually by passing a saved token in the resume-after option (resume_after in PyMongo, resumeAfter in the Node.js driver). See also Resume a Change Stream.

  6. Utilise the MongoDB aggregation pipeline. Applications can modify the change event output. In MongoDB 3.6 there are five pipeline stages available to modify the event output. For example, change events can be filtered server-side with a $match stage before being sent to the client. See Modify Change Stream Output for more information.
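
Points 5 and 6 can be sketched together with the Node.js driver. This is a minimal sketch under stated assumptions, not a definitive implementation: buildInsertPipeline, buildWatchOptions and watchInserts are hypothetical helper names, and collection is assumed to be a driver Collection obtained from a connected MongoClient on a replica set (MongoDB 3.6+). It filters events server-side with $match and passes a previously saved resume token through the resumeAfter option.

```javascript
// Hypothetical helpers for illustration. Only collection.watch(), the $match
// stage, the resumeAfter option and the 'change'/'error' events are driver API.

// Server-side filter: only insert events are sent to the client.
function buildInsertPipeline() {
  return [{ $match: { operationType: 'insert' } }];
}

// Pass a saved resume token (the _id of the last event seen), if there is one.
function buildWatchOptions(resumeToken) {
  return resumeToken ? { resumeAfter: resumeToken } : {};
}

// Opens the change stream and reports each inserted document together with
// its resume token, so the caller can persist the token and resume later.
function watchInserts(collection, resumeToken, onInsert) {
  const changeStream = collection.watch(
    buildInsertPipeline(),
    buildWatchOptions(resumeToken)
  );
  changeStream.on('change', (event) => onInsert(event.fullDocument, event._id));
  changeStream.on('error', (err) => console.error('change stream failed:', err));
  return changeStream;
}
```

A caller would typically store the token handed to onInsert somewhere durable and pass it back as resumeToken the next time the process starts, so that no events are missed in between.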

when would it be good to use one over the other?

If your MongoDB deployment is version 3.6+, I would recommend using MongoDB Change Streams rather than tailing the oplog.

You may also find Change Streams Production Recommendations a useful resource.

Wan Bachtiar answered Sep 24 '22 08:09



With a tailable cursor on the oplog, you follow ALL changes to all collections. With a change stream, you see only changes to the selected collection, which means much less traffic and a more reliable feed.

JJussi answered Sep 22 '22 08:09
