Mongo Change Streams running multiple times (kind of): Node app running multiple instances

Question

My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in the change stream functionality runs as many times as there are processes.

How to set things up so that the change stream only runs once?

Here's what I've got:

const options = { fullDocument: "updateLookup" };

const filter = [
  {
    $match: {
      $and: [
        { "updateDescription.updatedFields.sites": { $exists: true } },
        { operationType: "update" }
      ]
    }
  }
];

const sitesStream = Client.watch(sitesFilter, options);

// Start listening to site stream
sitesStream.on("change", async change => {
  console.log("in site change stream", change);
  console.log(
    "in site change stream, update desc",
    change.updateDescription
  );

  // Do work...
  console.log("site change stream done.");
  return;
});

Marc · Accepted Answer

It can easily be done with only Mongodb query operators. You can add a modulo query on the ID field where the divisor is the number of your app instances (N). The remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1 you can write the filter like this:

const filter = [
  {
    "$match": {
      "$and": [
        // Other filters
        { "_id": { "$mod": [<number of instances>, <this instance's id>]}}
      ]
    }
  }
];

Alec Henninger · Answer

Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html

The examples are in Java but the important part is the algorithm.

It comes down to a few techniques:

Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.

More details in the blog post.

jteezy14 · Answer

It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.

Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.

The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have 1000s of partitions if you want), that is the power of the Kafka consumer group. Scaling down works in the reverse.

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

Tags:

node.js

mongodb

changestream

PaulIsLoud

3 Answers

Marc

Alec Henninger

jteezy14

Recent Activity

Donate For Us

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

Tags:

node.js

mongodb

changestream

PaulIsLoud

3 Answers

Marc

Alec Henninger

jteezy14

Related questions

Recent Activity

Donate For Us