
MongoDB ChangeStream performance

Is it possible to use change streams extensively? I want to watch many collections, each with many documents, using various filter parameters. The idea is to let multiple users watch the data they are interested in: not just to show a few real-time updates on, e.g., stock data from a single collection, but to make a modern web application real-time throughout. I've stumbled upon some discussions, e.g. this one, which suggest that the feature is not usable for such a purpose.

So imagine implementing a commonly known social network. Each user would want live data on (1) notifications, (2) online friends, (3) friend requests, (4) news feed, (5) comments on news feed posts (maybe one stream for each post?). This makes at least 5 open change streams per user. If the service had, e.g., 10000 connected users, that makes 50000 active change streams.

Is this mechanism ready for such a load? If I understood the discussion (and some others) correctly, every change stream watcher creates one connection. Would it be okay to have tens of thousands of connections? It does not seem like a good design. It seems it would be better to watch each collection and do the filtering on an application server, but that is more of a database server's job.

Is there a way to handle such a load with MongoDB?

Márius Rak asked Sep 17 '20 23:09


1 Answer

Each change stream will require a connection to the server. Assuming your 10000 active users are also going to do things like log in, post things, read things, comment on other people's things, manage friend lists, etc., you may actually need more like 10 connections per user.
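As a rough illustration of where the totals come from, the arithmetic can be sketched as below. The per-user figures are the ones quoted in the question and in this answer; the function name is made up for the example, and the result is a back-of-the-envelope estimate, not a measurement.

```python
# Back-of-the-envelope sizing with the numbers from the question and this answer.
ACTIVE_USERS = 10_000
STREAMS_PER_USER = 5        # from the question: one stream per live feature
CONNECTIONS_PER_USER = 10   # rough estimate from this answer


def estimate_load(users, streams_per_user, connections_per_user):
    """Return the total change streams and connections for a user count."""
    return {
        "change_streams": users * streams_per_user,
        "connections": users * connections_per_user,
    }


load = estimate_load(ACTIVE_USERS, STREAMS_PER_USER, CONNECTIONS_PER_USER)
print(load)
```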

Each change stream is essentially an aggregation that maintains a cursor over the operations log (oplog). That should work fairly well as long as the server is sufficiently sized to handle:

  • 100,000 simultaneous connections
  • state for 50,000 long running cursors
  • tens of thousands of queries per second for those change streams
  • whatever query rate the other non-changestream reads and writes will need
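To make the "one aggregation per watcher" point concrete, here is a minimal sketch of what a single per-user change stream could look like. The collection name `notifications` and the field `recipient_id` are assumptions made for illustration, not something from the question. The block only builds the pipeline, so it runs without a server; the commented lines show how it might be passed to PyMongo's `watch()` against a replica set.

```python
def per_user_pipeline(user_id):
    """Aggregation stages restricting a change stream to one user's inserts.

    `recipient_id` is a hypothetical field on the watched documents.
    """
    return [
        {
            "$match": {
                "operationType": "insert",
                "fullDocument.recipient_id": user_id,
            }
        }
    ]


# With PyMongo and a replica set, each watcher would look roughly like this
# (one open cursor, and in practice one connection, per call):
#
#   with db.notifications.watch(per_user_pipeline(user_id)) as stream:
#       for change in stream:
#           push_to_client(change)   # hypothetical delivery function
#
pipeline = per_user_pipeline("user-42")
print(pipeline)
```

Multiplying this by five features and 10,000 users is what produces the cursor and connection counts listed above.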

On MongoDB Atlas you would need at least an M140 instance just to handle that number of connections, with a price tag in the neighborhood of $10K per month.

At that price point, it would probably be more cost effective to design a pub/sub notification service that uses a total of 5 change streams to watch for the different types of changes, and deliver those to users with a push mechanism rather than having every user poll the database directly.
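A minimal sketch of that fan-out idea follows, under several assumptions: the `FanOut` class, the `recipient_id` field, and the use of in-process queues are all invented for the example. In a real service, each of the five change streams would feed `dispatch()` (for instance from a loop over `collection.watch()` in PyMongo), and the queues would be drained by whatever push transport is used, such as WebSockets.

```python
from collections import defaultdict
from queue import Queue


class FanOut:
    """Routes change events from a few shared streams to per-user queues."""

    def __init__(self):
        # (collection name, user id) -> list of subscriber queues
        self._subscribers = defaultdict(list)

    def subscribe(self, collection, user_id):
        """Register interest in one collection for one user; returns a queue."""
        q = Queue()
        self._subscribers[(collection, user_id)].append(q)
        return q

    def dispatch(self, change):
        """Deliver one change event; returns how many queues received it."""
        coll = change["ns"]["coll"]
        doc = change.get("fullDocument") or {}
        user_id = doc.get("recipient_id")  # hypothetical routing field
        queues = self._subscribers.get((coll, user_id), [])
        for q in queues:
            q.put(change)
        return len(queues)


# Simulated events shaped like MongoDB change events (insert operations).
hub = FanOut()
inbox = hub.subscribe("notifications", "alice")

delivered = hub.dispatch({
    "operationType": "insert",
    "ns": {"db": "app", "coll": "notifications"},
    "fullDocument": {"recipient_id": "alice", "text": "hi"},
})
missed = hub.dispatch({
    "operationType": "insert",
    "ns": {"db": "app", "coll": "notifications"},
    "fullDocument": {"recipient_id": "bob", "text": "hello"},
})
received = inbox.get_nowait()
print(delivered, missed, received["fullDocument"]["text"])
```

With this shape, the database sees a fixed number of cursors regardless of how many users are connected, and the per-user filtering moves into the application tier.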

Joe answered Oct 23 '22 05:10