
Tracking aggregates in MongoDB in near real-time

Tags:

mongodb

We need to track counts of records in an Accounts collection, based on a 'type' field. So we want to know how many Accounts are in TYPE1, how many are in TYPE2, etc. Further, we need to know the totals of an 'amount' field inside each Account.

Aggregate queries aren't going to be fast enough for us (these counts need to update in real time in the UI, and with tens of millions of records, aggregate queries that take many seconds to run aren't going to cut it), so I'm looking at having a separate totals collection with an object that tracks counters for each type.

As we change the value of the 'type' field (i.e. move an account from one type to another), we need to adjust the counts and 'amount' totals (decrement the counter for the original type, increment the counter for the new type). We can then use an update command with $inc to adjust the fields in the totals record that stores the type counts and amount sums. (This does mean two database writes for every 'type' update, but I don't see a way around that unless anyone has a suggestion.)
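For illustration, the totals write might look something like this - a minimal sketch, where the collection name, the singleton _id, and the counts.*/amounts.* field layout are all just placeholders:

// Hypothetical sketch: move one account (amount 123) from TYPE2 to TYPE1
// in a singleton totals document. Collection and field names are assumed.
db.totals.updateOne(
  { _id: "accountTotals" },
  { $inc: {
      "counts.TYPE1": 1,        // one more account in TYPE1
      "counts.TYPE2": -1,       // one fewer in TYPE2
      "amounts.TYPE1": 123,     // add this account's amount to TYPE1's sum
      "amounts.TYPE2": -123     // remove it from TYPE2's sum
  } },
  { upsert: true }
)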

For single record adjustments, this is pretty straightforward - we can just trap the type change in our data access layer and make a secondary update to the totals tracking object.

The problem is how to track the 'amount' totals. For single record adjustments, this isn't a problem. But for bulk operations (a db.collection.update() that could affect many thousands of records), we need the total of the 'amount' field for each of the adjusted records.

So far, I haven't found an easy way to get Mongo to give me the information I need.

I've got one strategy worked out that involves adding a tagged history array to the Account document, with a unique 'changeId' and the 'amount' the record had at the time of the change, then running an aggregate against those history entries for the changeId to get the totals. The history entries can then optionally be deleted right away, or removed in a periodic clean-up process.

For example, if I did a bulk change, I'd generate a unique ID ('aaaaaaaaaa' in the following), then push a history entry as part of the bulk update that adjusts the 'type':

{
  "amount": 123,
  "type": "TYPE1",
  "history": [
     {
       "changeId": "aaaaaaaaaa",
       "amount": 123,
       "oldType": "TYPE2",
       "newType": "TYPE1"
     }
  ]
}
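A sketch of a bulk update that could produce that shape in one write, assuming MongoDB 4.2+ aggregation-pipeline updates (on older versions this would need a read-then-write pass); the collection name 'accounts' is assumed:

// Move all TYPE2 accounts to TYPE1 and append a history entry capturing
// the pre-change values. Within a single $set stage, "$amount" and
// "$type" still refer to the values before the update.
db.accounts.updateMany(
  { type: "TYPE2" },
  [ { $set: {
      type: "TYPE1",
      history: { $concatArrays: [
        { $ifNull: [ "$history", [] ] },
        [ { changeId: "aaaaaaaaaa",
            amount: "$amount",
            oldType: "$type",
            newType: "TYPE1" } ]
      ] }
  } } ]
)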

Then I can do an aggregate that gives me the sum of the 'amount' for the changeId that just ran.
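That aggregate might look like this (a sketch; a multikey index on 'history.changeId' would keep the initial match fast):

// Sum the 'amount' recorded in the history entries for one changeId,
// grouped by the old/new type pair so the totals document can be adjusted.
db.accounts.aggregate([
  { $match: { "history.changeId": "aaaaaaaaaa" } },
  { $unwind: "$history" },
  { $match: { "history.changeId": "aaaaaaaaaa" } },  // drop entries from other changes
  { $group: {
      _id: { oldType: "$history.oldType", newType: "$history.newType" },
      count: { $sum: 1 },
      totalAmount: { $sum: "$history.amount" }
  } }
])

Each result row says how many accounts, and how much 'amount', moved from one type to another for that change.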

I think this will work, but it's clumsy - is there a better way?

Asked by Kevin Day on Feb 12 '18




1 Answer

My first instinct was to store a change log in a separate collection, but I don't see a way to do that in the MongoDB bulk operation documentation. I agree that the work of maintaining the aggregates needs to happen in a separate process. Your idea of keeping a history array in the account collection can work. I don't know your application, but I would slightly change the structure to avoid a timing hole: I'd create a ticker tape of changes that the aggregate process can apply with little knowledge of the account.

{
  "amount": <active amount>,
  "type": <active type>,
  "history": [
    {
      "changeId": "aaaaaaaa",
      "newType": 1,
      "amount": <new amount>
    },
    {
      "changeId": "aaaaaaaa",
      "oldType": -1,
      "amount": <old amount, as a negative value>
    }
  ]
}

The reason is the timing of the aggregate collection process. With your original structure, it has to get the new amount from the account itself. But what if the account changes again before the aggregate collection process runs? Say the transactions are as follows:

Type1 2000
Changes to Type2 3000
Changes to Type1 1000 

With your original structure, shown below, your aggregate process has to figure out that the Type2 change cancels itself out:

{
  "amount": 1000,
  "type": "Type1",
  "history": [
    {
      "changeId": "aaaaaa",
      "amount": 2000,
      "oldType": "Type1",
      "newType": "Type2"
    },
    {
      "changeId": "bbbbbb",
      "amount": 3000,
      "oldType": "Type2",
      "newType": "Type1"
    }
  ]
}

Instead, I'd do the following. The aggregate process finds all Type1 entries in the history and sums them: for Type1 it sums 1 and -1, for no net change in the count, and -2000 and 1000, decrementing the Type1 amount by 1000. The Type2 entries cancel out.

{
  "amount": 1000,
  "type": "Type1",
  "history": [
    {
      "changeId": "aaaaaa",
      "Type1": -1,
      "amount": -2000
    },
    {
      "changeId": "aaaaaa",
      "Type2": 1,
      "amount": 3000
    },
    {
      "changeId": "bbbbbb",
      "Type2": -1,
      "amount": -3000
    },
    {
      "changeId": "bbbbbb",
      "Type1": 1,
      "amount": 1000
    }
  ]
}
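As a sketch of that aggregate pass, assuming the signed entries are reshaped slightly to { changeId, type, countDelta, amountDelta } so the type is an ordinary field rather than a key (my Type1/Type2 keys above make grouping awkward):

// Sum the pending signed deltas per type across all accounts.
// For the example above this yields Type1: { countDelta: 0, amountDelta: -1000 }
// and Type2: { countDelta: 0, amountDelta: 0 }.
db.accounts.aggregate([
  { $match: { "history.0": { $exists: true } } },   // only accounts with unprocessed entries
  { $unwind: "$history" },
  { $group: {
      _id: "$history.type",
      countDelta: { $sum: "$history.countDelta" },
      amountDelta: { $sum: "$history.amountDelta" }
  } }
])

The results can then be applied to the totals document with a single $inc per type.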

No matter what you choose to do, you'll need to determine which history records have already been processed. You can either delete the history documents once processed, flag them, or move them to an audit collection.
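If you delete them, a $pull over the processed changeIds is one option (a sketch; processedIds is a hypothetical list collected during the aggregate pass):

// Remove history entries already applied to the totals.
var processedIds = [ "aaaaaa", "bbbbbb" ];   // hypothetical: gathered while processing
db.accounts.updateMany(
  { "history.changeId": { $in: processedIds } },
  { $pull: { history: { changeId: { $in: processedIds } } } }
)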

Answered by LAS on Oct 19 '22