
Tracking aggregates in MongoDB in near real-time

Tags:

mongodb

We need to track counts of records in an Accounts collection, based on a 'type' field. So we want to know how many Accounts are in TYPE1, how many are in TYPE2, etc. Further, we need to know the totals of an 'amount' field inside each Account.

Aggregate queries aren't going to be fast enough for us (these counts need to update in real time in the UI, and with tens of millions of records, aggregate queries that take many seconds to run aren't going to cut it), so I'm looking at having a separate totals collection with an object that tracks counters for each type.

As we change the value of the 'type' field (i.e. move an account from one type to another), we need to adjust the counts and 'amount' totals (decrement the counter for the original type, increment the counter for the new type). We can then use an update command with $inc to adjust the fields in the totals record that stores the type counts and amount sums. (This does mean two database writes for every 'type' update, but I don't see a way around that unless anyone has a suggestion.)
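For illustration, the totals write might look something like this - a minimal sketch, where the collection name, the singleton _id, and the counts.*/amounts.* field layout are all just placeholders:

// Hypothetical sketch: move one account (amount 123) from TYPE2 to TYPE1
// in a singleton totals document. Collection and field names are assumed.
db.totals.updateOne(
  { _id: "accountTotals" },
  { $inc: {
      "counts.TYPE1": 1,        // one more account in TYPE1
      "counts.TYPE2": -1,       // one fewer in TYPE2
      "amounts.TYPE1": 123,     // add this account's amount to TYPE1's sum
      "amounts.TYPE2": -123     // remove it from TYPE2's sum
  } },
  { upsert: true }
)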

For single record adjustments, this is pretty straightforward - we can just trap the type change in our data access layer and make a secondary update to the totals tracking object.

The problem is how to track the 'amount' totals. For single record adjustments, this isn't a problem. But for bulk operations (a db.collection.update() that could affect many thousands of records), we need the total of the 'amount' field for each of the adjusted records.

So far, I haven't found an easy way to get Mongo to give me the information I need.

I've got one strategy worked out that involves adding a tagged history array to the Account document, with a unique 'changeId' and the 'amount' the record had at the time of the change, then running an aggregate against those history entries for the changeId to get the totals. The history entries can then optionally be deleted right away, or removed in a periodic clean-up process.

For example, if I did a bulk change, I'd generate a unique ID ('aaaaaaaaaa' in the following), then push a history entry as part of the bulk update that adjusts the 'type':

{
  "amount": 123,
  "type": "TYPE1",
  "history": [
     {
       "changeId": "aaaaaaaaaa",
       "amount": 123,
       "oldType": "TYPE2",
       "newType": "TYPE1"
     }
  ]
}
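A sketch of a bulk update that could produce that shape in one write, assuming MongoDB 4.2+ aggregation-pipeline updates (on older versions this would need a read-then-write pass); the collection name 'accounts' is assumed:

// Move all TYPE2 accounts to TYPE1 and append a history entry capturing
// the pre-change values. Within a single $set stage, "$amount" and
// "$type" still refer to the values before the update.
db.accounts.updateMany(
  { type: "TYPE2" },
  [ { $set: {
      type: "TYPE1",
      history: { $concatArrays: [
        { $ifNull: [ "$history", [] ] },
        [ { changeId: "aaaaaaaaaa",
            amount: "$amount",
            oldType: "$type",
            newType: "TYPE1" } ]
      ] }
  } } ]
)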

Then I can do an aggregate that gives me the sum of the 'amount' for the changeId that just ran.
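That aggregate might look like this (a sketch; a multikey index on 'history.changeId' would keep the initial match fast):

// Sum the 'amount' recorded in the history entries for one changeId,
// grouped by the old/new type pair so the totals document can be adjusted.
db.accounts.aggregate([
  { $match: { "history.changeId": "aaaaaaaaaa" } },
  { $unwind: "$history" },
  { $match: { "history.changeId": "aaaaaaaaaa" } },  // drop entries from other changes
  { $group: {
      _id: { oldType: "$history.oldType", newType: "$history.newType" },
      count: { $sum: 1 },
      totalAmount: { $sum: "$history.amount" }
  } }
])

Each result row says how many accounts, and how much 'amount', moved from one type to another for that change.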

I think this will work, but it's clumsy - is there a better way?

Asked by Kevin Day on Feb 12 '18




1 Answer

My first instinct was to store a change log in a separate collection, but I don't see a way to do that in the MongoDB bulk operation documentation. I agree that the work of maintaining the aggregates needs to happen in a separate process. Your idea of keeping a history array in the account collection can work. I don't know your application, but I would slightly change the structure to avoid a timing hole: I'd create a ticker tape of changes that the aggregate process can apply with little knowledge of the account.

{
  "amount": <active amount>,
  "type": <active type>,
  "history": [
    {
      "changeId": "aaaaaaaa",
      "newType": 1,
      "amount": <new amount>
    },
    {
      "changeId": "aaaaaaaa",
      "oldType": -1,
      "amount": <old amount, as a negative value>
    }
  ]
}

The reason is the timing of the aggregate collection process. With your original structure, it has to get the new amount from the account itself. But what if the account changes again before the aggregate collection process runs? Say the transactions are as follows:

Type1 2000
Changes to Type2 3000
Changes to Type1 1000 

With your original structure, shown below, your aggregate process has to figure out that the Type2 change cancels itself out:

{
  "amount": 1000,
  "type": "Type1",
  "history": [
    {
      "changeId": "aaaaaa",
      "amount": 2000,
      "oldType": "Type1",
      "newType": "Type2"
    },
    {
      "changeId": "bbbbbb",
      "amount": 3000,
      "oldType": "Type2",
      "newType": "Type1"
    }
  ]
}

Instead, I'd do the following. The aggregate process finds all Type1 entries in the history and sums them: for Type1 it sums 1 and -1, for no net change in the count, and -2000 and 1000, decrementing the Type1 amount by 1000. The Type2 entries cancel out.

{
  "amount": 1000,
  "type": "Type1",
  "history": [
    {
      "changeId": "aaaaaa",
      "Type1": -1,
      "amount": -2000
    },
    {
      "changeId": "aaaaaa",
      "Type2": 1,
      "amount": 3000
    },
    {
      "changeId": "bbbbbb",
      "Type2": -1,
      "amount": -3000
    },
    {
      "changeId": "bbbbbb",
      "Type1": 1,
      "amount": 1000
    }
  ]
}
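As a sketch of that aggregate pass, assuming the signed entries are reshaped slightly to { changeId, type, countDelta, amountDelta } so the type is an ordinary field rather than a key (my Type1/Type2 keys above make grouping awkward):

// Sum the pending signed deltas per type across all accounts.
// For the example above this yields Type1: { countDelta: 0, amountDelta: -1000 }
// and Type2: { countDelta: 0, amountDelta: 0 }.
db.accounts.aggregate([
  { $match: { "history.0": { $exists: true } } },   // only accounts with unprocessed entries
  { $unwind: "$history" },
  { $group: {
      _id: "$history.type",
      countDelta: { $sum: "$history.countDelta" },
      amountDelta: { $sum: "$history.amountDelta" }
  } }
])

The results can then be applied to the totals document with a single $inc per type.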

No matter what you choose to do, you'll need to determine which history records have already been processed. You can either delete the history documents once processed, flag them, or move them to an audit collection.
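If you delete them, a $pull over the processed changeIds is one option (a sketch; processedIds is a hypothetical list collected during the aggregate pass):

// Remove history entries already applied to the totals.
var processedIds = [ "aaaaaa", "bbbbbb" ];   // hypothetical: gathered while processing
db.accounts.updateMany(
  { "history.changeId": { $in: processedIds } },
  { $pull: { history: { changeId: { $in: processedIds } } } }
)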

Answered by LAS on Oct 19 '22