Use Case:
I'm consuming a REST API which provides battle results of a video game. It is a team-vs-team online game and each team consists of 3 players, each of whom picks one of 100 different characters. I want to count the number of wins, losses and draws for each team combination. I get roughly 1000 battle results per second. I concatenate the character ids of each team (sorted ascending) and then save the wins, losses and draws for each combination.
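A minimal sketch of how such a combination key could be built (the helper name and input shape are assumed for illustration, not part of the original code):

// Illustrative helper: sort the three character ids ascending and join them
// into a single string key, e.g. [42, 7, 99] -> "7-42-99".
function buildCombinationKey(characterIds: number[]): string {
    return [...characterIds].sort((a, b) => a - b).join('-');
}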
My current implementation:
const combinationStatsSchema: Schema = new Schema({
    combination: { type: String, required: true, index: true },
    gameType: { type: String, required: true, index: true },
    wins: { type: Number, default: 0 },
    draws: { type: Number, default: 0 },
    losses: { type: Number, default: 0 },
    totalGames: { type: Number, default: 0, index: true },
    battleDate: { type: Date, index: true, required: true }
});
For each returned log I perform an upsert and send these queries in bulk (5-30 rows) to MongoDB:
const filter: any = { combination: log.teamDeck, gameType, battleDate };
if (battleType === BattleType.PvP) {
    filter.arenaId = log.arena.id;
}
const update: {} = { $inc: { draws, losses, wins, totalGames: 1 } };
combiStatsBulk.find(filter).upsert().updateOne(update);
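A sketch of how such a batch might be assembled and flushed, assuming combiStatsBulk comes from the native driver's initializeUnorderedBulkOp() on the Mongoose model's collection (the model name, log shape and surrounding function are assumptions):

import { model } from 'mongoose';

const CombinationStats = model('CombinationStats', combinationStatsSchema);

interface BattleLog { teamDeck: string; result: 'win' | 'draw' | 'loss'; }

async function flushBatch(logs: BattleLog[], gameType: string, battleDate: Date): Promise<void> {
    // Unordered bulk operation on the underlying native collection.
    const combiStatsBulk = CombinationStats.collection.initializeUnorderedBulkOp();

    for (const log of logs) {
        const wins = log.result === 'win' ? 1 : 0;
        const draws = log.result === 'draw' ? 1 : 0;
        const losses = log.result === 'loss' ? 1 : 0;

        combiStatsBulk
            .find({ combination: log.teamDeck, gameType, battleDate })
            .upsert()
            .updateOne({ $inc: { wins, draws, losses, totalGames: 1 } });
    }

    // One round trip for the whole 5-30 row batch.
    await combiStatsBulk.execute();
}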
My problem:
As long as I have just a few thousand entries in my combinationStats collection, MongoDB uses only 0-2% CPU. Once the collection holds a couple million documents (which happens pretty quickly due to the number of possible combinations), MongoDB constantly sits at 50-100% CPU. Apparently my approach is not scalable at all.
My question:
Either of these options could be a solution to my problem described above: (1) make my current MongoDB approach (bulk upserts) more efficient, or (2) switch to a different database, e.g. MySQL or Postgres, that handles this write pattern better.
Edit: After khang commented that it might be related to upsert performance, I replaced my $inc with a $set and the performance was indeed equally "poor". Hence I tried the suggested find() and then manually update() approach, but the results didn't get any better.
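For reference, roughly what that find()-then-update() variant looks like (a sketch only; it assumes the CombinationStats model built from the schema above):

async function recordResult(
    combination: string, gameType: string, battleDate: Date,
    wins: number, draws: number, losses: number
): Promise<void> {
    const existing = await CombinationStats.findOne({ combination, gameType, battleDate });
    if (existing) {
        // Manual counter update with $set instead of an atomic $inc upsert.
        await CombinationStats.updateOne(
            { _id: existing._id },
            { $set: {
                wins: existing.wins + wins,
                draws: existing.draws + draws,
                losses: existing.losses + losses,
                totalGames: existing.totalGames + 1
            } }
        );
    } else {
        await CombinationStats.create({ combination, gameType, battleDate, wins, draws, losses, totalGames: 1 });
    }
}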
Create a hash on your filter conditions:
I was able to reduce the CPU usage from 80-90% down to 1-5% and got a noticeably higher throughput.
Apparently the filter was the problem. Instead of filtering on these three conditions: { combination: log.teamDeck, gameType, battleDate }, I created a 128-bit hash in my Node application. I use this hash for upserting and set combination, gameType and battleDate as additional fields in my update document.
For creating the hash I used the metrohash library, which can be found here: https://github.com/jandrewrogers/MetroHash. Unfortunately I cannot explain why the performance is so much better, especially since I had indexed all of my previous conditions.
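Roughly, the change looks like the sketch below. It uses Node's built-in crypto (MD5, which is also 128 bits) as a stand-in for MetroHash128, and the field name combinationHash as well as the $setOnInsert usage are assumptions:

import { createHash } from 'crypto';

// Any stable 128-bit hash of the three former filter fields works;
// MD5 here is only a stand-in for the MetroHash128 mentioned above.
function filterHash(combination: string, gameType: string, battleDate: Date): string {
    return createHash('md5')
        .update(`${combination}|${gameType}|${battleDate.toISOString()}`)
        .digest('hex'); // 32 hex chars = 128 bits
}

// Upsert on the single (indexed) hash field; keep the original fields as plain data.
function queueUpsert(
    bulk: any, log: { teamDeck: string }, gameType: string, battleDate: Date,
    wins: number, draws: number, losses: number
): void {
    const combinationHash = filterHash(log.teamDeck, gameType, battleDate);
    bulk.find({ combinationHash })
        .upsert()
        .updateOne({
            $inc: { wins, draws, losses, totalGames: 1 },
            $setOnInsert: { combination: log.teamDeck, gameType, battleDate }
        });
}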
In (1.) you assert that you perform upserts in bulk. But based on how this seems to scale, you're probably sending too few rows in each batch. Consider doubling the batch size each time the number of stored rows doubles. Please do post Mongo's explain() query plan for your setup.
In (2.) you consider switching to, say, MySQL or Postgres. Yes, that would absolutely be a valid experiment. Again, be sure to post EXPLAIN output alongside your timing data.
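For example, in the mongo shell (the collection name combinationstats and the literal values are assumptions):

db.combinationstats
    .find({ combination: '7-42-99', gameType: 'PvP', battleDate: ISODate('2018-01-01') })
    .explain('executionStats')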
There are only about a million possible team compositions (at most 100^3 = 1e6), and there's a distribution over those, with some being much more popular than others. You only need to maintain a million counters, which is not such a large number. However, doing 1e6 disk I/Os can take a while, especially if they are random reads. Consider moving away from a disk-resident data structure, to which you might be doing frequent COMMITs, and switching to a memory-resident hash or B-tree. It doesn't sound like ACID-type persistence guarantees are important to your application.
Also, once you have assembled "large" input batches, certainly more than a thousand and perhaps on the order of a million, do take care to sort the batch before processing. Then your counter maintenance problem just looks like merge-sort, either on internal memory or on external storage.
One principled approach to scaling your batches is to accumulate observations in some conveniently sized sorted memory buffer, and only release aggregate (counted) observations from that pipeline stage when the number of distinct team compositions in the buffer is above some threshold K. Mongo or whatever would be the next stage in your pipeline. If K is much more than 1% of 1e6, then even a sequential scan of counters stored on disk would have a decent chance of finding useful update work to do on each disk block that is read.
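A sketch of that kind of pre-aggregation stage (the threshold K, the names, and the downstream writer are all illustrative):

interface Counter { wins: number; draws: number; losses: number; totalGames: number; }

// Assumed downstream writer, e.g. the bulk upsert from the question.
declare function upsertAggregated(combination: string, counts: Counter): void;

const K = 50_000;                           // flush once this many distinct compositions are buffered
const buffer = new Map<string, Counter>();  // memory-resident counters, keyed by team composition

function observe(combination: string, result: 'win' | 'draw' | 'loss'): void {
    const c = buffer.get(combination) ?? { wins: 0, draws: 0, losses: 0, totalGames: 0 };
    if (result === 'win') c.wins++;
    else if (result === 'draw') c.draws++;
    else c.losses++;
    c.totalGames++;
    buffer.set(combination, c);
    if (buffer.size >= K) flush();
}

function flush(): void {
    // Sort keys so downstream writes hit the index / disk blocks in order (merge-sort style).
    const sorted = [...buffer.entries()].sort(([a], [b]) => a.localeCompare(b));
    buffer.clear();
    for (const [combination, counts] of sorted) {
        upsertAggregated(combination, counts);
    }
}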