I have a collection named "logTransaction" and I want to produce the results shown in the attached image.
logTransaction has many fields, but the ones relevant to this image are:
customer
environment
firstTimestamp
lastTimestamp
integrationIds[] (a transaction can have more than one integration)
transactionStatus (FINISHED, UNFINISHED, FAILED)
I am using AggregationOutput for this result, but it takes more than 30 seconds, which (I think) is far too long for the amount of data I have. Can I improve this by modifying what I already have, or should I change it completely? What type of indexing should I use to make things faster?
I use MongoDB and Grails. My current method looks like this:
def myCustomAggregation(integrations, timestamp_lt, timestamp_gt, cust, env) {
def currentRequest = RequestContextHolder.requestAttributes
def customer = cust ?: currentRequest?.session?.customer
def environment = env ?: currentRequest?.session?.environment
//$match
DBObject matchMap = new BasicDBObject('integrationIds', new BasicDBObject('$in', integrations.collect { it?.baselineId }))
matchMap.put("firstTimestamp", new BasicDBObject('$lte', timestamp_lt as Long).append('$gte', timestamp_gt as Long))
matchMap.put("customer",customer)
matchMap.put("environment",environment)
DBObject match = new BasicDBObject('$match',matchMap);
//$group1
Map<String, Object> dbObjIdMap1 = new HashMap<String, Object>();
dbObjIdMap1.put('integrationId', '$integrationIds');
dbObjIdMap1.put('transactionStatus', '$transactionStatus');
DBObject groupFields1 = new BasicDBObject( "_id", new BasicDBObject(dbObjIdMap1));
groupFields1.put('total', new BasicDBObject( '$sum', 1));
DBObject group1 = new BasicDBObject('$group', groupFields1);
//$group2
DBObject groupFields2 = new BasicDBObject( "_id", '$_id.integrationId');
groupFields2.put('total_finished',
new BasicDBObject('$sum', new BasicDBObject('$cond', [
new BasicDBObject('$eq', ['$_id.transactionStatus', 'FINISHED']), '$total', 0
]))
);
groupFields2.put('total_unfinished',
new BasicDBObject('$sum', new BasicDBObject('$cond', [
new BasicDBObject('$eq', ['$_id.transactionStatus', 'UNFINISHED']), '$total', 0
]))
);
groupFields2.put('total_failed',
new BasicDBObject('$sum', new BasicDBObject('$cond', [
new BasicDBObject('$eq', ['$_id.transactionStatus', 'FAILED']), '$total', 0
]))
);
DBObject group2 = new BasicDBObject('$group', groupFields2);
// This takes more than 30 seconds, which is too long for the amount of data in the database.
AggregationOutput output = db.logTransaction.aggregate(match,group1,group2)
return output.results()
}
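Independent of indexing, one thing worth checking on the driver side: each $group stage is limited to roughly 100 MB of memory on the server. Below is a minimal sketch, assuming the 2.12+ Java driver (whose list-based aggregate() overload accepts AggregationOptions) and a MongoDB 2.6+ server, that lets the group stages spill to disk; it reuses the match, group1 and group2 stages built above:
import com.mongodb.AggregationOptions
import com.mongodb.Cursor

// Allow $group stages to spill to disk instead of hitting the in-memory
// limit; this overload returns a Cursor rather than an AggregationOutput.
AggregationOptions options = AggregationOptions.builder()
        .allowDiskUse(true)
        .build()
Cursor cursor = db.logTransaction.aggregate([match, group1, group2], options)
def results = cursor.collect { it } // drain the cursor into a list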
Edit:
I created a compound index as HoefMeistert suggested:
db.logTransaction.createIndex({integrationIds: 1, firstTimestamp: -1, customer: 1, environment: 1})
But when I run explain() on this aggregation:
db.logTransaction.explain().aggregate( [
{ $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}},
{ $group: { _id: {"integrationId": '$integrationIds', "transactionStatus": '$transactionStatus'}, total: {$sum: 1}}},
{ $group: { _id: "$_id.integrationId", "total_finished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FINISHED"]}, "$total", 0]}}, "total_unfinished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "UNFINISHED"]}, "$total", 0]}}, "total_failed": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FAILED"]}, "$total", 0]}}}}
]);
I still get this winningPlan every single time:
"winningPlan" : {
"stage" : "CACHED_PLAN",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"environment" : {
"$eq" : "PROD"
}
},
{
"integrationIds" : {
"$in" : [
"INT010",
"INT011",
"INT012A",
"INT200"
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"tenant" : 1,
"firstTimestamp" : -1
},
"indexName" : "customer_1_firstTimestamp_-1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"customer" : [
"[\"Awsome_Company\", \"Awsome_Company\"]"
],
"firstTimestamp" : [
"[1476107324000.0, 1470002400000.0]"
]
}
}
}
},
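Note that the winning plan starts with a CACHED_PLAN stage, so the planner may simply be replaying a plan that was cached before the new compound index existed. A minimal shell sketch (the plan cache commands exist since MongoDB 2.6) to force re-planning on the next run:
// Drop the cached query plans for this collection; the next query
// re-evaluates all candidate plans, including the new compound index.
db.logTransaction.getPlanCache().clear()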
These are the current indexes (with their sizes) on the collection in the development environment. The speed is good compared to before, but when the timespan is greater than one week I still get a SocketTimeoutException (after 3 minutes):
"customer_1_firstTimestamp_-1" : 56393728,
"firstTimestamp_-1_customer_1" : 144617472,
"integrationIds_1_firstTimestamp_-1" : 76644352,
"integrationId_1_firstTimestamp_-1" : 56107008,
"transactionId_1_firstTimestamp_-1" : 151429120,
"firstTimestamp_1" : 56102912,
"transactionId_1" : 109445120,
"integrationIds_1_firstTimestamp_-1_customer_1_environment_1" : 247790976
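Given that list, one more variant may be worth testing; this is an assumption based on the general "equality fields first, range fields last" guideline, not something verified against this data set. It puts the two plain equality fields before the multikey $in field and keeps the firstTimestamp range at the end:
// Equality fields first (customer, environment), then the $in field
// (integrationIds makes the index multikey), then the range field.
db.logTransaction.createIndex({customer: 1, environment: 1, integrationIds: 1, firstTimestamp: -1})
Comparing the explain() output of the same $match against this index and the existing integrationIds_1_firstTimestamp_-1_customer_1_environment_1 should show which bounds the planner can actually use.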
What indexes do you currently have? Looking at your aggregation, make sure you have indexes on the fields you are matching on. After the first ($match) stage, indexes are no longer relevant. As elixir asked, how is the performance in the shell / editor? Is it also slow there? If so, try to find the slow stage, e.g. with the sketch below.
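A minimal shell sketch for that: it reuses the three stages from the explain() call above and times each cumulative prefix of the pipeline, so the stage where the runtime jumps is the slow one:
var pipeline = [
    { $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}},
    { $group: { _id: {"integrationId": '$integrationIds', "transactionStatus": '$transactionStatus'}, total: {$sum: 1}}},
    { $group: { _id: "$_id.integrationId", "total_finished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FINISHED"]}, "$total", 0]}}, "total_unfinished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "UNFINISHED"]}, "$total", 0]}}, "total_failed": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FAILED"]}, "$total", 0]}}}}
];
// Run stage 1, then stages 1-2, then stages 1-3, timing each prefix.
for (var i = 1; i <= pipeline.length; i++) {
    var start = new Date();
    db.logTransaction.aggregate(pipeline.slice(0, i)).itcount();
    print("stages 1-" + i + ": " + (new Date() - start) + " ms");
}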
Update: you can also help the Aggregation Pipeline optimizer ;-) Rewrite the match to a single explicit $and match, i.e. from:
{ $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}}
to:
{ $match: { $and : [
    {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}},
    {"firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}},
    {"customer": "Awsome_Company"},
    {"environment": "PROD"}
]}}
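After rewriting, re-run the explain() call from the question and check whether the IXSCAN stage now picks the compound index (integrationIds_1_firstTimestamp_-1_customer_1_environment_1) instead of customer_1_firstTimestamp_-1.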