I am working on a project that requires me to create a dynamic MongoDB query on the fly based on a lot of matches (probably a potential of 100). On top of creating the proper indexes I was wondering if it mattered how I built the matches into the pipeline. Based on the following example, does one of these examples perform differently or better than the other?
I assume Example 2 would widdle down the result set but be more calls? Maybe that is what Example 1 is doing behind the scenes?
Thanks in advance for your help!
Example 1
db.Test.aggregate(
[
{ $match: { item1: 'foo1', item2: 'foo2', item3: 'foo3' } }
])
vs
Example 2
db.Test.aggregate(
[
{ $match: { item1: 'foo1' } },
{ $match: { item2: 'foo2' } },
{ $match: { item3: 'foo3' } }
])
I doubt it matter for this question but if relevant I will be using the C# driver for my implementation.
I found the following information in MongoDB's documentation:
$match
+$match
CoalescenceWhen a
$match
immediately follows another$match
, the two stages can coalesce into a single $match combining the conditions with an$and
. For example, a pipeline contains the following sequence:{ $match: { year: 2014 } }, { $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
From this I can say that using several $match
in a row is the same as using a single $match
with multiple fields.
I am unsure however why does the optimization engine add a $and
operator here. According to this answer it shouldn't be necessary, therefore I think it can be ignored. Somebody can confirm?
I was wondering the same thing today and stumbled on the answer of Romain. Though I wanted to see this for myself, which can easily be done using an explain on both aggregates;
db.verpakking.explain().aggregate([
{ "$match": {type: "VERPAKT"} },
{ "$match": {ras: "CherryStar"} },
]);
Which results in the following output:
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"type" : "VERPAKT"
},
{
"ras" : "CherryStar"
}
]
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "denberk.verpakking",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"ras" : {
"$eq" : "CherryStar"
}
},
{
"type" : {
"$eq" : "VERPAKT"
}
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"ras" : {
"$eq" : "CherryStar"
}
},
{
"type" : {
"$eq" : "VERPAKT"
}
}
]
},
"direction" : "forward"
},
"rejectedPlans" : [
]
}
}
}
],
"ok" : NumberInt(1)
}
While
db.verpakking.explain().aggregate([
{ "$match": {type: "VERPAKT", ras: "CherryStar"} },
]);
Results in output:
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"type" : "VERPAKT",
"ras" : "CherryStar"
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "denberk.verpakking",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"ras" : {
"$eq" : "CherryStar"
}
},
{
"type" : {
"$eq" : "VERPAKT"
}
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"ras" : {
"$eq" : "CherryStar"
}
},
{
"type" : {
"$eq" : "VERPAKT"
}
}
]
},
"direction" : "forward"
},
"rejectedPlans" : [
]
}
}
}
],
"ok" : NumberInt(1)
}
As you can see this is exactly the same, except for the "query" portion (which is normal, since our query was different). This proves that whether you use two separate consecutive $match-pipelines or one combined $match-pipeline, the parsed query will be exactly the same.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With