Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB (3.0) Aggregation: Several matches vs One match with multiple items

I am working on a project that requires me to create a dynamic MongoDB query on the fly based on a lot of matches (probably a potential of 100). On top of creating the proper indexes I was wondering if it mattered how I built the matches into the pipeline. Based on the following example, does one of these examples perform differently or better than the other?

I assume Example 2 would widdle down the result set but be more calls? Maybe that is what Example 1 is doing behind the scenes?

Thanks in advance for your help!

Example 1

db.Test.aggregate(
[     
   { $match: { item1: 'foo1', item2: 'foo2', item3: 'foo3' } }
])

vs

Example 2

db.Test.aggregate(
[     
   { $match: { item1: 'foo1' } },
   { $match: { item2: 'foo2' } },
   { $match: { item3: 'foo3' } }
])

I doubt it matter for this question but if relevant I will be using the C# driver for my implementation.

like image 592
Matt Avatar asked May 27 '15 07:05

Matt


2 Answers

I found the following information in MongoDB's documentation:

$match + $match Coalescence

When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }

From this I can say that using several $match in a row is the same as using a single $match with multiple fields.

I am unsure however why does the optimization engine add a $and operator here. According to this answer it shouldn't be necessary, therefore I think it can be ignored. Somebody can confirm?

like image 142
Romain G Avatar answered Nov 16 '22 00:11

Romain G


I was wondering the same thing today and stumbled on the answer of Romain. Though I wanted to see this for myself, which can easily be done using an explain on both aggregates;

db.verpakking.explain().aggregate([
    { "$match": {type: "VERPAKT"} },
    { "$match": {ras: "CherryStar"} },
]);

Which results in the following output:

{
  "waitedMS" : NumberLong(0),
  "stages" : [
    {
      "$cursor" : {
        "query" : {
          "$and" : [
            {
              "type" : "VERPAKT"
            },
            {
              "ras" : "CherryStar"
            }
          ]
        },
        "queryPlanner" : {
          "plannerVersion" : NumberInt(1),
          "namespace" : "denberk.verpakking",
          "indexFilterSet" : false,
          "parsedQuery" : {
            "$and" : [
              {
                "ras" : {
                  "$eq" : "CherryStar"
                }
              },
              {
                "type" : {
                  "$eq" : "VERPAKT"
                }
              }
            ]
          },
          "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
              "$and" : [
                {
                  "ras" : {
                    "$eq" : "CherryStar"
                  }
                },
                {
                  "type" : {
                    "$eq" : "VERPAKT"
                  }
                }
              ]
            },
            "direction" : "forward"
          },
          "rejectedPlans" : [

          ]
        }
      }
    }
  ],
  "ok" : NumberInt(1)
}

While

db.verpakking.explain().aggregate([
{ "$match": {type: "VERPAKT", ras: "CherryStar"} },
]);

Results in output:

{
  "waitedMS" : NumberLong(0),
  "stages" : [
    {
      "$cursor" : {
        "query" : {
          "type" : "VERPAKT",
          "ras" : "CherryStar"
        },
        "queryPlanner" : {
          "plannerVersion" : NumberInt(1),
          "namespace" : "denberk.verpakking",
          "indexFilterSet" : false,
          "parsedQuery" : {
            "$and" : [
              {
                "ras" : {
                  "$eq" : "CherryStar"
                }
              },
              {
                "type" : {
                  "$eq" : "VERPAKT"
                }
              }
            ]
          },
          "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
              "$and" : [
                {
                  "ras" : {
                    "$eq" : "CherryStar"
                  }
                },
                {
                  "type" : {
                    "$eq" : "VERPAKT"
                  }
                }
              ]
            },
            "direction" : "forward"
          },
          "rejectedPlans" : [

          ]
        }
      }
    }
  ],
  "ok" : NumberInt(1)
}

As you can see this is exactly the same, except for the "query" portion (which is normal, since our query was different). This proves that whether you use two separate consecutive $match-pipelines or one combined $match-pipeline, the parsed query will be exactly the same.

like image 40
StefK Avatar answered Nov 15 '22 23:11

StefK