Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Consequences of using $unwind on nested arrays?

Tags:

mongodb

Say I have 17,000 documents that have a structure similar to the document below:

{
   someInfo: "blah blah blah",
   // and another dozen or so attributes here, followed by:
   answers:[
      {
          email: "[email protected],
          values:[
             {value: 1, label: "test1"},
             {value: 2, label: "test2"}    
          ]
      },
      {
          email: "[email protected],
          values:[
             {value: 6, label: "test1"},
             {value: 1, label: "test2"}    
          ]
      }
   ]
}

Say I use aggregate to unwind both answers and answers.values like so:

db.participants.aggregate(
   {$unwind: "$answers"},
   {$unwind: "$answers.values"}
);

I assume it would create a fairly large result set in memory since it would essentially be replicating the parent object 17,000 * # of answers * # of values times.

I have been testing a query that does something similar on a development environment and the performance of the query itself is fine, but I'm wondering if I should be concerned about running this on a production environment where the unwound result set could potentially eat up a lot of memory. Mongo's documentation on $unwind goes into how it works, but does not discuss potential performance problems.

Should I be worried about doing this on a production system? Will it slow down other queries against the db?

like image 268
Abe Miessler Avatar asked Sep 29 '22 02:09

Abe Miessler


1 Answers

It is always a good idea to be cognizant of memory resources when $unwinding because of the replication of data that occurs.

Using $match to narrow down the results to the specific documents you are looking for is of course one way to reduce the amount of memory necessary to hold the returned data.

Another way to reduce the memory footprint is with $project. $project allows you to re-organize the documents in the pipeline so that you only return the elements in which you are interested.

To use your example,

{
  someInfo: "blah blah blah",
  answers: [
    {
      email: "[email protected]",
      values: [
        {value: 1, label: "test1"},
        {value: 2, label: "test2"}    
      ]
    },
    {
      email: "[email protected]",
      values: [
        {value: 6, label: "test1"},
        {value: 1, label: "test2"}    
      ]
    }
  ]
}

With

db.collection.aggregate([{ $match: { <element>: <value> }}, { $project: { _id: 0, answers: 1}}])

will remove the someInfo and other attributes you may not be interested in. Then you could $project again after unwinding...

db.collection.aggregate([
   { $match: { <element>: <value> }},
   { $project: { _id: 0, answers: 1}},
   { $unwind: "$answers"},
   { $unwind: "$answers.tags"},
   { $project: { e: "$answers.email", v: "$answers.values"}}
])

will return fairly compact results like:

{ e: "[email protected]", v: { value: 1, label: "test1" } }
{ e: "[email protected]", v: { value: 2, label: "test2" } }
{ e: "[email protected]", v: { value: 6, label: "test1" } }
{ e: "[email protected]", v: { value: 1, label: "test2" } }

Although the single letter attribute names reduce human-readability, it does cut down on the size of the data that is inflated by lengthy repeated attribute names.

like image 147
NoOutlet Avatar answered Oct 07 '22 20:10

NoOutlet