Say I have 17,000 documents that have a structure similar to the document below:
{
someInfo: "blah blah blah",
// and another dozen or so attributes here, followed by:
answers:[
{
email: "[email protected],
values:[
{value: 1, label: "test1"},
{value: 2, label: "test2"}
]
},
{
email: "[email protected],
values:[
{value: 6, label: "test1"},
{value: 1, label: "test2"}
]
}
]
}
Say I use aggregate to unwind both answers
and answers.values
like so:
db.participants.aggregate(
{$unwind: "$answers"},
{$unwind: "$answers.values"}
);
I assume it would create a fairly large result set in memory since it would essentially be replicating the parent object 17,000 * # of answers * # of values
times.
I have been testing a query that does something similar on a development environment and the performance of the query itself is fine, but I'm wondering if I should be concerned about running this on a production environment where the unwound result set could potentially eat up a lot of memory. Mongo's documentation on $unwind goes into how it works, but does not discuss potential performance problems.
Should I be worried about doing this on a production system? Will it slow down other queries against the db?
It is always a good idea to be cognizant of memory resources when $unwind
ing because of the replication of data that occurs.
Using $match
to narrow down the results to the specific documents you are looking for is of course one way to reduce the amount of memory necessary to hold the returned data.
Another way to reduce the memory footprint is with $project
. $project
allows you to re-organize the documents in the pipeline so that you only return the elements in which you are interested.
To use your example,
{
someInfo: "blah blah blah",
answers: [
{
email: "[email protected]",
values: [
{value: 1, label: "test1"},
{value: 2, label: "test2"}
]
},
{
email: "[email protected]",
values: [
{value: 6, label: "test1"},
{value: 1, label: "test2"}
]
}
]
}
With
db.collection.aggregate([{ $match: { <element>: <value> }}, { $project: { _id: 0, answers: 1}}])
will remove the someInfo
and other attributes you may not be interested in. Then you could $project
again after unwinding...
db.collection.aggregate([
{ $match: { <element>: <value> }},
{ $project: { _id: 0, answers: 1}},
{ $unwind: "$answers"},
{ $unwind: "$answers.tags"},
{ $project: { e: "$answers.email", v: "$answers.values"}}
])
will return fairly compact results like:
{ e: "[email protected]", v: { value: 1, label: "test1" } }
{ e: "[email protected]", v: { value: 2, label: "test2" } }
{ e: "[email protected]", v: { value: 6, label: "test1" } }
{ e: "[email protected]", v: { value: 1, label: "test2" } }
Although the single letter attribute names reduce human-readability, it does cut down on the size of the data that is inflated by lengthy repeated attribute names.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With