This question - Is it possible to get a slice of a slice in Mongo? covers how to get a slice of a slice in Mongo. In short, use an aggregation chain to $unwind
, $skip
, $limit
, $unwind
, $skip
, $limit
, and $group
.
My question is how to do this on a collection of multiple documents. I want to trim the nested array within each one. Once I $unwind
, though, $skip
and $limit
only make sense according to the numbering of the first array.
Is there a way to run this kind of pipeline on each document in a collection, rather than on the collection as a whole? Is what I'm looking to do possible in the aggregation pipeline at all? It's clearly possible using Map-Reduce, but it's slower to do that than to run the n+1 queries to $unwind
each document individually.
Edit
Below is an example record.
{
title: "Text Title"
vtitle: "Text Version Title"
text: [[["Book1, Chapter 1, Line 1", "Book1, Chapter 1, Line 2"],["Book 1, Chapter 2, Line 1"]],[["Book 2, Chapter 1, Line 1]]]
}
The record here is the text of a large book, stored as a depth 3 array. There can be many different vtitle
for the same title
, and the text
can be quite large.
I'd like to select a small bit of contained text, identified by indexes, from each book in a collection of many books - a slice of a slice of every single document returned.
For example, input parameters of [3,3]
would return records like:
{ "text" : ["Book 4, Chapter 4, Line 1", "Book 4, Chapter 4, Line 2", ...] }
To unwind, use $unwind. The $unwind deconstructs an array field from the input documents to output a document for each element.
Document Operand with Options You can pass a document to $unwind to specify various behavior options.
What is MongoDB $unwind? The MongoDB $unwind operator is used to deconstruct an array field in a document and create separate output documents for each item in the array.
TL; DR
I think the short answer is that you can't really do what you want yet. Current options would be wait until v3.1 or hack it with a group
aggregation (but I suspect that this is too slow for your needs).
Justification
Although it's not quite clear the exact result you want to get, the intention is clearly that you want to be able to find a set of matching documents in your collection and transform (i.e. map) the documents (by slicing your nested array to produce a flat list of strings). The search is irrelevant as you can either do it before or after the mapping and still satisfy your constraints. I will therefore just talk about the mapping.
This is a natural use case for MapReduce, but you have explicitly excluded that from the allowed answers. So, there are 3 options, that I will take in turn.
1) Queries
Since you have disallowed multiple queries, your only option is to map the data within the request. This is handled through the projection operators. These will not work.
$slice
operator here, it does not handle nested arrays. $
operator only allows you to take the first entry in an array, which also isn't sufficient to get an arbitrary location in your array.$elemMatch
only allows you to get one field from an array - which is also insufficient for your needs.In addition you can't chain projections on a query, so you can't put multiple projections together in some cunning way to slice the data multiple times.
In short, this will not work.
2) Aggregation pipeline
Unfortunately, there isn't a slice operator for the aggregation pipeline until v3.1. You are therefore limited to $project or maybe some cunning use of the other operators (as per the article you linked).
Taking the projection operator first. While you can operate on array fields, you are limited to just getting the size for now. You could try to use set logic but that is inherently unordered, so you cannot get the Nth entry here.
A direct operation therefore clearly does not work. So can we enhance the article you linked? This workaround only works for one document because you don't need to differentiate between multiple documents. You can therefore afford to unwind an array to create a bigger list of documents and then use document level scope operations to effectively do your slicing.
Sadly, this falls down when you need to find the start of the next original documentation in your latest unwound list. There is no combination of operators that allows you to enumerate your unwound array and then select on that enumeration and the the original document.
$unwind
expands the array, but does not give you an index for you to subsequently match and $skip provides no way to skip to the next document with a matching condition.$redact
still keeps you in the scope of your original documents, but then suffers the same issue as $project
that it can't operate on the nested array.In short, this is a bust too.
3) Group aggregation
I was about to give up at this point, then I noticed group aggregation. You can create a filter for the matching documents, but then provide an arbitrary JavaScript function with finalize
to transform that data before returning it. That means you should be able to issue a command like this:
db.runCommand(
{
group:
{
ns: 'books',
key: { title: 1, text: 1 },
cond: { },
$reduce: function (curr, result) { },
initial: { },
finalize: function(result) {
// Insert your code here to slice the array - e.g.
result.text = result.text[0][0]
}
}
})
Of course, as documented here this does not work if your database is sharded, your results are larger than 16MB or you have more than 20,000 documents (as each one is now a key for the aggregation). It's also very slow when the data set gets large.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With