Using $unwind on multiple documents

Tags:

This question - Is it possible to get a slice of a slice in Mongo? covers how to get a slice of a slice in Mongo. In short, use an aggregation chain to $unwind, $skip, $limit, $unwind, $skip, $limit, and $group.

My question is how to do this on a collection of multiple documents. I want to trim the nested array within each one. Once I $unwind, though, $skip and $limit only make sense according to the numbering of the first array.

Is there a way to run this kind of pipeline on each document in a collection, rather than on the collection as a whole? Is what I'm looking to do possible in the aggregation pipeline at all? It's clearly possible using Map-Reduce, but it's slower to do that than to run the n+1 queries to $unwind each document individually.

Edit

Below is an example record.

{
  title: "Text Title"
  vtitle: "Text Version Title"
  text:  [[["Book1, Chapter 1, Line 1", "Book1, Chapter 1, Line 2"],["Book 1, Chapter 2, Line 1"]],[["Book 2, Chapter 1, Line 1]]]
}

The record here is the text of a large book, stored as a depth 3 array. There can be many different vtitle for the same title, and the text can be quite large.

I'd like to select a small bit of contained text, identified by indexes, from each book in a collection of many books - a slice of a slice of every single document returned.

For example, input parameters of [3,3] would return records like:

{ "text" : ["Book 4, Chapter 4, Line 1", "Book 4, Chapter 4, Line 2", ...] }

366

asked Jul 08 '15 17:07

Laizer

1 Answers

TL; DR

I think the short answer is that you can't really do what you want yet. Current options would be wait until v3.1 or hack it with a group aggregation (but I suspect that this is too slow for your needs).

Justification

Although it's not quite clear the exact result you want to get, the intention is clearly that you want to be able to find a set of matching documents in your collection and transform (i.e. map) the documents (by slicing your nested array to produce a flat list of strings). The search is irrelevant as you can either do it before or after the mapping and still satisfy your constraints. I will therefore just talk about the mapping.

This is a natural use case for MapReduce, but you have explicitly excluded that from the allowed answers. So, there are 3 options, that I will take in turn.

1) Queries

Since you have disallowed multiple queries, your only option is to map the data within the request. This is handled through the projection operators. These will not work.

While there is a $slice operator here, it does not handle nested arrays.
The $ operator only allows you to take the first entry in an array, which also isn't sufficient to get an arbitrary location in your array.
The $elemMatch only allows you to get one field from an array - which is also insufficient for your needs.

In addition you can't chain projections on a query, so you can't put multiple projections together in some cunning way to slice the data multiple times.

In short, this will not work.

2) Aggregation pipeline

Unfortunately, there isn't a slice operator for the aggregation pipeline until v3.1. You are therefore limited to $project or maybe some cunning use of the other operators (as per the article you linked).

Taking the projection operator first. While you can operate on array fields, you are limited to just getting the size for now. You could try to use set logic but that is inherently unordered, so you cannot get the Nth entry here.

A direct operation therefore clearly does not work. So can we enhance the article you linked? This workaround only works for one document because you don't need to differentiate between multiple documents. You can therefore afford to unwind an array to create a bigger list of documents and then use document level scope operations to effectively do your slicing.

Sadly, this falls down when you need to find the start of the next original documentation in your latest unwound list. There is no combination of operators that allows you to enumerate your unwound array and then select on that enumeration and the the original document.

$unwind expands the array, but does not give you an index for you to subsequently match and $skip provides no way to skip to the next document with a matching condition.
$redact still keeps you in the scope of your original documents, but then suffers the same issue as $project that it can't operate on the nested array.

In short, this is a bust too.

3) Group aggregation

I was about to give up at this point, then I noticed group aggregation. You can create a filter for the matching documents, but then provide an arbitrary JavaScript function with finalize to transform that data before returning it. That means you should be able to issue a command like this:

db.runCommand(
   {
     group:
       {
         ns: 'books',
         key: { title: 1, text: 1 },
         cond: { },
         $reduce: function (curr, result) { },
         initial: { },
         finalize: function(result) {
             // Insert your code here to slice the array - e.g.
             result.text = result.text[0][0]
         }
      }
   })

Of course, as documented here this does not work if your database is sharded, your results are larger than 16MB or you have more than 20,000 documents (as each one is now a key for the aggregation). It's also very slow when the data set gets large.

answered Oct 22 '22 19:10

Peter Brittain

Related questions
                            
                                mongorestore file X does not have .bson extension
                            
                                Issue persisting nested nested embedded documents
                            
                                Mapping Mongodb ObjectId to and from string automatically
                            
                                Mongoose.js: How to Implement Tree Structure via Population
                            
                                I'm trying to serialize embedded mongodb documents with JMSSerizial Bundle
                            
                                Understanding Mongo Timeouts on Save through Lock Percentage
                            
                                Mongo error: DBClientBase::findN: transport error()
                            
                                why a collection exists but can not find in show collections?
                            
                                mongodb node.js client, connect hangs
                            
                                NodeJS, Express, & Mongoose: Calls are intermittantly slow
                            
                                MongoDB Query Performance: Return all vs select fields
                            
                                What's best practice "joining" a bunch of values in mongoose/mongodb without populate
                            
                                Square Root in MongoDB Aggregate Pipeline
                            
                                Mongodb query by id using $in operator for large sets of Ids is too slow, alternatives?
                            
                                MongoDB: Slow query, even with index

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Using $unwind on multiple documents

Tags:

mongodb

aggregation-framework

Laizer

People also ask

1 Answers

Peter Brittain

Recent Activity

Donate For Us