Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using $unwind on multiple documents

This question - Is it possible to get a slice of a slice in Mongo? covers how to get a slice of a slice in Mongo. In short, use an aggregation chain to $unwind, $skip, $limit, $unwind, $skip, $limit, and $group.

My question is how to do this on a collection of multiple documents. I want to trim the nested array within each one. Once I $unwind, though, $skip and $limit only make sense according to the numbering of the first array.

Is there a way to run this kind of pipeline on each document in a collection, rather than on the collection as a whole? Is what I'm looking to do possible in the aggregation pipeline at all? It's clearly possible using Map-Reduce, but it's slower to do that than to run the n+1 queries to $unwind each document individually.


Edit

Below is an example record.

{
  title: "Text Title"
  vtitle: "Text Version Title"
  text:  [[["Book1, Chapter 1, Line 1", "Book1, Chapter 1, Line 2"],["Book 1, Chapter 2, Line 1"]],[["Book 2, Chapter 1, Line 1]]]
}

The record here is the text of a large book, stored as a depth 3 array. There can be many different vtitle for the same title, and the text can be quite large.

I'd like to select a small bit of contained text, identified by indexes, from each book in a collection of many books - a slice of a slice of every single document returned.

For example, input parameters of [3,3] would return records like:

{ "text" : ["Book 4, Chapter 4, Line 1", "Book 4, Chapter 4, Line 2", ...] }
like image 366
Laizer Avatar asked Jul 08 '15 17:07

Laizer


People also ask

How do you unwind more than one array?

To unwind, use $unwind. The $unwind deconstructs an array field from the input documents to output a document for each element.

What document is passed to $unwind to specify various Behaviour operations?

Document Operand with Options You can pass a document to $unwind to specify various behavior options.

What does $unwind do in MongoDB?

What is MongoDB $unwind? The MongoDB $unwind operator is used to deconstruct an array field in a document and create separate output documents for each item in the array.


1 Answers

TL; DR

I think the short answer is that you can't really do what you want yet. Current options would be wait until v3.1 or hack it with a group aggregation (but I suspect that this is too slow for your needs).

Justification

Although it's not quite clear the exact result you want to get, the intention is clearly that you want to be able to find a set of matching documents in your collection and transform (i.e. map) the documents (by slicing your nested array to produce a flat list of strings). The search is irrelevant as you can either do it before or after the mapping and still satisfy your constraints. I will therefore just talk about the mapping.

This is a natural use case for MapReduce, but you have explicitly excluded that from the allowed answers. So, there are 3 options, that I will take in turn.

1) Queries

Since you have disallowed multiple queries, your only option is to map the data within the request. This is handled through the projection operators. These will not work.

  1. While there is a $slice operator here, it does not handle nested arrays.
  2. The $ operator only allows you to take the first entry in an array, which also isn't sufficient to get an arbitrary location in your array.
  3. The $elemMatch only allows you to get one field from an array - which is also insufficient for your needs.

In addition you can't chain projections on a query, so you can't put multiple projections together in some cunning way to slice the data multiple times.

In short, this will not work.

2) Aggregation pipeline

Unfortunately, there isn't a slice operator for the aggregation pipeline until v3.1. You are therefore limited to $project or maybe some cunning use of the other operators (as per the article you linked).

Taking the projection operator first. While you can operate on array fields, you are limited to just getting the size for now. You could try to use set logic but that is inherently unordered, so you cannot get the Nth entry here.

A direct operation therefore clearly does not work. So can we enhance the article you linked? This workaround only works for one document because you don't need to differentiate between multiple documents. You can therefore afford to unwind an array to create a bigger list of documents and then use document level scope operations to effectively do your slicing.

Sadly, this falls down when you need to find the start of the next original documentation in your latest unwound list. There is no combination of operators that allows you to enumerate your unwound array and then select on that enumeration and the the original document.

  1. $unwind expands the array, but does not give you an index for you to subsequently match and $skip provides no way to skip to the next document with a matching condition.
  2. $redact still keeps you in the scope of your original documents, but then suffers the same issue as $project that it can't operate on the nested array.

In short, this is a bust too.

3) Group aggregation

I was about to give up at this point, then I noticed group aggregation. You can create a filter for the matching documents, but then provide an arbitrary JavaScript function with finalize to transform that data before returning it. That means you should be able to issue a command like this:

db.runCommand(
   {
     group:
       {
         ns: 'books',
         key: { title: 1, text: 1 },
         cond: { },
         $reduce: function (curr, result) { },
         initial: { },
         finalize: function(result) {
             // Insert your code here to slice the array - e.g.
             result.text = result.text[0][0]
         }
      }
   })

Of course, as documented here this does not work if your database is sharded, your results are larger than 16MB or you have more than 20,000 documents (as each one is now a key for the aggregation). It's also very slow when the data set gets large.

like image 61
Peter Brittain Avatar answered Oct 22 '22 19:10

Peter Brittain