
MongoDB lists - get every Nth item

I have a MongoDB schema that looks roughly like:

[
  {
    "name" : "name1",
    "instances" : [ 
      {
        "value" : 1,
        "date" : ISODate("2015-03-04T00:00:00.000Z")            
      }, 
      {
        "value" : 2,
        "date" : ISODate("2015-04-01T00:00:00.000Z")
      }, 
      {
        "value" : 2.5,
        "date" : ISODate("2015-03-05T00:00:00.000Z")
      },
      ...
    ]
  },
  {
    "name" : "name2",
    "instances" : [ 
      ...
    ]
  }
]

where the number of instances for each element can be quite big.

I sometimes want to get only a sample of the data, that is, get every 3rd instance, or every 10th instance... you get the picture.

I can achieve this goal by getting all instances and filtering them in my server code, but I was wondering if there's a way to do it by using some aggregation query.

Any ideas?


Updated

Assuming the data structure was flat as @SylvainLeroux suggested below, that is:

[
  {"name": "name1", "value": 1, "date": ISODate("2015-03-04T00:00:00.000Z")},
  {"name": "name2", "value": 5, "date": ISODate("2015-04-04T00:00:00.000Z")},
  {"name": "name1", "value": 2, "date": ISODate("2015-04-01T00:00:00.000Z")},
  {"name": "name1", "value": 2.5, "date": ISODate("2015-03-05T00:00:00.000Z")},
  ...
]

would the task of getting every Nth item (for a specific name) be easier?

Yaron Schwimmer asked Jul 05 '15 15:07


2 Answers

Your question asks how to "get every nth instance", which is a clear enough requirement.

Query operations like .find() can really only return the document "as is". The exceptions are general field "selection" in the projection, and operators such as the positional $ operator or $elemMatch, which return a single matched array element.

Of course there is $slice, but that only allows a "range selection" on the array, so it does not apply either.

The only things that can modify a result on the server are .aggregate() and .mapReduce(). The former does not play well with "slicing" arrays, at least not by every "n" elements. However, since the function arguments of mapReduce are plain JavaScript, you have a little more room to play with.

For analysis purposes only, you can simply filter the array contents in mapReduce using .filter():

db.collection.mapReduce(
    function() {
        // keep the original _id as the emitted key, not inside the value
        var id = this._id;
        delete this._id;

        // filter the content of "instances" to every 3rd item only
        this.instances = this.instances.filter(function(el, idx) {
            return ((idx + 1) % 3) === 0;
        });
        emit(id, this);
    },
    // no-op reducer: each _id is emitted once, so it is never called
    function() {},
    { "out": { "inline": 1 } } // or output to collection as required
)

It's really just a "JavaScript runner" at this point, but if this is only for analysis/testing then there is nothing generally wrong with the concept. Of course the output is not exactly how your document is structured, since mapReduce wraps each result as an _id plus a "value" field, but it's as near a facsimile as mapReduce can get.
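
The predicate used inside the map function can be tried out in plain JavaScript, without a database. This is only a minimal sketch: everyNth and the sample array are hypothetical names made up for illustration, not part of MongoDB.

```javascript
// Hypothetical helper: keep every nth element (the 3rd, 6th, 9th, ... for n = 3),
// using the same ((idx + 1) % n) test as the map function.
function everyNth(items, n) {
    return items.filter(function (el, idx) {
        return (idx + 1) % n === 0;
    });
}

// Made-up sample data shaped like the "instances" array in the question.
var instances = [];
for (var i = 1; i <= 10; i++) {
    instances.push({ value: i });
}

var sampled = everyNth(instances, 3);
// Logs the kept values: 3, 6 and 9
console.log(sampled.map(function (el) { return el.value; }));
```

The "+ 1" is what makes the 3rd item (index 2) the first one kept; without it the sample would start at the very first element instead.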

The other suggestion I see here requires creating a new collection with all the items "denormalized" and inserting the "index" from the array as part of the unique _id key. That may produce something you can query directly, but for the "every nth item" you would still have to do:

db.resultCollection.find({
     "_id.index": { "$in": [2,5,8,11,14] } // and so on ....
})

In other words, you have to work out and supply the index values of "every nth item" yourself in order to get "every nth item". That doesn't really solve the problem that was asked.
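
To make the objection concrete, here is roughly what the client would have to do to build that $in list. It is a sketch under the assumption that you know (or can bound) the array length; nthIndexes is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical helper: zero-based indexes of every nth item in an array of
// `length` elements, e.g. 2, 5, 8, 11, 14 for n = 3 and length = 15.
function nthIndexes(n, length) {
    var indexes = [];
    for (var idx = n - 1; idx < length; idx += n) {
        indexes.push(idx);
    }
    return indexes;
}

// Build the query document for the find() on the denormalized collection.
var query = { "_id.index": { "$in": nthIndexes(3, 15) } };
console.log(JSON.stringify(query));
```

The index arithmetic still happens on the client, and the list grows with the size of the array, so this only moves the work around.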

If that output form seems more desirable for your "testing" purposes, then a better subsequent query on those results would use the aggregation pipeline with $redact:

db.resultCollection.aggregate([
    { "$redact": {
        "$cond": {
            "if": {
                "$eq": [ 
                    { "$mod": [ { "$add": [ "$_id.index", 1] }, 3 ] },
                0 ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])

That at least applies a logical condition, much the same as the one used with .filter() before, to select just the "nth index" items without listing every possible index value as a query argument.

Blakes Seven answered Oct 20 '22 17:10


No $unwind is needed here. You can use $push with $arrayElemAt inside a $group stage to collect the array value at the requested (zero-based) index.

Something like:

db.colname.aggregate([
  // N is a placeholder for the zero-based index you want
  { "$group": {
    "_id": null,
    "valuesatNthindex": { "$push": { "$arrayElemAt": [ "$instances", N ] } }
  }},
  { "$project": { "valuesatNthindex": 1 } }
])
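
To get every Nth element rather than only the one at index N, the same $arrayElemAt can be mapped over a generated list of indexes. The following is an untested sketch, assuming MongoDB 3.4 or newer (for $range) and that every document has an "instances" array; everyNthPipeline is a hypothetical helper name.

```javascript
// Hypothetical helper that builds the pipeline. It only constructs plain
// objects, so it can be inspected without a database connection.
function everyNthPipeline(n) {
    return [
        { "$project": {
            "name": 1,
            "instances": {
                "$map": {
                    // zero-based indexes n-1, 2n-1, ... below the array size
                    "input": { "$range": [ n - 1, { "$size": "$instances" }, n ] },
                    "as": "idx",
                    "in": { "$arrayElemAt": [ "$instances", "$$idx" ] }
                }
            }
        }}
    ];
}

var pipeline = everyNthPipeline(3);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell: db.colname.aggregate(everyNthPipeline(3))
```

Each document keeps its own shape, with "instances" reduced to the sampled elements, so no $unwind or $group is involved.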
s7vr answered Oct 20 '22 18:10