
MongoDB: find unique documents between date range in a collection

Tags: mongodb

I am not sure how to perform this task.

Here is the document structure:

name:
date_created:
val:

I need to find the unique documents created between January 2011 and October 2011.

I know that I can find the documents created between two dates with:

db.collection.find({'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}});  

and I can get the distinct names with:

db.runCommand({'distinct': 'collection', 'key': 'name'})   

Problem

The problem is that there are duplicate documents inside the collection that I need to remove.

How can I answer this question?

Find the unique documents created between January 2011 and October 2011, where uniqueness is based on the 'name' key.

UPDATE

@Sergio's answer is perfect. After running the query I got the following result. The output count (380,827) is lower than the input count (592,364), which means 211,537 duplicates were removed:

{
    "result" : "temp_collection",
    "timeMillis" : 1509717,
    "counts" : {
        "input" : 592364,
        "emit" : 592364,
        "output" : 380827
    },
    "ok" : 1
}
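
To illustrate how the three counts relate, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB itself: the `simulate` helper and the sample documents are hypothetical, and the real server may call reduce several times per key. The map and reduce functions mirror the ones in the answer below.

```javascript
// Hypothetical sample documents; names, dates and values are made up.
const docs = [
  { name: 'a', date_created: '2011-02-03', val: 1 },
  { name: 'a', date_created: '2011-05-17', val: 2 }, // duplicate name
  { name: 'b', date_created: '2011-09-30', val: 3 },
  { name: 'c', date_created: '2012-01-01', val: 4 }, // outside the date range
];

// Same range as the query in the question, applied to ISO-style date strings.
const inRange = (d) =>
  d.date_created >= '2011-01-01' && d.date_created < '2011-10-30';

// Simplified stand-in for the server: collect emitted values per key,
// then reduce every key that received more than one value.
function simulate(input) {
  const groups = new Map();
  let emitted = 0;
  const emit = (key, value) => {
    emitted += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  const map = function () { emit(this.name, this); };
  const reduce = function (key, vals) { return vals[0]; };

  input.forEach((doc) => map.call(doc));

  const output = [...groups].map(([key, vals]) => ({
    _id: key,
    value: vals.length > 1 ? reduce(key, vals) : vals[0],
  }));
  return {
    counts: { input: input.length, emit: emitted, output: output.length },
    output,
  };
}

const result = simulate(docs.filter(inRange));
console.log(result.counts);
```

In this sketch, input is the number of documents that pass the date filter, emit is the number of emit calls (one per document here), and output is the number of distinct names.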
daydreamer asked Feb 19 '23 21:02


1 Answer

It seems this can be solved with map-reduce. Something like this should help:

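// Emit each document under its name, so documents sharing a name are grouped.
var map = function() {
  emit(this.name, this);
};

// vals holds documents emitted for this key (name); reduce may be called
// more than once per key. Just pick one.
var reduce = function(key, vals) {
  return vals[0];
};

db.runCommand({
  mapreduce: 'collection',
  map: map,
  reduce: reduce,
  // Same date filter as in the question; only matching documents are mapped.
  query: {'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}},
  out: 'temp_collection'
});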

After this command returns, temp_collection should hold one document per name: _id is the name, and value is the document that was kept.
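
As a possible alternative to map-reduce (which is deprecated as of MongoDB 5.0), the same "pick one document per name" idea can be sketched as an aggregation pipeline. This is a swapped-in technique, not part of the original answer. It assumes a MongoDB version that supports `$replaceRoot` (3.4 or later) and, like the query above, that date_created is stored as an ISO-style string.

```javascript
// Sketch only: collection and field names follow the question.
const pipeline = [
  // Keep only documents in the date range (same filter as the query above).
  { $match: { date_created: { $gte: '2011-01-01', $lt: '2011-10-30' } } },
  // Group by name and keep the first document seen in each group.
  // Without a preceding $sort, which document comes first is arbitrary.
  { $group: { _id: '$name', doc: { $first: '$$ROOT' } } },
  // Restore the kept document as the top-level document.
  { $replaceRoot: { newRoot: '$doc' } },
  // Write the result to a collection, replacing it if it already exists.
  { $out: 'temp_collection' },
];

// In the mongo shell this would be run as:
//   db.collection.aggregate(pipeline)
```

Unlike the map-reduce output, documents written this way keep their original shape and _id rather than being wrapped in a value field.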

Sergio Tulentsev answered Feb 22 '23 10:02