I am not sure how to perform this task.
Here is the document structure:
name:
date_created:
val:
I need to find the unique documents created between January 2011 and October 2011.
I know that I can find the documents within a date range with
db.collection.find({'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}});
and I can get the distinct names with
db.runCommand({'distinct': 'collection', 'key': 'name'})
Problem
The problem is that the collection contains duplicate documents that I need to filter out.
How can I answer this question?
Find the unique documents created between January 2011 and October 2011, where uniqueness is based on the 'name' key.
UPDATE
@Sergio's answer is perfect. After running the query I got the following result. The output count is lower than the input count,
which means the duplicates were removed:
{
    "result" : "temp_collection",
    "timeMillis" : 1509717,
    "counts" : {
        "input" : 592364,
        "emit" : 592364,
        "output" : 380827
    },
    "ok" : 1
}
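To make the three counts concrete, here is a minimal in-memory sketch in plain JavaScript (no database involved). The sample documents and the countUnique helper are made up for illustration. It assumes, as in the query above, that date_created is stored as an ISO-formatted string, so plain string comparison orders the dates correctly. "input" is the number of documents matching the date range, "emit" is the number of emit calls (one per document here), and "output" is the number of distinct names.

```javascript
// Hypothetical sample data; the real collection had 592364 matching documents.
var docs = [
  { name: 'a', date_created: '2011-02-01', val: 1 },
  { name: 'b', date_created: '2011-03-15', val: 2 },
  { name: 'a', date_created: '2011-04-20', val: 3 }, // duplicate name
  { name: 'c', date_created: '2011-12-01', val: 4 }  // outside the date range
];

// Hypothetical helper that mimics the counts reported by the map-reduce job.
function countUnique(docs, from, to) {
  var counts = { input: 0, emit: 0, output: 0 };
  var seen = {};
  docs.forEach(function (doc) {
    // Same range as the query: from <= date_created < to
    if (doc.date_created < from || doc.date_created >= to) return;
    counts.input++;
    counts.emit++; // the map function emits exactly once per document
    if (!Object.prototype.hasOwnProperty.call(seen, doc.name)) {
      seen[doc.name] = doc; // keep the first document seen for this name
      counts.output++;
    }
  });
  return counts;
}

var counts = countUnique(docs, '2011-01-01', '2011-10-30');
console.log(JSON.stringify(counts)); // {"input":3,"emit":3,"output":2}
```

In the real run above, 592364 - 380827 = 211537 documents shared a name with another document in the range.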
It seems this can be solved with map-reduce. Something like this should help.
var map = function() {
    emit(this.name, this);
};
var reduce = function(key, vals) {
    // vals contains all documents for this key (name). Just pick one.
    return vals[0];
};
db.runCommand({
    mapreduce: 'collection',
    map: map,
    reduce: reduce,
    query: {'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}},
    out: 'temp_collection'
});
After this command returns, you should have your unique documents in temp_collection.
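Note that map-reduce output documents have the form { _id: <name>, value: <document> }, so the original fields sit under value. As an alternative, and not what was run for the result quoted in the question, the same deduplication can probably be expressed with the aggregation framework, which may be worth trying since map-reduce is deprecated from MongoDB 5.0 onward. The following is only a sketch under a few assumptions: the server supports $$ROOT and $out (MongoDB 2.6 or later), date_created is stored as a string as in the query above, and the field name doc is my own choice. The typeof guard is only there so the snippet also loads outside the mongo shell.

```javascript
// Sketch of an aggregation pipeline that keeps one document per name.
var pipeline = [
  // Same date filter as the map-reduce query
  { $match: { date_created: { $gte: '2011-01-01', $lt: '2011-10-30' } } },
  // One group per name; $first keeps the first document encountered in the group
  { $group: { _id: '$name', doc: { $first: '$$ROOT' } } },
  // Write the result to temp_collection (replaces it if it already exists)
  { $out: 'temp_collection' }
];

// db only exists inside the mongo shell, so run the pipeline only there.
if (typeof db !== 'undefined') {
  // allowDiskUse lets large groupings spill to disk instead of failing on memory limits
  db.collection.aggregate(pipeline, { allowDiskUse: true });
}
```

With this pipeline each document in temp_collection looks like { _id: <name>, doc: <document> }. As with vals[0] in the reduce function, which duplicate is kept is not controlled unless a $sort stage is added before $group.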