
mongodb get distinct records

I am using MongoDB and have a collection with documents in the following format:

{"id" : 1, "name" : "x", "ttm" : 23, "val" : 5}
{"id" : 1, "name" : "x", "ttm" : 34, "val" : 1}
{"id" : 1, "name" : "x", "ttm" : 24, "val" : 2}
{"id" : 2, "name" : "x", "ttm" : 56, "val" : 3}
{"id" : 2, "name" : "x", "ttm" : 76, "val" : 3}
{"id" : 3, "name" : "x", "ttm" : 54, "val" : 7}

I query that collection to get records in descending order of ttm like this:

db.foo.find({"id" : {"$in" : [1,2,3]}}).sort({ttm : -1}).limit(3)

But this returns two records with the same id = 1, and I want exactly one record per id.

Is this possible in MongoDB?

Swapnil Sonawane asked Feb 23 '11 09:02


People also ask

What does distinct do in MongoDB?

In MongoDB, the distinct() method finds the distinct values for a given field across a single collection and returns the results in an array. It takes three parameters: the first is the field for which to return distinct values, and the other two (a query filter and an options document) are optional.

What does find () do in MongoDB?

In MongoDB, the find() method selects documents in a collection and returns a cursor to the selected documents.

How do I make a field unique in MongoDB?

To create a unique index, use the db.collection.createIndex() method with the unique option set to true.


2 Answers

There is a distinct command in MongoDB that can be used in conjunction with a query. However, I believe it only returns a distinct list of values for the specific key you name (in your case, you'd only get the id values back), so I'm not sure it will give you exactly what you want if you need the whole documents. You may need MapReduce instead.

Documentation on distinct: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
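To make the limitation concrete, here is a minimal sketch of what a distinct call on id gives back. The shell call is shown in a comment; `distinctValues` is a hypothetical plain-JavaScript stand-in (not a MongoDB API) so the example runs without a server, using the sample documents from the question.

```javascript
// Sample documents copied from the question.
const docs = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 },
];

// Shell equivalent: db.foo.distinct("id", { id: { $in: [1, 2, 3] } })
// Hypothetical stand-in: collects the distinct values of one field
// from the documents that pass the filter.
function distinctValues(collection, field, matches) {
  const seen = new Set();
  for (const doc of collection) {
    if (matches(doc)) seen.add(doc[field]);
  }
  return [...seen];
}

const ids = distinctValues(docs, "id", (d) => [1, 2, 3].includes(d.id));
console.log(ids); // only the id values; ttm and val are gone
```

The result is a flat array of key values, so it can tell you which ids exist but not which document to keep for each one.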

AdaTheDev answered Sep 22 '22 00:09


You want to use aggregation. You could do that like this:

db.test.aggregate([
    // Each object is one stage of the aggregation pipeline.
    {
        $group: {
            originalId: {$first: '$_id'}, // Hold onto original ID.
            _id: '$id', // Set the unique identifier
            val:  {$first: '$val'},
            name: {$first: '$name'},
            ttm:  {$first: '$ttm'}
        }
    }, {
        // This stage receives the output of the first stage.
        // So the (originally) non-unique 'id' field is now
        // present as the _id field. We want to rename it.
        $project: {
            _id : '$originalId', // Restore original ID.
            id  : '$_id', // Expose the group key as id again.
            val : '$val',
            name: '$name',
            ttm : '$ttm'
        }
    }
])

This is fast: about 90 ms on my test DB of 100,000 documents.

Example:

db.test.find()
// { "_id" : ObjectId("55fb595b241fee91ac4cd881"), "id" : 1, "name" : "x", "ttm" : 23, "val" : 5 }
// { "_id" : ObjectId("55fb596d241fee91ac4cd882"), "id" : 1, "name" : "x", "ttm" : 34, "val" : 1 }
// { "_id" : ObjectId("55fb59c8241fee91ac4cd883"), "id" : 1, "name" : "x", "ttm" : 24, "val" : 2 }
// { "_id" : ObjectId("55fb59d9241fee91ac4cd884"), "id" : 2, "name" : "x", "ttm" : 56, "val" : 3 }
// { "_id" : ObjectId("55fb59e7241fee91ac4cd885"), "id" : 2, "name" : "x", "ttm" : 76, "val" : 3 }
// { "_id" : ObjectId("55fb59f9241fee91ac4cd886"), "id" : 3, "name" : "x", "ttm" : 54, "val" : 7 }

db.test.aggregate(/* from first code snippet */)

// output
{
    "result" : [
        {
            "_id" : ObjectId("55fb59f9241fee91ac4cd886"),
            "val" : 7,
            "name" : "x",
            "ttm" : 54,
            "id" : 3
        },
        {
            "_id" : ObjectId("55fb59d9241fee91ac4cd884"),
            "val" : 3,
            "name" : "x",
            "ttm" : 56,
            "id" : 2
        },
        {
            "_id" : ObjectId("55fb595b241fee91ac4cd881"),
            "val" : 5,
            "name" : "x",
            "ttm" : 23,
            "id" : 1
        }
    ],
    "ok" : 1
}
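One caveat worth spelling out: $first keeps whichever document the $group stage happens to see first, and that order is only defined after a $sort. In the output above, id 1 comes back with ttm 23 rather than 34. If the goal is the highest-ttm record per id, as in the question, a possible variant is to sort before grouping. The sketch below is an assumption about what the asker wants, not the original answer's code; `latestPerId` is a pipeline you could pass to db.test.aggregate, and `latestPerIdInMemory` is a hypothetical stand-in that mimics the same steps so the example runs without a server.

```javascript
// Pipeline for the mongo shell: db.test.aggregate(latestPerId)
const latestPerId = [
  { $sort: { ttm: -1 } },              // highest ttm first
  { $group: {
      _id: "$id",                      // one group per id
      originalId: { $first: "$_id" },  // $first now means "highest ttm"
      val:  { $first: "$val" },
      name: { $first: "$name" },
      ttm:  { $first: "$ttm" },
  } },
  { $sort: { ttm: -1 } },              // order the groups themselves
];

// Hypothetical in-memory stand-in for the same steps.
function latestPerIdInMemory(docs) {
  const sorted = [...docs].sort((a, b) => b.ttm - a.ttm);
  const firstSeen = new Map();         // Map keeps insertion order
  for (const doc of sorted) {
    if (!firstSeen.has(doc.id)) firstSeen.set(doc.id, doc);
  }
  return [...firstSeen.values()];
}

// Sample documents copied from the question.
const sample = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 },
];

const result = latestPerIdInMemory(sample);
console.log(result); // ids 2, 3, 1 with ttm 76, 54, 34
```

The second $sort is there because $group does not promise any particular order for the groups it emits.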

PROS: Almost certainly the fastest method.

CONS: It uses the more complicated Aggregation API, and it is tightly coupled to the document's schema because every field has to be listed by name, though it may be possible to generalize this.
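On generalizing: one way that might work, assuming a server new enough to support $$ROOT and $replaceRoot (MongoDB 3.4 or later for the latter), is to keep the whole document in the group instead of listing each field. This is a sketch, not something from the answer above; `firstDocPerId` is the pipeline and `firstDocPerKey` is a hypothetical in-memory stand-in for it.

```javascript
// Pipeline for the mongo shell: db.test.aggregate(firstDocPerId)
const firstDocPerId = [
  { $sort: { ttm: -1 } },                                 // make $first well defined
  { $group: { _id: "$id", doc: { $first: "$$ROOT" } } },  // keep the whole document
  { $replaceRoot: { newRoot: "$doc" } },                  // unwrap it again
];

// Hypothetical stand-in: first document per key after a descending
// sort, with no field names hard-coded apart from the two arguments.
function firstDocPerKey(docs, key, sortField) {
  const sorted = [...docs].sort((a, b) => b[sortField] - a[sortField]);
  const byKey = new Map();
  for (const doc of sorted) {
    if (!byKey.has(doc[key])) byKey.set(doc[key], doc);
  }
  return [...byKey.values()];
}

// Sample documents copied from the question.
const rows = [
  { id: 1, name: "x", ttm: 23, val: 5 },
  { id: 1, name: "x", ttm: 34, val: 1 },
  { id: 1, name: "x", ttm: 24, val: 2 },
  { id: 2, name: "x", ttm: 56, val: 3 },
  { id: 2, name: "x", ttm: 76, val: 3 },
  { id: 3, name: "x", ttm: 54, val: 7 },
];

const unwrapped = firstDocPerKey(rows, "id", "ttm");
console.log(unwrapped); // one complete document per id
```

Because the documents come through unchanged, adding a field to the schema later would not require touching the pipeline.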

robert answered Sep 22 '22 00:09