
Sorting by relevance with MongoDB

I have a collection of documents in the following form:

{ _id: ObjectId(...)
, title: "foo"
, tags: ["bar", "baz", "qux"] 
}

The query should find all documents that have any of a given set of tags. I currently use this query:

{ "tags": { "$in": ["bar", "hello"] } }

It works: all documents tagged "bar" or "hello" are returned.
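For readers new to $in on an array field: a document matches when at least one element of its tags array equals at least one value in the list. The plain-JavaScript sketch below mimics that rule on an in-memory array. The function matchesAnyTag and the sample documents are hypothetical, and this only illustrates the matching rule, not how the server evaluates the query.

```javascript
// Hypothetical in-memory stand-in for { "tags": { "$in": queryTags } }:
// a document matches if ANY of its tags appears in queryTags.
function matchesAnyTag(doc, queryTags) {
  return doc.tags.some((tag) => queryTags.includes(tag));
}

// Hypothetical sample documents.
const sampleDocs = [
  { title: "foo", tags: ["bar", "baz", "qux"] },
  { title: "greeting", tags: ["hello", "world"] },
  { title: "other", tags: ["baz", "boo"] },
];

const matched = sampleDocs.filter((doc) => matchesAnyTag(doc, ["bar", "hello"]));
console.log(matched.map((doc) => doc.title)); // [ 'foo', 'greeting' ]
```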

However, I want to sort by relevance, i.e. the more matching tags a document has, the earlier it should appear in the results. For example, for the query ["bar", "hello"], a document tagged ["bar", "hello", "baz"] should rank higher than a document tagged ["bar", "baz", "boo"]. How can I achieve this?
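One way to make "relevance" concrete is to score each document by how many of the query tags it carries, then sort by that score in descending order. The JavaScript sketch below does this in memory for the two documents from the example. The function name relevanceScore and the titles "A" and "B" are hypothetical; the score counts each query tag at most once, assuming the query list itself has no duplicates.

```javascript
// Hypothetical relevance score: number of query tags the document carries.
function relevanceScore(doc, queryTags) {
  return queryTags.filter((tag) => doc.tags.includes(tag)).length;
}

// The two documents from the example, with made-up titles.
const exampleDocs = [
  { title: "B", tags: ["bar", "baz", "boo"] },
  { title: "A", tags: ["bar", "hello", "baz"] },
];
const query = ["bar", "hello"];

const ranked = exampleDocs
  .map((doc) => ({ title: doc.title, score: relevanceScore(doc, query) }))
  .filter((entry) => entry.score > 0) // keep only documents with at least one match
  .sort((a, b) => b.score - a.score); // most matching tags first

console.log(ranked.map((e) => `${e.title}: ${e.score}`)); // [ 'A: 2', 'B: 1' ]
```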

asked Oct 07 '12 at 16:10 by qox


1 Answer

MapReduce and client-side processing are going to be too slow; you should use the aggregation framework (new in MongoDB 2.2).

It might look something like this:

db.collection.aggregate([
   { $match : { "tags": { "$in": ["bar", "hello"] } } },
   { $unwind : "$tags" },
   { $match : { "tags": { "$in": ["bar", "hello"] } } },
   { $group : { _id: "$title", numRelTags: { $sum:1 } } },
   { $sort : { numRelTags : -1 } }
   //  optionally
   , { $limit : 10 }
])

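Note that the first and third pipeline stages look identical; this is intentional and necessary. The first $match drops whole documents that have no relevant tag (and, as the first stage, can use an index on tags) before the more expensive $unwind, while the second one, applied after $unwind, discards the individual tags that are not "bar" or "hello". Here is what the steps do: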

  1. Pass on only documents that have the tag "bar" or "hello".
  2. Unwind the tags array (i.e. split each document into one document per element of tags).
  3. Pass on only the unwound documents whose tag is exactly "bar" or "hello" (i.e. discard the rest of the tags).
  4. Group by title (it could also be by "$_id" or any other combination of fields from the original document), adding up how many of the relevant tags ("bar" and "hello") each document had. Grouping by "$_id" avoids merging distinct documents that happen to share a title.
  5. Sort in descending order by the number of relevant tags.
  6. (Optionally) limit the returned set to the top 10.
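To see what each stage produces, here is a plain-JavaScript sketch that mirrors the six stages on an in-memory array. It only illustrates the data flow (the server does not run this code); the helper name rankByRelevance and the sample documents are hypothetical. Like the pipeline above, it groups by title.

```javascript
// Hypothetical in-memory illustration of the aggregation pipeline (not a MongoDB API).
function rankByRelevance(collection, queryTags, limit) {
  // 1. $match: keep documents with at least one query tag.
  const matched = collection.filter((doc) =>
    doc.tags.some((tag) => queryTags.includes(tag))
  );

  // 2. $unwind: one record per element of the tags array.
  const unwound = matched.flatMap((doc) =>
    doc.tags.map((tag) => ({ _id: doc._id, title: doc.title, tags: tag }))
  );

  // 3. $match again: drop records whose single tag is not in the query.
  const relevant = unwound.filter((rec) => queryTags.includes(rec.tags));

  // 4. $group by title, adding 1 per surviving record ({ $sum: 1 }).
  const counts = new Map();
  for (const rec of relevant) {
    counts.set(rec.title, (counts.get(rec.title) || 0) + 1);
  }
  const grouped = [...counts].map(([title, n]) => ({ _id: title, numRelTags: n }));

  // 5. $sort by numRelTags, descending.
  grouped.sort((a, b) => b.numRelTags - a.numRelTags);

  // 6. $limit (optional).
  return limit === undefined ? grouped : grouped.slice(0, limit);
}

// Hypothetical sample collection.
const collection = [
  { _id: 1, title: "foo", tags: ["bar", "baz", "qux"] },
  { _id: 2, title: "both", tags: ["bar", "hello", "baz"] },
  { _id: 3, title: "none", tags: ["boo"] },
];

// "both" (numRelTags 2) comes before "foo" (numRelTags 1); "none" is filtered out.
console.log(rankByRelevance(collection, ["bar", "hello"], 10));
```

Because $unwind emits one record per array element, a tag repeated within a single document's tags array is counted once per occurrence, both by the pipeline and by this sketch.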
answered Sep 30 '22 at 13:09 by Asya Kamsky