Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting CouchDB Views By Value

I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view where I can produce the top queries from the results. At the moment I have something like this:

Example document portion

{   "query": "+dangerous +dogs",   "hits": "123" } 

Map function (Not exactly what I need/want but it's good enough for testing)

function(doc) {   if (doc.query) {     var split = doc.query.split(" ");     for (var i in split) {       emit(split[i], 1);     }   } } 

Reduce Function

function (key, values, rereduce) {   return sum(values); } 

Now this will get me results in a format where a query term is the key and the count for that term on the right, which is great. But I'd like it ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB.

So does anyone have any ideas of how I can get a view where I have an ordered version of the query terms & their related counts? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.

like image 286
Lee Theobald Avatar asked May 12 '10 10:05

Lee Theobald


People also ask

What are views in CouchDB?

Basically views are JavaScript codes which will be put in a document inside the database that they operate on. This special document is called Design document in CouchDB. Each Design document can implement multiple view. Please consult Official CouchDB Design Documents to learn more about how to write view.

Where does the permanent views are stored in CouchDB?

A permanent view is stored as a special kind of document between the other regular documents. These special kind of documents are called 'design documents'. A permanent view can be executed by performing a GET operation with the name of the view. A temporary view, as its name implies, is not stored by CouchDB.


2 Answers

It is true that there is no dead-simple answer. There are several patterns however.

  1. http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags. I do not personally like this because they acknowledge that it is a brittle solution, and the code is not relaxing-looking.

  2. Avi's answer, which is to sort in-memory in your application.

  3. couchdb-lucene which it seems everybody finds themselves needing eventually!

  4. What I like is what Chris said in Avi's quote. Relax. In CouchDB, databases are lightweight and excel at giving you a unique perspective of your data. These days, the buzz is all about filtered replication which is all about slicing out subsets of your data to put in a separate DB.

    Anyway, the basics are simple. You take your .rows from the view output and you insert it into a separate DB which simply emits keyed on the count. An additional trick is to write a very simple _list function. Lists "render" the raw couch output into different formats. Your _list function should output

    { "docs":     [ {..view row1...},       {..view row2...},       {..etc...}     ] } 

    What that will do is format the view output exactly the way the _bulk_docs API requires it. Now you can pipe curl directly into another curl:

    curl host:5984/db/_design/myapp/_list/bulkdocs_formatter/query_popularity \  | curl -X POST host:5984/popularity_sorter/_design/myapp/_view/by_count 
  5. In fact, if your list function can handle all the docs, you may just have it sort them itself and return them to the client sorted.

like image 67
JasonSmith Avatar answered Sep 23 '22 05:09

JasonSmith


This came up on the CouchDB-user mailing list, and Chris Anderson, one of the primary developers, wrote:

This is a common request, but not supported directly by CouchDB's views -- to do this you'll need to copy the group-reduce query to another database, and build a view to sort by value.

This is a tradeoff we make in favor of dynamic range queries and incremental indexes.

I needed to do this recently as well, and I ended up doing it in my app tier. This is easy to do in JavaScript:

db.view('mydesigndoc', 'myview', {'group':true}, function(err, data) {      if (err) throw new Error(JSON.stringify(err));      data.rows.sort(function(a, b) {         return a.value - b.value;     });      data.rows.reverse(); // optional, depending on your needs      // do something with the data… }); 

This example runs in Node.js and uses node-couchdb, but it could easily be adapted to run in a browser or another JavaScript environment. And of course the concept is portable to any programming language/environment.

HTH!

like image 25
Avi Flax Avatar answered Sep 24 '22 05:09

Avi Flax