Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a CouchDB view, can I count groups and filter by key range at the same time?

I'm using CouchDB. I'd like to be able to count occurrences of values of specific fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I'm having trouble understanding the best way to pull it all together.

Assuming documents that have a timestamp field and another field, e.g.:

{ date: '20120101-1853', author: 'bart' }
{ date: '20120102-1850', author: 'homer'}
{ date: '20120103-2359', author: 'homer'}
{ date: '20120104-1200', author: 'lisa'}
{ date: '20120815-1250', author: 'lisa'}

I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, called with key range parameters, e.g. _view/all-docs?startkey=20120101-0000&endkey=20120201-0000.

all-docs/map.js:

function(doc) {
    emit(doc.date, doc);
}

With the data above, this would return a CouchDB view containing just the first 4 docs (the only docs in the date range).

I can also create a query that counts occurrences of a given field, like this, called with grouping, i.e. _view/author-count?group=true:

author-count/map.js:

function(doc) {
  emit(doc.author, 1);
}

author-count/reduce.js:

function(keys, values, rereduce) {
  return sum(values);
}

This would yield something like:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2}
        {"key":"lisa","value":2}
     ]
}

However, I can't find the best way to both filter by date and count occurrences. For example, with the data above, I'd like to be able to specify range parameters like startkey=20120101-0000&endkey=20120201-0000 and get a result like this, where the last doc is excluded from the count because it is outside the specified date range:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2}
        {"key":"lisa","value":1}
     ]
}

What's the most elegant way to do this? Is this achievable with a single query? Should I be using another CouchDB construct, or is a view sufficient for this?

like image 222
rewbs Avatar asked Oct 17 '12 22:10

rewbs


People also ask

How do you query a view in CouchDB?

To retrieve view results in reverse order, use the descending=true query parameter. If you are using a startkey parameter, you will find that CouchDB returns different rows or no rows at all. What's up with that? It's pretty easy to understand when you see how view query options work under the hood.

What are the primary benefits of implementing views in CouchDB?

CouchDB uses views as the primary tool for running queries and creating reports from stored document files. Views allow you to filter documents to find information relevant to a particular database process.

What are views in CouchDB?

Basically views are JavaScript codes which will be put in a document inside the database that they operate on. This special document is called Design document in CouchDB. Each Design document can implement multiple view. Please consult Official CouchDB Design Documents to learn more about how to write view.


1 Answers

You can get pretty close to the desired result with a list:

{
  _id: "_design/authors",
  views: {
    authors_by_date: {
      map: function(doc) {
        emit(doc.date, doc.author);
      }
    }
  },
  lists: {
    count_occurrences: function(head, req) {
      start({ headers: { "Content-Type": "application/json" }});

      var result = {};
      var row;
      while(row = getRow()) {
        var val = row.value;
        if(result[val]) result[val]++;
        else result[val] = 1;
      }
      return result;
    }
  }
}

This design can be requested as such:

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey=<startDate>&endkey=<endDate>

This will be slower than a normal map-reduce, and is a bit of a workaround. Unfortunately, this is the only way to do a multi-dimensional query, "which CouchDB isn’t suited for".

The result of requesting this design will be something like this:

{
  "bart": 1,
  "homer": 2,
  "lisa": 2
}

What we do is basically emit a lot of elements, then using a list to group them as we want. A list can be used to display a result in any way you want, but will also often be slower. Whereas a normal map-reduce can be cached and only change according to the diffs, the list will have to be built anew every time it is requested.

It is pretty much as slow as getting all the elements resulting from the map (the overhead of orchestrating the data is mostly negligible): a lot slower than getting the result of a reduce.

If you want to use the list for a different view, you can simply exchange it in the URL you request:

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view>

Read more about lists on the couchdb wiki.

like image 125
Niels Abildgaard Avatar answered Nov 11 '22 06:11

Niels Abildgaard