I'm using CouchDB. I'd like to be able to count occurrences of values of specific fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I'm having trouble understanding the best way to pull it all together.
Assuming documents that have a timestamp field and another field, e.g.:
{ date: '20120101-1853', author: 'bart' }
{ date: '20120102-1850', author: 'homer'}
{ date: '20120103-2359', author: 'homer'}
{ date: '20120104-1200', author: 'lisa'}
{ date: '20120815-1250', author: 'lisa'}
I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, called with key range parameters, e.g. _view/all-docs?startkey=20120101-0000&endkey=20120201-0000
.
all-docs/map.js:
function(doc) {
emit(doc.date, doc);
}
With the data above, this would return a CouchDB view containing just the first 4 docs (the only docs in the date range).
I can also create a query that counts occurrences of a given field, like this, called with grouping, i.e. _view/author-count?group=true
:
author-count/map.js:
function(doc) {
emit(doc.author, 1);
}
author-count/reduce.js:
function(keys, values, rereduce) {
return sum(values);
}
This would yield something like:
{
"rows": [
{"key":"bart","value":1},
{"key":"homer","value":2}
{"key":"lisa","value":2}
]
}
However, I can't find the best way to both filter by date and count occurrences. For example, with the data above, I'd like to be able to specify range parameters like startkey=20120101-0000&endkey=20120201-0000
and get a result like this, where the last doc is excluded from the count because it is outside the specified date range:
{
"rows": [
{"key":"bart","value":1},
{"key":"homer","value":2}
{"key":"lisa","value":1}
]
}
What's the most elegant way to do this? Is this achievable with a single query? Should I be using another CouchDB construct, or is a view sufficient for this?
To retrieve view results in reverse order, use the descending=true query parameter. If you are using a startkey parameter, you will find that CouchDB returns different rows or no rows at all. What's up with that? It's pretty easy to understand when you see how view query options work under the hood.
CouchDB uses views as the primary tool for running queries and creating reports from stored document files. Views allow you to filter documents to find information relevant to a particular database process.
Basically views are JavaScript codes which will be put in a document inside the database that they operate on. This special document is called Design document in CouchDB. Each Design document can implement multiple view. Please consult Official CouchDB Design Documents to learn more about how to write view.
You can get pretty close to the desired result with a list:
{
_id: "_design/authors",
views: {
authors_by_date: {
map: function(doc) {
emit(doc.date, doc.author);
}
}
},
lists: {
count_occurrences: function(head, req) {
start({ headers: { "Content-Type": "application/json" }});
var result = {};
var row;
while(row = getRow()) {
var val = row.value;
if(result[val]) result[val]++;
else result[val] = 1;
}
return result;
}
}
}
This design can be requested as such:
http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey=<startDate>&endkey=<endDate>
This will be slower than a normal map-reduce, and is a bit of a workaround. Unfortunately, this is the only way to do a multi-dimensional query, "which CouchDB isn’t suited for".
The result of requesting this design will be something like this:
{
"bart": 1,
"homer": 2,
"lisa": 2
}
What we do is basically emit a lot of elements, then using a list to group them as we want. A list can be used to display a result in any way you want, but will also often be slower. Whereas a normal map-reduce can be cached and only change according to the diffs, the list will have to be built anew every time it is requested.
It is pretty much as slow as getting all the elements resulting from the map (the overhead of orchestrating the data is mostly negligible): a lot slower than getting the result of a reduce.
If you want to use the list for a different view, you can simply exchange it in the URL you request:
http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view>
Read more about lists on the couchdb wiki.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With