Consider the following sample documents stored in CouchDB:
{
  "_id":....,
  "_rev":....,
  "type":"orders",
  "Period":"2013-01",
  "Region":"East",
  "Category":"Stationary",
  "Product":"Pen",
  "Rate":1,
  "Qty":10,
  "Amount":10
}
{
  "_id":....,
  "_rev":....,
  "type":"orders",
  "Period":"2013-02",
  "Region":"South",
  "Category":"Food",
  "Product":"Biscuit",
  "Rate":7,
  "Qty":5,
  "Amount":35
}
Consider the following SQL query:
SELECT Period, Region, Category, Product, MIN(Rate), MAX(Rate), COUNT(Rate), SUM(Qty), SUM(Amount)
FROM Sales
GROUP BY Period, Region, Category, Product;
Is it possible to create map/reduce views in CouchDB that are equivalent to the SQL query above and produce output like the following?
[
  {
    "Period":"2013-01",
    "Region":"East",
    "Category":"Stationary",
    "Product":"Pen",
    "MinRate":1,
    "MaxRate":2,
    "OrdersCount":20,
    "TotQty":1000,
    "Amount":1750
  },
  {
    ...
  }
]
Up front, I believe @benedolph's answer is the best practice and the best-case scenario. Each reduce should ideally return a single scalar value to keep the code as simple as possible.
However, it is true that you'd have to issue multiple queries to retrieve the full result set described in your question. If you don't have the option to run queries in parallel, or it is really important to keep the number of queries down, it is possible to do it all at once.
Your map function will remain pretty simple:
function (doc) {
  emit([ doc.Period, doc.Region, doc.Category, doc.Product ], doc);
}
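To see what this map function emits, here is a minimal sketch that runs it outside CouchDB. The `emit` stub and the `rows` array are stand-ins for what CouchDB provides internally (they are assumptions for illustration only), and the sample document is the first one from the question with the elided `_id` and revision fields left out.

```javascript
// Stand-in for CouchDB's emit(); it only collects rows for inspection.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from above, given a name so it can be called here.
function map(doc) {
  emit([doc.Period, doc.Region, doc.Category, doc.Product], doc);
}

// First sample document from the question (without _id and revision).
map({
  type: "orders", Period: "2013-01", Region: "East",
  Category: "Stationary", Product: "Pen", Rate: 1, Qty: 10, Amount: 10
});

console.log(JSON.stringify(rows[0].key));
// ["2013-01","East","Stationary","Pen"]
```

Every document with the same four key fields ends up under the same key, which is what lets the reduce step act like the GROUP BY clause.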
The reduce function is where it gets lengthy:
function (key, values, rereduce) {
  // helper function to sum all the values of a specified field in an array of objects
  function sumField(arr, field) {
    return arr.reduce(function (prev, cur) {
      return prev + cur[field];
    }, 0);
  }
  // helper function to create an array of just a single property from an array of objects
  // (this function came from underscore.js, at least its name and concept)
  function pluck(arr, field) {
    return arr.map(function (item) {
      return item[field];
    });
  }
  // rereduce made this more challenging, and I could not thoroughly test this right now
  // see the CouchDB wiki for more information
  if (rereduce) {
    // a rereduce handles intermediate values
    // (so the "values" below are the results of previous reduce calls, not of the map function)
    return {
      OrdersCount: sumField(values, "OrdersCount"),
      MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
      MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
      TotQty: sumField(values, "TotQty"),
      Amount: sumField(values, "Amount")
    };
  } else {
    var rates = pluck(values, "Rate");
    // This takes a group of documents and gives you the stats you were asking for
    return {
      OrdersCount: values.length,
      MinRate: Math.min.apply(Math, rates),
      MaxRate: Math.max.apply(Math, rates),
      TotQty: sumField(values, "Qty"),
      Amount: sumField(values, "Amount")
    };
  }
}
I was not able to test the "rereduce" branch of this code at all, so you'll have to do that on your end (but it should work). See the wiki for information about reduce vs. rereduce.
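A rough way to exercise the rereduce branch without CouchDB is to reduce the values in chunks and feed the partial results back in with rereduce set to true. The sketch below does that with a compact copy of the reduce function and made-up order values; the names `docs`, `direct`, `partials`, and `combined` are hypothetical, the chunk sizes are arbitrary, and CouchDB chooses its own chunking.

```javascript
// Compact copy of the reduce function above, so this sketch runs on its own.
function reduce(key, values, rereduce) {
  function sumField(arr, field) {
    return arr.reduce(function (prev, cur) { return prev + cur[field]; }, 0);
  }
  function pluck(arr, field) {
    return arr.map(function (item) { return item[field]; });
  }
  if (rereduce) {
    return {
      OrdersCount: sumField(values, "OrdersCount"),
      MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
      MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
      TotQty: sumField(values, "TotQty"),
      Amount: sumField(values, "Amount")
    };
  }
  var rates = pluck(values, "Rate");
  return {
    OrdersCount: values.length,
    MinRate: Math.min.apply(Math, rates),
    MaxRate: Math.max.apply(Math, rates),
    TotQty: sumField(values, "Qty"),
    Amount: sumField(values, "Amount")
  };
}

// Hypothetical orders for one group (made-up numbers, not from the question).
var docs = [
  { Rate: 1, Qty: 10, Amount: 10 },
  { Rate: 2, Qty: 5, Amount: 10 },
  { Rate: 1, Qty: 3, Amount: 3 },
  { Rate: 2, Qty: 4, Amount: 8 }
];

// One pass over all values, as if no rereduce were needed.
var direct = reduce(null, docs, false);

// Two partial reductions combined by a rereduce call.
var partials = [
  reduce(null, docs.slice(0, 3), false),
  reduce(null, docs.slice(3), false)
];
var combined = reduce(null, partials, true);

// Both paths should agree if the rereduce branch is consistent.
console.log(JSON.stringify(direct) === JSON.stringify(combined)); // true
```

This only checks that the two branches agree on one split; trying a few different splits gives more confidence before relying on the view.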
The helper functions I added at the top actually made the code much shorter and easier to read overall; they're largely influenced by my experience with Underscore.js. However, you can't include CommonJS modules in reduce functions, so they have to be written manually.
Again, the best-case scenario is for each aggregated field to get its own map/reduce index, but if that isn't an option for you, the above code should get you what you've described in the question.
I will propose a very simple solution that requires one view per variable you want to aggregate in your "select" clause. While it is certainly possible to aggregate all variables in a single view, the reduce function would be far more complex.
The design document looks like this:
{
  "_id": "_design/ddoc",
  "_rev": "...",
  "language": "javascript",
  "views": {
    "rates": {
      "map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Rate);\n}",
      "reduce": "_stats"
    },
    "qty": {
      "map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Qty);\n}",
      "reduce": "_stats"
    }
  }
}
Now you can query <couchdb>/<database>/_design/ddoc/_view/rates?group_level=4 to get the statistics about the "Rate" variable. The result should look like this:
{"rows":[
{"key":["2013-01","East","Stationary","Pen"],"value":{"sum":4,"count":3,"min":1,"max":2,"sumsqr":6}},
{"key":["2013-01","North","Stationary","Pen"],"value":{"sum":1,"count":1,"min":1,"max":1,"sumsqr":1}},
{"key":["2013-01","South","Stationary","Pen"],"value":{"sum":0.5,"count":1,"min":0.5,"max":0.5,"sumsqr":0.25}},
{"key":["2013-02","South","Food","Biscuit"],"value":{"sum":7,"count":1,"min":7,"max":7,"sumsqr":49}}
]}
For the "Qty" variable, the query would be <couchdb>/<database>/_design/ddoc/_view/qty?group_level=4.
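Because each view returns its own rows, the client has to combine them to get one record per group in the shape asked for in the question. Here is a minimal sketch of that merge, assuming the two responses have already been fetched and parsed. `mergeStats`, `ratesRows`, and `qtyRows` are hypothetical names, and the "qty" row holds made-up sample values. A total for "Amount" would need a third view built the same way.

```javascript
// Merge rows of the "rates" and "qty" views (both queried with group_level=4)
// into one record per key. Assumes both views return the same set of keys.
function mergeStats(ratesRows, qtyRows) {
  var qtyByKey = {};
  qtyRows.forEach(function (row) {
    qtyByKey[JSON.stringify(row.key)] = row.value;
  });
  return ratesRows.map(function (row) {
    // Fall back to a zero sum if a key is missing from the "qty" view.
    var qty = qtyByKey[JSON.stringify(row.key)] || { sum: 0 };
    return {
      Period: row.key[0],
      Region: row.key[1],
      Category: row.key[2],
      Product: row.key[3],
      MinRate: row.value.min,
      MaxRate: row.value.max,
      OrdersCount: row.value.count,
      TotQty: qty.sum
    };
  });
}

// First row of the "rates" result shown above; the "qty" row is illustrative.
var ratesRows = [
  { key: ["2013-01", "East", "Stationary", "Pen"],
    value: { sum: 4, count: 3, min: 1, max: 2, sumsqr: 6 } }
];
var qtyRows = [
  { key: ["2013-01", "East", "Stationary", "Pen"],
    value: { sum: 25, count: 3, min: 5, max: 10, sumsqr: 225 } }
];

console.log(mergeStats(ratesRows, qtyRows));
```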
With the group_level parameter you can control the level at which the aggregation is performed. For example, querying with group_level=2 will aggregate by "Period" and "Region".
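The effect of group_level can be pictured as truncating each array key to its first N elements before grouping. The sketch below imitates that locally by counting how many of the level-4 groups from the result above fall under each key prefix. It is only a model of the behaviour, not how CouchDB implements it, and `countByLevel` is a hypothetical helper.

```javascript
// Count keys per prefix, mimicking what group_level does to array keys.
function countByLevel(keys, level) {
  var counts = {};
  keys.forEach(function (key) {
    var prefix = JSON.stringify(key.slice(0, level));
    counts[prefix] = (counts[prefix] || 0) + 1;
  });
  return counts;
}

// Keys of the four result rows shown above.
var keys = [
  ["2013-01", "East", "Stationary", "Pen"],
  ["2013-01", "North", "Stationary", "Pen"],
  ["2013-01", "South", "Stationary", "Pen"],
  ["2013-02", "South", "Food", "Biscuit"]
];

// Level 1 groups by Period only: three groups share "2013-01", one has "2013-02".
console.log(countByLevel(keys, 1));
// Level 2 groups by Period and Region: every prefix is distinct here.
console.log(countByLevel(keys, 2));
```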