I'm currently delving into CouchDB, and I am puzzled by how Map-Reduce computations in views are distributed. Many resources say that Map-Reduce is inherently distributable: you can process one half of your data on server A, the other half on server B, and then reduce both results. One example is slide 16 of this presentation:
http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
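To make the "half on server A, half on server B, then reduce both" idea concrete, here is a minimal single-process Python sketch. The two "servers" are just two slices of a list, and `map_fn`, `reduce_fn`, `run_on_server` and `combine` are hypothetical names, not CouchDB APIs. What it illustrates is that the split only works if the reduce function can also combine its own partial outputs, which is what the `rereduce` argument of a CouchDB reduce function signals.

```python
Doc = dict  # hypothetical stand-in for a JSON document

def map_fn(doc: Doc) -> list[tuple[str, int]]:
    """Emit (key, value) pairs, like a map function calling emit(doc.type, 1)."""
    return [(doc["type"], 1)]

def reduce_fn(values: list[int], rereduce: bool) -> int:
    """A _sum-style reduce: the same code handles raw values and partial results."""
    return sum(values)

def run_on_server(docs: list[Doc]) -> dict[str, int]:
    """Map this server's documents, then reduce the values for each key locally."""
    grouped: dict[str, list[int]] = {}
    for doc in docs:
        for key, value in map_fn(doc):
            grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(vals, rereduce=False) for key, vals in grouped.items()}

def combine(partials: list[dict[str, int]]) -> dict[str, int]:
    """Rereduce the per-server partial results into the final answer."""
    grouped: dict[str, list[int]] = {}
    for partial in partials:
        for key, value in partial.items():
            grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(vals, rereduce=True) for key, vals in grouped.items()}

docs = [{"type": "post"}, {"type": "comment"}, {"type": "post"}, {"type": "post"}]
server_a = run_on_server(docs[:2])    # first half of the data
server_b = run_on_server(docs[2:])    # second half of the data
print(combine([server_a, server_b]))  # {'post': 3, 'comment': 1}
```

A reduce like `_sum` satisfies this because summing partial sums gives the same total as summing everything at once; a reduce that, say, averaged raw values would not.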
This seems fairly logical, but:
CouchDB does not seem to provide an API for dispatching computations to several servers. The only distribution it appears to provide is replication of the entire data set to other servers (which would then, I assume, compute their own view data).
CouchDB uses a B-Tree to store view data, ordered by the keys generated in the Map step of the view algorithm, which precludes partitioning documents according to the server they should live on.
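The "reduce values cached in B-Tree nodes" idea can be sketched as well. This is a simplified static tree in Python, not CouchDB's actual append-only B+tree, and `Node`, `build`, `chunks` and `range_reduce` are hypothetical names. Each inner node stores the reduction of everything beneath it, so a key-range query can reuse the cached value of any subtree that lies entirely inside the range instead of re-reading its rows.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical tree node; every node caches the reduction of its subtree."""
    min_key: str
    max_key: str
    reduction: int
    children: list["Node"] = field(default_factory=list)
    entries: list[tuple[str, int]] = field(default_factory=list)  # only leaves hold rows

def reduce_fn(values: list[int]) -> int:
    """A _sum-style reduce; it also combines cached partial results."""
    return sum(values)

def chunks(items: list, size: int) -> list[list]:
    return [items[i:i + size] for i in range(0, len(items), size)]

def build(rows: list[tuple[str, int]], fanout: int = 2) -> Node:
    """Build a static tree bottom-up over (key, value) rows sorted by key."""
    if not rows:
        raise ValueError("need at least one row")
    level = [
        Node(c[0][0], c[-1][0], reduce_fn([v for _, v in c]), entries=c)
        for c in chunks(sorted(rows), fanout)
    ]
    while len(level) > 1:
        level = [
            Node(g[0].min_key, g[-1].max_key, reduce_fn([n.reduction for n in g]), children=g)
            for g in chunks(level, fanout)
        ]
    return level[0]

def range_reduce(node: Node, lo: str, hi: str) -> int:
    """Reduce all values with lo <= key <= hi, reusing cached reductions where possible."""
    if node.max_key < lo or node.min_key > hi:
        return reduce_fn([])   # disjoint subtree contributes nothing
    if lo <= node.min_key and node.max_key <= hi:
        return node.reduction  # fully covered: use the cached value, skip the subtree
    if node.entries:           # partially covered leaf: reduce only the matching rows
        return reduce_fn([v for k, v in node.entries if lo <= k <= hi])
    return reduce_fn([range_reduce(child, lo, hi) for child in node.children])

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("f", 6), ("g", 7), ("h", 8)]
root = build(rows)
print(root.reduction)                # 36
print(range_reduce(root, "b", "g"))  # 27
```

For the `"b"`..`"g"` query, the leaves `c-d` and `e-f` are answered from their cached sums, and only the two edge leaves are scanned row by row.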
So, does CouchDB distribute Map-Reduce computations at all? Or is the Map-Reduce property used merely to cache values in the B-Tree nodes?
You are looking for BigCouch. It runs CouchDB as a cluster and distributes MapReduce computations across the nodes.
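This also sidesteps the B-Tree objection from the question, because a cluster does not have to partition documents by view key. A minimal sketch, assuming a BigCouch-style scheme in which each document is placed on a shard by hashing its document ID; the `zlib.crc32` hash, `shard_for` and `NUM_SHARDS = 4` are illustrative assumptions, not BigCouch's actual configuration. Every shard builds its own view over its own documents, and a query asks all shards and merges their partial reductions.

```python
import zlib
from collections import Counter, defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration

def shard_for(doc_id: str) -> int:
    """Assign a document to a shard by hashing its ID (assumed scheme), not its view key."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

docs = {
    "doc-1": {"type": "post"},
    "doc-2": {"type": "comment"},
    "doc-3": {"type": "post"},
    "doc-4": {"type": "comment"},
    "doc-5": {"type": "post"},
}

# Place each document on a shard; rows with the same view key may land on different shards.
shards: dict[int, list[dict]] = defaultdict(list)
for doc_id, doc in docs.items():
    shards[shard_for(doc_id)].append(doc)

# Each shard computes its own view locally: here, a count of documents per "type" key.
per_shard = [Counter(doc["type"] for doc in shard_docs) for shard_docs in shards.values()]

# The coordinator asks every shard and merges (rereduces) the partial counts.
total = sum(per_shard, Counter())
print(dict(sorted(total.items())))  # {'comment': 2, 'post': 3}
```

Because any shard may hold rows for any view key, the coordinator has to query all shards, but each shard's B-Tree stays local and only the small partial reductions travel over the network.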