I am reading the O'Reilly CouchDB book. I am puzzled by the reduce/re-reduce/incremental-MapReduce part on page 64. Too much is left to rhetory in the O'Reilly book with the sentence
If you're interested in pushing the ede of CouchDB's incremental reduce functionality, have a look at Google's paper on Sawzall, ...
If I understand the word "incremental" correctly, it refers to some sort of addition -operation in the B-tree data structure. I cannot yet see why it is somehow special over typical map-reduce, probably not yet understanding it. In CouchDB, it mentions that there is no side-effects with map function - does that hold true with reduce too?
Why is MapReduce in CouchDB is called "incremental"?
The incremental MapReduce framework can be applied to several fields such as web crawls, PageRanking, life science computing, graph processing, text processing, machine learning, data mining, and relational data processing.
In order to retrieve data with CouchDB, we use a process called MapReduce, to create views. A view contains rows of data that is sorted by the row's key (you might use date as a key, for example, to sort your data based on the date). MapReduce is a combination of two concepts Map and Reduce.
This page that you linked explained it.
The view (which is the whole point of map reduce in CouchDB) can be updated by re-indexing only the documents that have changed since the last index update. That's the incremental part.
This can be achieved by requiring the reduce function to be referentially transparent, which means that it always returns the same output for a given input.
The reduce function also must be commutative and associative for the array value input, which means that if you run the reducer on the output of that same reducer, you will receive the same result. In that wiki page it is expressed like:
f(Key, Values) == f(Key, [ f(Key, Values) ] )
Rereduce is where you take the output from several reducer calls and run that through the reducer again. This sometimes is required because CouchDB sends stuff through the reducer in batches, so sometimes not all keys that need to be reduced will be sent through in once shot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With