Why is MapReduce in CouchDB called "incremental"?

Tags: terminology, data-structures, couchdb, mapreduce

I am reading the O'Reilly CouchDB book, and I am puzzled by the reduce/re-reduce/incremental-MapReduce part on page 64. The book leaves too much to rhetoric with the sentence

If you're interested in pushing the edge of CouchDB's incremental reduce functionality, have a look at Google's paper on Sawzall, ...

If I understand the word "incremental" correctly, it refers to some sort of addition operation in the B-tree data structure. I cannot yet see why it is special compared with typical MapReduce; I probably do not understand it yet. In CouchDB, map functions are said to have no side effects. Does that hold true for reduce functions too?

Why is MapReduce in CouchDB called "incremental"?

Helper questions

Explain the quote about incremental MapReduce with Sawzall.
Why are there two terms, reduce and re-reduce, for the same thing, i.e. reduction?

References

A Google paper about Sawzall.
Introduction to CouchDB views in the CouchDB wiki, plus many vague blog posts.
The O'Reilly CouchDB book.

asked Jun 28 '12 00:06

hhh

1 Answer

The page you linked explains it.

The view (which is the whole point of MapReduce in CouchDB) can be updated by re-indexing only the documents that have changed since the last index update. That's the incremental part.
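The incremental update can be illustrated with a minimal Python sketch. It is not CouchDB's implementation: the `ViewIndex` class, its `update` method, and the change format (a dict of document id to the new document, or `None` for a deletion) are hypothetical, and a real view index lives in an on-disk B-tree rather than a Python dict. The point is only that unchanged documents are never passed through the map function again.

```python
from typing import Callable, Dict, List, Optional, Tuple

Doc = dict
Row = Tuple[object, object]  # (key, value) pair emitted by the map function


class ViewIndex:
    """Hypothetical view index that re-runs map only for changed documents."""

    def __init__(self, map_fn: Callable[[Doc], List[Row]]) -> None:
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Row]] = {}  # doc id -> rows it emitted
        self.map_calls = 0  # number of times map_fn has run, for illustration

    def update(self, changes: Dict[str, Optional[Doc]]) -> None:
        # Only documents in the change set are touched; every other
        # document's previously emitted rows are reused as they are.
        for doc_id, doc in changes.items():
            if doc is None:  # deleted document: drop the rows it emitted
                self.rows_by_doc.pop(doc_id, None)
            else:
                self.map_calls += 1
                self.rows_by_doc[doc_id] = self.map_fn(doc)

    def rows(self) -> List[Row]:
        # All rows sorted by key, which is how a view is queried.
        return sorted(
            (row for rows in self.rows_by_doc.values() for row in rows),
            key=lambda r: r[0],
        )


def by_type(doc: Doc) -> List[Row]:
    # Similar in spirit to a JavaScript map function doing emit(doc.type, 1).
    return [(doc["type"], 1)]


index = ViewIndex(by_type)
index.update({"a": {"type": "post"}, "b": {"type": "comment"}, "c": {"type": "post"}})
index.update({"b": {"type": "post"}})  # only "b" is mapped again
print(index.map_calls)  # 4
print(index.rows())     # [('post', 1), ('post', 1), ('post', 1)]
```

The second `update` call runs the map function once, not three times, even though the resulting view reflects all three documents.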

This can be achieved by requiring the reduce function to be referentially transparent, which means that it always returns the same output for a given input.

The reduce function must also be commutative and associative over the array of values, which means that running the reducer on the output of that same reducer gives the same result as running it once. The wiki page expresses this as:

f(Key, Values) == f(Key, [ f(Key, Values) ] )
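The identity can be checked directly. In the Python sketch below (the reducer names are hypothetical, and keys are ignored for brevity), a plain sum satisfies `f(Key, Values) == f(Key, [ f(Key, Values) ])`, while a naive count that returns `len(values)` does not: reducing its own output counts one element instead of adding up the partial counts.

```python
from typing import Callable, List

Reducer = Callable[[object, List[int]], int]


def sum_reduce(key: object, values: List[int]) -> int:
    # Summation is commutative and associative, so reducing partial sums
    # gives the same answer as reducing the raw values.
    return sum(values)


def naive_count_reduce(key: object, values: List[int]) -> int:
    # Counts its inputs, which is wrong once its inputs are earlier counts.
    return len(values)


def satisfies_rereduce_identity(f: Reducer, key: object, values: List[int]) -> bool:
    # f(Key, Values) == f(Key, [ f(Key, Values) ])
    return f(key, values) == f(key, [f(key, values)])


values = [3, 5, 7]
print(satisfies_rereduce_identity(sum_reduce, "k", values))          # True
print(satisfies_rereduce_identity(naive_count_reduce, "k", values))  # False
```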

Rereduce is where you take the outputs of several reducer calls and run them through the reducer again. This is sometimes required because CouchDB sends values through the reducer in batches, so not all of the values that need to be reduced are necessarily sent through in one shot.
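CouchDB passes a `rereduce` flag to the reduce function so it can tell raw mapped values apart from earlier reduction results. The Python sketch below mirrors the shape of a JavaScript `function (keys, values, rereduce)` reduce function: a count reducer returns the number of rows on the first pass and sums the partial counts on a rereduce. The `reduce_in_batches` driver and its batch size are hypothetical; they only show that reducing in batches and then rereducing the batch results matches reducing everything at once.

```python
from typing import List, Optional, Sequence


def count_reduce(keys: Optional[Sequence[object]], values: List[int], rereduce: bool) -> int:
    # Mirrors function (keys, values, rereduce) { ... } in a CouchDB view;
    # on a rereduce there are no keys, so None is passed here.
    if rereduce:
        return sum(values)  # values are partial counts from earlier calls
    return len(values)      # values are raw rows emitted by the map function


def reduce_in_batches(keys: List[object], values: List[int], batch_size: int) -> int:
    # Hypothetical driver: reduce fixed-size batches, then rereduce the results.
    partials = [
        count_reduce(keys[i:i + batch_size], values[i:i + batch_size], False)
        for i in range(0, len(values), batch_size)
    ]
    return count_reduce(None, partials, True)


keys = ["post"] * 10
values = [1] * 10
print(count_reduce(keys, values, False))   # 10, all rows in one shot
print(reduce_in_batches(keys, values, 3))  # 10, batches of 3, 3, 3, 1 then a rereduce
```

Because the partial results can be combined this way, they can also be stored and reused, so a change only forces the affected partial reductions to be recomputed.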

answered Sep 29 '22 17:09

nickgroenke
