How to take the average of big data in MongoDB vs CouchDB?

I'm looking at this chart...

http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid

...which says:

Query Method

CouchDB - Map/reduce of javascript functions to lazily build an index per query

MongoDB - Dynamic; object-based query language

What exactly does this mean? For example, if I want to take an average of 1,000,000,000 values, does CouchDB automatically do it in a MapReduce way?

Can someone walk me through how to take an average of 1,000,000,000 values with both systems... this would be a very illuminating example.

Thanks.

asked Jul 13 '11 by Geoff


1 Answer

CouchDB's views are a strange and fascinating beast.

CouchDB does incremental map/reduce, which is to say that once you specify your "view", it'll work sort of like a materialized view in a relational database. It won't matter whether you're averaging 3 or 3 billion documents: the result is already there.
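To make that concrete, here is a minimal sketch of a CouchDB view that could back an average. It assumes (hypothetically) that each document stores its number in a field called value; the reduce function returns a {sum, count} pair that CouchDB can re-reduce from the partial results it keeps in the B-tree, and the client computes the average as sum / count from the single reduced row. CouchDB's built-in _stats reduce does essentially the same bookkeeping.

    // map function: emit every numeric "value" under a single key
    function (doc) {
      if (typeof doc.value === "number") {
        emit(null, doc.value);
      }
    }

    // reduce function: maintain a {sum, count} pair that can be
    // re-reduced from partial results stored in the B-tree nodes
    function (keys, values, rereduce) {
      if (rereduce) {
        var acc = { sum: 0, count: 0 };
        values.forEach(function (v) { acc.sum += v.sum; acc.count += v.count; });
        return acc;
      }
      return {
        sum: values.reduce(function (a, b) { return a + b; }, 0),
        count: values.length
      };
    }

Querying the view with reduce enabled (the default) returns that one {sum, count} row, no matter how many documents were folded into it.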

But there is a threefold gotcha in there:

1) Querying is fast once the view has been created and updated. View creation can be slow if you have lots of small documents (if possible, go with fatter documents). Once the view is built, the intermediate reduction steps are stored inside the B-tree nodes, so you won't have to recompute them.

2) Views are updated lazily when you query them. For predictable performance, you'd better set up some sort of job to update them regularly (see the curl sketch after this list, and "How do you Schedule Index Updates in CouchDB").

3) You need to have a pretty good idea of how you'll query your data with composite keys, ranges, and grouping. CouchDB sucks at ad-hoc querying. http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
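As a sketch of point 2: simply querying the view over HTTP triggers an index update, so a cron job that hits it regularly keeps query latency predictable. The database name (mydb), design document (stats), and view name (average) below are all hypothetical, and stale=ok is the parameter the CouchDB of that era offered for reading whatever is already indexed instead of waiting for the update.

    # Querying triggers an index update and returns the reduced {sum, count}
    curl 'http://localhost:5984/mydb/_design/stats/_view/average?reduce=true'

    # Read whatever is already indexed, without waiting for an update
    curl 'http://localhost:5984/mydb/_design/stats/_view/average?reduce=true&stale=ok'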

I'm sure someone will soon post the details of how to average 1,000,000,000 items in both databases, but you have to understand that CouchDB makes you do more upfront work in order to benefit from its incremental approach. It's really something quite unique, but not really intended for scenarios where you're doing averages, or anything else, on ad-hoc queried data.

In Mongo, you can use either map/reduce (not incremental: it will matter whether you are averaging 3 or 3 billion documents, but Mongo is considered blazingly fast thanks to its memory-mapped I/O approach) or its aggregation features (a shell sketch of both follows): http://www.mongodb.org/display/DOCS/Aggregation
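For comparison, here is a minimal MongoDB shell sketch under the same assumption of a hypothetical collection named measurements with a numeric value field. The first form uses the aggregation framework's $avg (added in MongoDB 2.2, so it postdates this answer); the second is the non-incremental mapReduce the answer refers to, which rescans the collection on every run.

    // Aggregation framework (MongoDB 2.2+): one pass over the data, nothing precomputed
    db.measurements.aggregate([
      { $group: { _id: null, avg: { $avg: "$value" }, count: { $sum: 1 } } }
    ]);

    // map/reduce: recomputed from scratch each time it is invoked
    db.measurements.mapReduce(
      function () { emit(null, { sum: this.value, count: 1 }); },
      function (key, values) {
        var acc = { sum: 0, count: 0 };
        values.forEach(function (v) { acc.sum += v.sum; acc.count += v.count; });
        return acc;
      },
      {
        out: { inline: 1 },
        finalize: function (key, v) { v.avg = v.sum / v.count; return v; }
      }
    );

Unlike the CouchDB view, neither form leaves a persistent, incrementally maintained result behind; each call pays for the full scan again.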

answered Oct 21 '22 by Daniel Lemos Itaborai