Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Couchdb Mango Performance vs Map Reduce Views

I've just noticed that in the release notes of Couchdb 2.0, it is mentionned that Mango queries are recommended for new applications. It is also mentionned that apparently Mango indexes are from 2x to x10 faster than javascript queries which really surprised me, as such I have a number of questions :

  • Are Map/Reduce views being phased out ? I'm expecting the answer to be no since it seems to me that Mango does not cover all the use cases of Map/Reduce (the easiest example being Reduce itself), and the flexibility of this querying style seems to be more limited too. But m prefer to ask because of the recommendation :

We recommend all new apps start using Mango as a default.

  • We know that Map/Reduce views rely on B-trees, but I can't find any insight, in the doc or the mailing list regarding the magic behind Mango. Mango essentially is white magic for me at the minute. Yet I can tell that having an in-depth knowledge of how the javascript views are indexed behind the scenes was massively helpful to avoid pitfalls, naive implementations as well as to optimize performances. Does anyone have any insight on how Mango works ? Are the indexes B-trees too ? When are the indexes updated since there is no longer design documents ? Where do the performance gains come from ? (these gains are counter-intuitive to me, since in my understanding, the performance of javascript queries came from the precomputed nature of Map functions)

What I'm essentially after is on the one hand some insight regarding Mango and on the other hand, an overview of how Mango and Map/Reduce are supposed to live together in the 2.x era.

like image 326
tobiak777 Avatar asked Nov 22 '16 23:11

tobiak777


People also ask

What is views in CouchDB?

Views are the primary tool used for querying and reporting on CouchDB documents. There you'll learn how they work and how to use them to build effective applications with CouchDB. 3.2.1. Introduction to Views.

Where does the permanent views are stored in CouchDB?

A permanent view is stored as a special kind of document between the other regular documents. These special kind of documents are called 'design documents'. A permanent view can be executed by performing a GET operation with the name of the view. A temporary view, as its name implies, is not stored by CouchDB.

What is Mango in CouchDB?

Mango is a MongoDB inspired query language interface for Apache CouchDB. Mango provides a single HTTP API endpoint that accepts JSON bodies via HTTP POST. These bodies provide a set of instructions that will be handled with the results being returned to the client in the same order as they were specified.

What is emit CouchDB?

The emit(key, value) function creates an entry in our view result . One more thing: the emit() function can be called multiple times in the map function to create multiple entries in the view results from a single document, but we are not doing that yet.


2 Answers

I recently tried to switch my app over to using Mango queries, with the result of scrapping it completely and switching back to map/reduce. Here are a few of my reasons:

  1. Mango is buggy when dealing with queries that do not exactly specify the index to use. This one drove me batty for a while last weekend. If you don't specify the index, sometimes an alternate index will be selected and return no (or incorrect) results.
  2. Mango performance is not 'magic'. Many types of queries will end up doing in memory searches. Couch will select the best fit index then march through all those records in memory to fit the corner cases. Cloudant hand waves over some of these issues by saying to use 'text' based searches, which aren't available in Couchdb.
  3. As you pointed out, Mango searches simply cannot handle some types of query constructions well. I wouldn't consider my app to be overly complicated yet I ran into several situations where I could not construct a suitable Mango query for the task at hand. A major one here is searching arrays to find tags (for example, searching to see what users are members of a group). Mango cannot index array elements so resorts to doing full scans in memory.
  4. Views have some very powerful features for transformation of search results in the form of Lists. That doesn't exist in Mango.

Your mileage may vary, but just wanted to leave a warning that this is still quite new features.

like image 181
Jeff Barnes Avatar answered Sep 23 '22 04:09

Jeff Barnes


Answer from a core developer :

Some good questions. I don't think Mango will ever replace Map/Reduce completely. It is an alternative querying tool. What is great about the Mango query syntax is that it is a lot easier to understand and get started. And we can use it in a lot of places outside of just querying for documents. It can be used for replication filtering and the changes feed. We hope to soon have support for validation doc updates as well.

Underneath Mango is using erlang map/reduce. Which means it is creating a B-tree index just like map/reduce. What makes it faster is that it is using erlang/native functions to create the B-Tree instead of javascript. I wrote a blog post a long time ago about the internals of PouchDB-find [1] which is the mango syntax for PouchDB. It might help you understand a little more how the internals work. The key thing to understand is that there is a Map query part which uses the B-Tree and an in-memory filter. Ideally the less memory filtering you do the faster your query will be.

I would say that Mango is very much a work in process but the basic ground work is done. There are definitely things we can improve on. I've seen it used quite a bit when developers start a new project because its quick and simple to do basic querying, like find by email address or find all users with the name "John Rambo".

Hope that helps.

[1] http://www.redcometlabs.com/blog/2015/12/1/a-look-under-the-covers-of-pouchdb-find

like image 39
tobiak777 Avatar answered Sep 25 '22 04:09

tobiak777