I've found this discussion: MongoDB: Terrible MapReduce Performance. Basically it says try to avoid Mongo's MR queries as it single-threaded and not supposed to be for real-time at all. 2 years has passed, and I wonder what has been changed since the time. Now we have MongoDb 2.2. I heard MRs are now multi-threaded. Please share your ideas over MR usage for real-time requests like fetching data for web application frequent http requests. Is it able to effectively use indexes?
In MongoDB, map-reduce is a data processing programming model that helps to perform operations on large data sets and produce aggregated results. MongoDB provides the mapReduce() function to perform the map-reduce operations. This function has two main functions, i.e., map function and reduce function.
As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB uses mapReduce command for map-reduce operations. MapReduce is generally used for processing large data sets.
MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks, and processing them in parallel on Hadoop commodity servers. In the end, it aggregates all the data from multiple servers to return a consolidated output back to the application.
MapReduce of MongoDB is based on JavaScript using the SpiderMonkey engine and the queries are executed in a single thread. On the other hand, Aggregation Pipeline queries run on compiled C++ code which makes them faster as it is not interpreted like JavaScript.
Here is the current state of functionality for Map/Reduce in MongoDB
1) Most of the performance limitations for Map/Reduce still remain in MongoDB version 2.2. The Map/Reduce engine still requires that every record get converted from BSON to JSON, the actual calculations are performed using the embedded JavaScript engine (which is slow), and there still is a single global JavaScript lock, which only allows a single JavaScript thread to run at a single time.
There have been some incremental improvements to Map/Reduce for sharded clusters. Most notably, the final Reduce operation is now distributed across multiple shards, and the output is also sharded in parallel.
I would not recommend Map/Reduce for real-time aggregation in MongoDB version 2.2
2) Starting with MongoDB 2.2, there is now a new Aggregation Framework. This is a new implementation of aggregation operations, written in C++, and tightly integrated into the MongoDB framework.
Most Map/Reduce jobs can be rewritten to use the Aggregation Framework. They usually run faster (20x speed improvement vs. Map/Reduce is common in version 2.2), they make full use of the existing query engine, and you can run multiple Aggregation commands in parallel.
If you have real-time aggregation requirements, the first place to start is with the Aggregation Framework. For more information about the aggregation framework, take a look at these links:
3) There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.
The Map/Reduce engine is still considerably slower than the aggregation framework, for two main reasons:
The JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code
The JavaScript engine still requires that every document being examined get converted from BSON to JSON; if you're saving the output in a collection, the result set must then be converted from JSON back to BSON
There are no significant changes in Map/Reduce between 2.4 and 2.6.
I still do not recommend using the Map/Reduce for real-time aggregation in MongoDB version 2.4 or 2.6.
4) If you really need Map/Reduce, you can also look at the Hadoop Adaptor. There's more information here:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With