Map-Reduce performance in MongoDB 2.2, 2.4, and 2.6

Tags: mongodb, mapreduce

I found this discussion: MongoDB: Terrible MapReduce Performance. It basically says to avoid Mongo's MR queries, since MR is single-threaded and not meant for real-time use at all. Two years have passed, and I wonder what has changed since then. We now have MongoDB 2.2, and I heard that MR is now multi-threaded. Please share your thoughts on using MR for real-time requests, such as fetching data for a web application's frequent HTTP requests. Can it use indexes effectively?

asked Oct 01 '12 18:10

YMC

1 Answer

Here is the current state of Map/Reduce functionality in MongoDB:

1) Most of the performance limitations of Map/Reduce remain in MongoDB version 2.2. The Map/Reduce engine still requires every record to be converted from BSON to JSON, the actual calculations are performed by the embedded JavaScript engine (which is slow), and there is still a single global JavaScript lock, which allows only one JavaScript thread to run at a time.
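As a rough illustration of what a single global lock implies, here is a toy model in Python. It is not MongoDB code: `ToyEngine`, `run_job`, and `run_concurrently` are hypothetical names made up for this sketch, and the sleep merely stands in for running JavaScript. The point is only that several jobs can be started in parallel and still execute one at a time.

```python
import threading
import time


class ToyEngine:
    """Toy stand-in for a script engine guarded by one global lock (not MongoDB code)."""

    def __init__(self):
        self._global_lock = threading.Lock()   # hypothetical "global JavaScript lock"
        self._counter_lock = threading.Lock()  # protects the bookkeeping below
        self.running = 0
        self.max_running = 0

    def run_job(self, work_seconds=0.01):
        # Every job must hold the global lock for its whole duration.
        with self._global_lock:
            with self._counter_lock:
                self.running += 1
                self.max_running = max(self.max_running, self.running)
            time.sleep(work_seconds)  # stands in for executing map/reduce functions
            with self._counter_lock:
                self.running -= 1


def run_concurrently(engine, n_jobs=4):
    """Start n_jobs threads and return the highest number that ran at once."""
    threads = [threading.Thread(target=engine.run_job) for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return engine.max_running
```

In this model, adding more threads does not increase throughput, because each job waits for the previous one to release the lock.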

There have been some incremental improvements to Map/Reduce for sharded clusters. Most notably, the final Reduce operation is now distributed across multiple shards, and the output is also sharded in parallel.
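To make the sharded case more concrete, here is a minimal in-memory sketch in Python of per-shard reduction followed by a final reduce over the partial results. It does not model how MongoDB actually distributes the final step or writes sharded output. The field names `customer` and `amount` and all function names are assumptions for illustration. What it does show is why a reduce function has to accept its own output as input: the final step reduces values that were already reduced on each shard.

```python
from collections import defaultdict


def map_doc(doc):
    # "Emit" one (key, value) pair per document; the field names are hypothetical.
    yield doc["customer"], doc["amount"]


def reduce_values(key, values):
    # Must give the same answer when fed partial results (re-reduce safe).
    return sum(values)


def map_reduce(docs):
    """Plain map/reduce over one list of documents."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}


def sharded_map_reduce(shards):
    """Run map/reduce per shard, then reduce the partial results once more."""
    partials = [map_reduce(shard_docs) for shard_docs in shards]
    merged = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            merged[key].append(value)
    return {key: reduce_values(key, values) for key, values in merged.items()}
```

A reduce function that is not safe to re-run on its own output (for example, one that counts its inputs rather than summing emitted counts) would give different results in the sharded and unsharded paths.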

I would not recommend Map/Reduce for real-time aggregation in MongoDB version 2.2.

2) Starting with MongoDB 2.2, there is now a new Aggregation Framework. This is a new implementation of aggregation operations, written in C++, and tightly integrated into the MongoDB framework.

Most Map/Reduce jobs can be rewritten to use the Aggregation Framework. Aggregation jobs usually run faster (a 20x speed improvement over Map/Reduce is common in version 2.2), they make full use of the existing query engine, and you can run multiple Aggregation commands in parallel.
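As a sketch of what such a rewrite can look like, the Python snippet below expresses a "total per customer" job twice: once in map/reduce style and once as an aggregation pipeline built from the `$match`, `$group`, and `$sum` operators. The collection layout (`cust_id`, `amount`, `status`) is an assumption for illustration. With a driver such as PyMongo, the pipeline would typically be passed to `collection.aggregate(pipeline)`. So that the example runs without a server, `run_toy_pipeline` is a hypothetical, deliberately tiny evaluator that handles only the two stages used here. It is not how MongoDB executes pipelines.

```python
from collections import defaultdict

# Aggregation pipeline: keep status "A" documents, then sum amount per cust_id.
PIPELINE = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
]


def map_reduce_totals(docs):
    """The same job in map/reduce style: emit (cust_id, amount), then sum per key."""
    emitted = defaultdict(list)
    for doc in docs:
        if doc["status"] == "A":  # plays the role of a query filter on the input
            emitted[doc["cust_id"]].append(doc["amount"])
    return {key: sum(values) for key, values in emitted.items()}


def run_toy_pipeline(docs, pipeline):
    """Toy evaluator: equality-only $match, and $group with $sum over a field."""
    for stage in pipeline:
        op, spec = next(iter(stage.items()))
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    acc_op, path = next(iter(acc.items()))
                    if acc_op != "$sum":
                        raise NotImplementedError(acc_op)
                    out[name] = out.get(name, 0) + d[path.lstrip("$")]
            docs = list(groups.values())
        else:
            raise NotImplementedError(op)
    return docs
```

The pipeline version is pure data: there are no user-supplied functions to execute per document, which is what lets the server evaluate it natively.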

If you have real-time aggregation requirements, the first place to start is with the Aggregation Framework. For more information about the aggregation framework, take a look at these links:

http://www.10gen.com/presentations/mongonyc-2012/new-aggregation-framework
http://docs.mongodb.org/manual/reference/aggregation/

3) There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.

The Map/Reduce engine is still considerably slower than the aggregation framework, for two main reasons:

The JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code.
The JavaScript engine still requires that every document being examined get converted from BSON to JSON; if you're saving the output in a collection, the result set must then be converted from JSON back to BSON.

There are no significant changes in Map/Reduce between 2.4 and 2.6.

I still do not recommend using Map/Reduce for real-time aggregation in MongoDB version 2.4 or 2.6.

4) If you really need Map/Reduce, you can also look at the Hadoop Adaptor. There's more information here:

http://www.10gen.com/presentations/webinar/mongodb-hadoop-taming-elephant-room
http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html
http://www.mongodb.org/display/DOCS/Hadoop+Quick+Start

answered Sep 19 '22 06:09

William Z
