MongoDB aggregation comparison: group(), $group and MapReduce

Tags: mongodb, mongodb-query, aggregation-framework, mapreduce

I am somewhat confused about when to use group(), aggregate with $group, or MapReduce. I read the documentation at http://www.mongodb.org/display/DOCS/Aggregation for group() and http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group for $group. Is sharding the only situation where group() won't work? Also, I get the feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from the aggregation framework. How does $group compare with MapReduce? I read somewhere that it doesn't generate any temporary collection whereas MapReduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?

EDIT:
Also, it would be great if you could point out anything specifically new in these commands since the 2.2 release came out.

asked Sep 09 '12 07:09

Aafreen Sheikh

1 Answer

It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.

The group() command, Aggregation Framework, and MapReduce are collectively aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as of MongoDB 2.2.0.

Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.

`group()` Command

Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
Returns result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
- Will not group into a result set with more than 20,000 keys.
- Results must fit within the limitations of a BSON document (currently 16MB).
- Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
- Does not work with sharded collections.
See also: group() command examples.
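To make the key / initial / reduce shape of group() concrete, here is a minimal in-memory sketch in plain JavaScript. It only imitates the semantics and is not how the server implements the command. The `orders` data, its field names, and the `groupSketch` helper are all hypothetical, and the filter is passed as a function here whereas the real command takes a query document.

```javascript
// Hypothetical sample data; the "orders" name and its fields are assumptions.
const orders = [
  { cust_id: "A", status: "paid", amount: 50 },
  { cust_id: "A", status: "paid", amount: 25 },
  { cust_id: "B", status: "paid", amount: 40 },
  { cust_id: "B", status: "open", amount: 10 },
];

// In the mongo shell the equivalent call would look roughly like:
//   db.orders.group({
//     key: { cust_id: 1 },
//     cond: { status: "paid" },
//     initial: { total: 0 },
//     reduce: function (curr, result) { result.total += curr.amount; }
//   })

// In-memory stand-in (hypothetical helper): keeps one accumulator per
// distinct key and lets reduce(currentDoc, accumulator) update it in place.
function groupSketch(docs, { key, cond, initial, reduce }) {
  const buckets = new Map();
  for (const doc of docs) {
    if (cond && !cond(doc)) continue; // skip documents that fail the filter
    const keyValues = {};
    for (const field of Object.keys(key)) keyValues[field] = doc[field];
    const id = JSON.stringify(keyValues);
    // First document for this key: start from a copy of the initial value.
    if (!buckets.has(id)) buckets.set(id, { ...keyValues, ...initial });
    reduce(doc, buckets.get(id));
  }
  return Array.from(buckets.values()); // inline result: an array of groups
}

const groupResult = groupSketch(orders, {
  key: { cust_id: 1 },
  cond: (doc) => doc.status === "paid",
  initial: { total: 0 },
  reduce: (curr, result) => { result.total += curr.amount; },
});
console.log(groupResult);
```

Because the whole result is a single in-memory array, it is easy to see where the key-count and document-size limits above come from.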

MapReduce

Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce).
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
- A single emit can only hold half of MongoDB's maximum BSON document size (that is, 8MB of the current 16MB maximum).
- There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a time. However, most steps of a MapReduce job are very short, so the lock can be yielded frequently.
- MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
- MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.
See also: Map/Reduce examples.
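The map / emit / reduce flow can be sketched in plain JavaScript over the same kind of data. This is an in-memory imitation under stated assumptions, not the server's implementation: the `orders` data and the `mapReduceSketch` helper are hypothetical, and `emit` is passed to the map function as an argument here, whereas in MongoDB it is a global available inside map.

```javascript
// Hypothetical sample data; the "orders" name and its fields are assumptions.
const orders = [
  { cust_id: "A", status: "paid", amount: 50 },
  { cust_id: "A", status: "paid", amount: 25 },
  { cust_id: "B", status: "paid", amount: 40 },
  { cust_id: "B", status: "open", amount: 10 },
];

// In the mongo shell the equivalent call would look roughly like:
//   db.orders.mapReduce(
//     function () { if (this.status === "paid") emit(this.cust_id, this.amount); },
//     function (key, values) { return Array.sum(values); },
//     { out: { inline: 1 } }
//   )

// In-memory stand-in (hypothetical helper): collects emitted values per key,
// then reduces each key's values to a single value.
function mapReduceSketch(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!emitted.has(id)) emitted.set(id, { key, values: [] });
    emitted.get(id).values.push(value);
  };
  // As in MongoDB, map runs with `this` bound to the current document.
  for (const doc of docs) map.call(doc, emit);
  return Array.from(emitted.values()).map(({ key, values }) => ({
    _id: key,
    // reduce is not called for a key that has only a single emitted value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const mapReduceResult = mapReduceSketch(
  orders,
  function (emit) { if (this.status === "paid") emit(this.cust_id, this.amount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(mapReduceResult);
```

Note that the server may call reduce more than once for the same key, so the value it returns has to be usable as one of its own inputs. A plain sum satisfies that.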

Aggregation Framework

New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
- Results are returned inline, so are limited to the maximum document size supported by the server (16MB).
- Doesn't support as many output options as MapReduce.
- Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions).
- Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.
See also: Aggregation Framework examples.
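The "pipeline" idea can be sketched in plain JavaScript as a list of stages, each taking an array of documents and returning a new array. This is an in-memory imitation, not the Aggregation Framework itself: the `orders` data and the helper names (`match`, `groupSum`, `sortDesc`, `runPipeline`) are hypothetical, and real stages are declared as documents rather than functions.

```javascript
// Hypothetical sample data; the "orders" name and its fields are assumptions.
const orders = [
  { cust_id: "A", status: "paid", amount: 50 },
  { cust_id: "A", status: "paid", amount: 25 },
  { cust_id: "B", status: "paid", amount: 40 },
  { cust_id: "B", status: "open", amount: 10 },
];

// In the mongo shell the equivalent call would look roughly like:
//   db.orders.aggregate(
//     { $match: { status: "paid" } },
//     { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
//     { $sort: { total: -1 } }
//   )

// Stand-in for $match: keeps only documents that satisfy the predicate.
const match = (predicate) => (docs) => docs.filter(predicate);

// Stand-in for $group with a $sum accumulator: one output document per key.
const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return Array.from(totals, ([_id, total]) => ({ _id, total }));
};

// Stand-in for $sort in descending order on a numeric field.
const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const pipelineResult = runPipeline(orders, [
  match((doc) => doc.status === "paid"),
  groupSum("cust_id", "amount"),
  sortDesc("total"),
]);
console.log(pipelineResult);
```

The grouping stage emits fewer documents than it receives, which is the point made above about stages not needing to produce one output per input. No custom reduce function is involved, only a fixed set of stage types.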

Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?

You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations:

group() versus Aggregation Framework
MapReduce versus Aggregation Framework

answered Sep 21 '22 17:09

Stennie

