Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is Mongodb Aggregation framework faster than map/reduce?

Is the aggregation framework introduced in mongodb 2.2, has any special performance improvements over map/reduce?

If yes, why and how and how much?

(Already I have done a test for myself, and the performance was nearly same)

like image 497
Taha Jahangir Avatar asked Dec 17 '12 04:12

Taha Jahangir


People also ask

Is aggregation fast in MongoDB?

On large collections of millions of documents, MongoDB's aggregation was shown to be much worse than Elasticsearch. Performance worsens with collection size when MongoDB starts using the disk due to limited system RAM. The $lookup stage used without indexes can be very slow.

What is the difference between MapReduce and aggregation in MongoDB?

MapReduce of MongoDB is based on JavaScript using the SpiderMonkey engine and the queries are executed in a single thread. On the other hand, Aggregation Pipeline queries run on compiled C++ code which makes them faster as it is not interpreted like JavaScript.

Can MapReduce be used for aggregation?

Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group , $merge , and others. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4.

Which aggregation method is preferred for use by MongoDB?

The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. The aggregation pipeline can operate on a sharded collection. The aggregation pipeline can use indexes to improve its performance during some of its stages.


2 Answers

Every test I have personally run (including using your own data) shows aggregation framework being a multiple faster than map reduce, and usually being an order of magnitude faster.

Just taking 1/10th of the data you posted (but rather than clearing OS cache, warming the cache first - because I want to measure performance of the aggregation, and not how long it takes to page in the data) I got this:

MapReduce: 1,058ms
Aggregation Framework: 133ms

Removing the $match from aggregation framework and {query:} from mapReduce (because both would just use an index and that's not what we want to measure) and grouping the entire dataset by key2 I got:

MapReduce: 18,803ms
Aggregation Framework: 1,535ms

Those are very much in line with my previous experiments.

like image 138
Asya Kamsky Avatar answered Sep 30 '22 21:09

Asya Kamsky


My benchmark:

== Data Generation ==

Generate 4million rows (with python) easy with approximately 350 bytes. Each document has these keys:

  • key1, key2 (two random columns to test indexing, one with cardinality of 2000, and one with cardinality of 20)
  • longdata: a long string to increase size of each document
  • value: a simple number (const 10) to test aggregation

 db = Connection('127.0.0.1').test # mongo connection random.seed(1) for _ in range(2):     key1s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(10)]     key2s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(1000)]     baddata = 'some long date ' + '*' * 300     for i in range(2000):         data_list = [{                 'key1': random.choice(key1s),                 'key2': random.choice(key2s),                 'baddata': baddata,                 'value': 10,                 } for _ in range(1000)]         for data in data_list:             db.testtable.save(data) 
Total data size was about 6GB in mongo. (and 2GB in postgres)

== Tests ==

I did some test, but one is enough to comparing results:

NOTE: Server is restarted, and OS cache is cleaned after each query, to ignore effect of caching.

QUERY: aggregate all rows with key1=somevalue (about 200K rows) and sum value for each key2

  • map/reduce 10.6 sec
  • aggreate 9.7 sec
  • group 10.3 sec

queries:

map/reduce:

db.testtable.mapReduce(function(){emit(this.key2, this.value);}, function(key, values){var i =0; values.forEach(function(v){i+=v;}); return i; } , {out:{inline: 1}, query: {key1: '663969462d2ec0a5fc34'} })

aggregate:

db.testtable.aggregate({ $match: {key1: '663969462d2ec0a5fc34'}}, {$group: {_id: '$key2', pop: {$sum: '$value'}} })

group:

db.testtable.group({key: {key2:1}, cond: {key1: '663969462d2ec0a5fc34'}, reduce: function(obj,prev) { prev.csum += obj.value; }, initial: { csum: 0 } })

like image 23
Taha Jahangir Avatar answered Sep 30 '22 21:09

Taha Jahangir