Does the aggregation framework introduced in MongoDB 2.2 have any special performance improvements over map/reduce?
If yes, why, how, and by how much?
(I have already run a test myself, and the performance was nearly the same.)
On large collections of millions of documents, MongoDB's aggregation has been shown to perform much worse than Elasticsearch. Performance worsens with collection size once MongoDB starts hitting the disk because of limited system RAM. The $lookup stage, when used without indexes, can be very slow.
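For example (a sketch only, with hypothetical orders and customers collections standing in for real data), indexing the foreign collection's join field is usually the first fix for a slow $lookup:

// Index the field $lookup matches on in the foreign collection;
// without it, every input document causes a scan of "customers".
db.customers.createIndex({ customerId: 1 })

db.orders.aggregate([
    { $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "customerId",
        as: "customer"
    } }
])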
MongoDB's map-reduce is based on JavaScript (executed by the SpiderMonkey engine), and the queries run in a single thread. Aggregation Pipeline queries, on the other hand, run as compiled C++ code, which makes them faster because they are not interpreted like JavaScript.
Map-reduce operations can be rewritten using aggregation pipeline operators such as $group, $merge, and others. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4.
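As a rough sketch (against the testtable collection from the benchmark below; the testtable_totals output name is made up), a map-reduce that emits (key2, value) and sums per key can be written with $group, optionally persisting the result with $merge, while $accumulator covers cases that still need custom JavaScript:

// $group replaces the map/emit + reduce pair; $merge writes the result out.
db.testtable.aggregate([
    { $group: { _id: "$key2", total: { $sum: "$value" } } },
    { $merge: { into: "testtable_totals" } }   // drop this stage to get the results back inline
])

// For custom logic, $accumulator (4.4+) runs JavaScript inside $group:
db.testtable.aggregate([
    { $group: {
        _id: "$key2",
        total: { $accumulator: {
            init: function() { return 0; },
            accumulate: function(state, v) { return state + v; },
            accumulateArgs: ["$value"],
            merge: function(a, b) { return a + b; },
            lang: "js"
        } }
    } }
])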
The aggregation pipeline provides efficient data aggregation using native operations and is the preferred method for data aggregation in MongoDB. It can operate on a sharded collection and can use indexes to improve its performance during some of its stages.
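For instance (a sketch against the same testtable collection), putting $match first and indexing the filtered field lets the pipeline use the index, which explain() will confirm:

// With an index on key1, the leading $match stage can use it
// instead of scanning the whole collection.
db.testtable.createIndex({ key1: 1 })

db.testtable.explain("executionStats").aggregate([
    { $match: { key1: "663969462d2ec0a5fc34" } },
    { $group: { _id: "$key2", pop: { $sum: "$value" } } }
])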
Every test I have personally run (including using your own data) shows the aggregation framework being a multiple faster than map-reduce, and usually an order of magnitude faster.
Just taking 1/10th of the data you posted (but warming the cache first rather than clearing the OS cache, because I want to measure the performance of the aggregation, not how long it takes to page in the data) I got this:
MapReduce: 1,058ms
Aggregation Framework: 133ms
Removing the $match from the aggregation framework and the {query:} from mapReduce (because both would just use an index, and that's not what we want to measure) and grouping the entire dataset by key2, I got the following (the modified queries are sketched after the timings):
MapReduce: 18,803ms
Aggregation Framework: 1,535ms
Those are very much in line with my previous experiments.
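For reference, the modified queries are just the ones from the benchmark below with the filter removed:

// mapReduce over the whole collection, no {query:} filter:
db.testtable.mapReduce(
    function() { emit(this.key2, this.value); },
    function(key, values) { var i = 0; values.forEach(function(v) { i += v; }); return i; },
    { out: { inline: 1 } }
)

// Aggregation grouping the entire dataset by key2, no $match:
db.testtable.aggregate([
    { $group: { _id: '$key2', pop: { $sum: '$value' } } }
])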
My benchmark:
== Data Generation ==
Generate 4 million rows (with Python), each with approximately 350 bytes. Each document has these keys:

key1, key2: random hex strings drawn from small pools (about 20 and 2,000 distinct values overall)
baddata: a long padding string to make each document bigger
value: a constant 10, used for the sums

Total data size was about 6GB in mongo (and 2GB in postgres).

from binascii import hexlify
import os
import random

from pymongo import Connection

db = Connection('127.0.0.1').test  # mongo connection
random.seed(1)
for _ in range(2):
    key1s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(10)]
    key2s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(1000)]
    baddata = 'some long date ' + '*' * 300
    for i in range(2000):
        data_list = [{
            'key1': random.choice(key1s),
            'key2': random.choice(key2s),
            'baddata': baddata,
            'value': 10,
        } for _ in range(1000)]
        for data in data_list:
            db.testtable.save(data)
== Tests ==
I ran several tests, but one is enough for comparing results:
NOTE: The server is restarted and the OS cache is cleared after each query, to ignore the effect of caching.
QUERY: aggregate all rows with key1=somevalue (about 200K rows) and sum value for each key2.

The queries:
map/reduce:

db.testtable.mapReduce(
    function() { emit(this.key2, this.value); },             // map: emit (key2, value) pairs
    function(key, values) {                                   // reduce: sum the values for each key2
        var i = 0;
        values.forEach(function(v) { i += v; });
        return i;
    },
    { out: { inline: 1 }, query: { key1: '663969462d2ec0a5fc34' } }
)
aggregate:

db.testtable.aggregate([
    { $match: { key1: '663969462d2ec0a5fc34' } },
    { $group: { _id: '$key2', pop: { $sum: '$value' } } }
])
group:

db.testtable.group({
    key: { key2: 1 },
    cond: { key1: '663969462d2ec0a5fc34' },
    reduce: function(obj, prev) { prev.csum += obj.value; },
    initial: { csum: 0 }
})