
MySQL Vs MongoDB aggregation performance

I'm currently testing some databases for my application. The main functionality is data aggregation (similar to this question: Data aggregation mongodb vs mysql).

I'm facing the same problem. I've created a sample test data set. There are no joins on the MySQL side; it's a single InnoDB table. The data set has 1.6 million rows, and I'm doing a sum and a count over the full table, without any filter, so I can compare the performance of each aggregation engine. All data fits in memory in both cases, and in both cases there is no write load.

With MySQL (5.5.34-0ubuntu0.12.04.1) I consistently get results between 2.03 and 2.10 seconds. With MongoDB (2.4.8, Linux 64-bit) I consistently get results between 4.1 and 4.3 seconds.

If I filter on indexed fields, MySQL's result time drops to between 1.18 and 1.20 seconds (the number of rows processed drops to exactly half the data set). If I do the same filtering on indexed fields in MongoDB, the result time drops only to around 3.7 seconds (again processing half the data set, which I confirmed with an explain on the match criteria).

My conclusion is that either 1) my documents are extremely badly designed (which could well be true), or 2) the MongoDB aggregation framework really does not fit my needs.

The questions are: what can I do (in terms of specific MongoDB configuration, document modeling, etc.) to make Mongo's results faster? Is this a case that MongoDB is simply not suited to?

My table and document schemas:

CREATE TABLE `events_normal` (
  `origem` varchar(35) DEFAULT NULL,
  `destino` varchar(35) DEFAULT NULL,
  `qtd` int(11) DEFAULT NULL,
  KEY `idx_orides` (`origem`,`destino`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

{
    "_id" : ObjectId("52adc3b444ae460f2b84c272"),
    "data" : {
        "origem" : "GRU",
        "destino" : "CGH",
        "qtdResultados" : 10
    }
}

The indexed and filtered fields, where mentioned, are "origem" and "destino". The queries:

select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal group by origem, destino;
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal where origem="GRU" group by origem, destino;

db.events.aggregate(
    { $group: {
        _id: { origem: "$data.origem", destino: "$data.destino" },
        total: { $sum: "$data.qtdResultados" },
        qtd: { $sum: 1 }
    } }
)

db.events.aggregate(
    { $match: { "data.origem": "GRU" } },
    { $group: {
        _id: { origem: "$data.origem", destino: "$data.destino" },
        total: { $sum: "$data.qtdResultados" },
        qtd: { $sum: 1 }
    } }
)
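
For reference, here is a rough plain-JavaScript sketch of the result both pipelines are expected to produce: one entry per (origem, destino) pair, with a sum and a count. It does not talk to MongoDB; the sample documents and the `groupEvents` helper are made up for illustration, and only the field names come from the schema above.

```javascript
// Made-up sample documents shaped like the schema in the question.
const events = [
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 10 } },
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 5 } },
  { data: { origem: "GRU", destino: "SDU", qtdResultados: 7 } },
  { data: { origem: "CGH", destino: "GRU", qtdResultados: 3 } },
];

// Hypothetical helper: groups by (origem, destino), summing qtdResultados
// and counting documents. The optional origemFilter plays the role of $match.
function groupEvents(docs, origemFilter) {
  const groups = new Map();
  for (const doc of docs) {
    const { origem, destino, qtdResultados } = doc.data;
    if (origemFilter !== undefined && origem !== origemFilter) continue;
    const key = JSON.stringify([origem, destino]);
    const group = groups.get(key) || { _id: { origem, destino }, total: 0, qtd: 0 };
    group.total += qtdResultados;
    group.qtd += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const allGroups = groupEvents(events);        // like the unfiltered pipeline
const gruGroups = groupEvents(events, "GRU"); // like the $match + $group pipeline
console.log(allGroups);
console.log(gruGroups);
```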

Thanks!

Marcos Vinícius da Silva asked Oct 03 '22 03:10



1 Answer

Aggregation is not what MongoDB was originally designed for, so it is not its fastest feature.

If you really want to use MongoDB, you could use sharding so that each shard processes its own share of the aggregation (make sure to choose the shard key so that each group lives on only one shard, or you will achieve the opposite). This, however, would no longer be a fair comparison with MySQL, because the MongoDB cluster would use a lot more hardware.
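
To illustrate the shard-key point, here is a small simulation in plain JavaScript. It does not use a real MongoDB cluster: the sample data, the hash function, and all helper names are hypothetical, and it only models the idea of "each shard aggregates its own documents, then the partial results are combined". When documents are placed by the group key, no group is split across shards; with a placement that ignores the group key (round-robin here), the same groups show up on several shards and need an extra merge.

```javascript
// Made-up flat documents; field names are for illustration only.
const docs = [
  { origem: "GRU", destino: "CGH", qtd: 10 },
  { origem: "GRU", destino: "CGH", qtd: 5 },
  { origem: "GRU", destino: "SDU", qtd: 7 },
  { origem: "CGH", destino: "GRU", qtd: 3 },
  { origem: "CGH", destino: "GRU", qtd: 4 },
  { origem: "GRU", destino: "SDU", qtd: 1 },
];

const groupKey = (doc) => `${doc.origem}|${doc.destino}`;

// Toy string hash (an assumption, not MongoDB's hashing) mapped to a shard index.
function hashToShard(key, shardCount) {
  let h = 0;
  for (let i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) >>> 0;
  return h % shardCount;
}

// Distribute documents over shards using the given placement function.
function partition(allDocs, shardCount, assign) {
  const shards = Array.from({ length: shardCount }, () => []);
  allDocs.forEach((doc, i) => shards[assign(doc, i)].push(doc));
  return shards;
}

// What a single shard computes on its own documents.
function partialGroup(shardDocs) {
  const out = new Map();
  for (const doc of shardDocs) {
    const g = out.get(groupKey(doc)) || { total: 0, qtd: 0 };
    g.total += doc.qtd;
    g.qtd += 1;
    out.set(groupKey(doc), g);
  }
  return out;
}

// Combine per-shard results; extraMerges counts partial results that had to
// be folded into a group already reported by another shard.
function mergePartials(partials) {
  const merged = new Map();
  let extraMerges = 0;
  for (const partial of partials) {
    for (const [key, g] of partial) {
      const seen = merged.get(key);
      if (seen) {
        extraMerges += 1;
        seen.total += g.total;
        seen.qtd += g.qtd;
      } else {
        merged.set(key, { ...g });
      }
    }
  }
  return { merged, extraMerges };
}

const SHARDS = 2;
const byGroupKey = mergePartials(
  partition(docs, SHARDS, (doc) => hashToShard(groupKey(doc), SHARDS)).map(partialGroup)
);
const roundRobin = mergePartials(
  partition(docs, SHARDS, (_doc, i) => i % SHARDS).map(partialGroup)
);

console.log("placed by group key, extra merges:", byGroupKey.extraMerges);
console.log("round-robin placement, extra merges:", roundRobin.extraMerges);
```

Both placements give the same final totals; the difference is only in how much cross-shard combining is needed.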

Philipp answered Oct 11 '22 22:10
