MySQL Partitioning / Sharding / Splitting - which way to go?

Tags: mysql, database-performance, sharding, partitioning

We have an InnoDB database that is about 70 GB and we expect it to grow to several hundred GB in the next 2 to 3 years. About 60% of the data belongs to a single table. Currently the database is working quite well as we have a server with 64 GB of RAM, so almost the whole database fits into memory, but we’re concerned about the future when the amount of data will be considerably larger. Right now we’re considering some way of splitting up the tables (especially the one that accounts for the biggest part of the data), and I’m wondering what the best way to do it would be.

The options I’m currently aware of are

Using MySQL Partitioning that comes with version 5.1
Using some kind of third-party library that encapsulates the partitioning of the data (like Hibernate Shards)
Implementing it ourselves inside our application

Our application is built on J2EE and EJB 2.1 (hopefully we’re switching to EJB 3 some day).

What would you suggest?

EDIT (2011-02-11):
Just an update: Currently the size of the database is 380 GB, the data size of our "big" table is 220 GB and the size of its index is 36 GB. So while the whole table does not fit in memory any more, the index does.
The system is still performing fine (still on the same hardware) and we're still thinking about partitioning the data.

EDIT (2014-06-04): One more update: The size of the whole database is 1.5 TB, the size of our "big" table is 1.1 TB. We upgraded our server to a 4-processor machine (Intel Xeon E7450) with 128 GB RAM. The system is still performing fine. What we're planning to do next is to put our big table on a separate database server (we've already made the necessary changes in our software) while simultaneously upgrading to new hardware with 256 GB RAM.

This setup is supposed to last for two years. Then we will either have to finally start implementing a sharding solution or just buy servers with 1 TB of RAM which should keep us going for some time.

EDIT (2016-01-18):

We have since put our big table in its own database on a separate server. Currently the size of this database is about 1.9 TB, the size of the other database (with all tables except for the "big" one) is 1.1 TB.

Current Hardware setup:

HP ProLiant DL 580
4 x Intel(R) Xeon(R) CPU E7- 4830
256 GB RAM

Performance is fine with this setup.


asked Sep 05 '08 13:09

sme

2 Answers

You will definitely start to run into issues on that 42 GB table once it no longer fits in memory. In fact, as soon as it does not fit in memory anymore, performance will degrade extremely quickly. One way to test is to put that table on another machine with less RAM and see how poorly it performs.

"First of all, it doesn't matter as much splitting out tables unless you also move some of the tables to a separate physical volume."

This is incorrect. Partitioning (either through the feature in MySQL 5.1, or the same thing using MERGE tables) can provide significant performance benefits even if the tables are on the same drive.

As an example, let's say that you are running SELECT queries on your big table using a date range. If the table is whole, the query will be forced to scan through the entire table (and at that size, even using indexes can be slow). The advantage of partitioning is that your queries will only run on the partitions that can contain matching rows. If each partition is 1 GB in size and your query only needs to access 5 partitions, the combined 5 GB is a lot easier for MySQL to deal with than a monster 42 GB version.

One thing you need to ask yourself is how you are querying the data. If there is a chance that your queries will only need to access certain chunks of data (e.g. a date range or ID range), partitioning of some kind will prove beneficial.
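To make the date-range case concrete, here is a minimal Java sketch of the selection step: given a query range, work out which monthly chunks have to be read. With native MySQL partitioning the server does this pruning on its own; application code like this would only be needed if you split the table by hand. The monthly granularity and the `pYYYYMM` naming are assumptions made up for the illustration, not something from the question.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Illustration only: maps a date range onto hypothetical monthly partitions.
final class MonthlyPartitions {

    // Hypothetical naming scheme: one partition (or sub-table) per calendar month.
    static String partitionName(YearMonth month) {
        return String.format("p%04d%02d", month.getYear(), month.getMonthValue());
    }

    // Returns the partitions a query over [from, to] (both inclusive) must read.
    static List<String> partitionsFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        List<String> names = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            names.add(partitionName(m));
        }
        return names;
    }

    public static void main(String[] args) {
        // Touches three monthly partitions, however many months the table holds.
        System.out.println(partitionsFor(LocalDate.of(2008, 7, 15), LocalDate.of(2008, 9, 5)));
    }
}
```

The point is that the amount of data touched depends on the width of the range, not on the total size of the table. A query without a usable range on the partitioning column would still have to visit every chunk.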

I've heard that there is still some bugginess with MySQL 5.1 partitioning, particularly related to MySQL choosing the correct key. MERGE tables can provide the same functionality, although they require slightly more overhead.

Hope that helps...good luck!


answered Sep 28 '22 00:09

giltotherescue

If you think you're going to be IO/memory bound, I don't think partitioning is going to be helpful. As usual, benchmarking first will help you figure out the best direction. If you don't have spare servers with 64GB of memory kicking around, you can always ask your vendor for a 'demo unit'.

I would lean towards sharding if you don't expect to need aggregate reporting in a single query. I'm assuming you'd shard the whole database and not just your big table: it's best to keep entire entities together. Well, if your model splits nicely, anyway.
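As a rough sketch of what "keep entire entities together" could look like in application code: pick one owning key per entity and route every row that belongs to it by that key, so all of an entity's data lands on the same shard. The class name, the `ownerId` key, and the plain modulo scheme are assumptions for illustration; a real setup might use a lookup table or a library such as Hibernate Shards instead.

```java
// Illustration only: routes all rows of one owning entity to the same shard.
final class ShardRouter {

    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // ownerId is a hypothetical key (for example a customer id) shared by
    // every row of the entity, in the big table and in all related tables.
    // floorMod keeps the result in [0, shardCount) even for negative ids.
    int shardFor(long ownerId) {
        return (int) Math.floorMod(ownerId, (long) shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        // Every table is routed by the same key, so one entity stays on one shard.
        System.out.println("owner 1001 -> shard " + router.shardFor(1001L));
    }
}
```

One known drawback of plain modulo routing is that changing the number of shards reassigns most keys, which means moving data. It also shows why single-query aggregate reports get harder: a report across all owners has to query every shard and merge the results in the application.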

answered Sep 28 '22 00:09

Gary Richardson
