Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split Large table of Terabytes using MYSQL Sharding

I know that horizontal partitioning...you can create many tables.

I've seen that In a application based sharding, you will have the same database structure on multiple database servers. But it won't contain the same data.

So for example:

Users 1 - 10000: server A
Users 10001 - 20000: server B

Techniques employed to shard are the MySQL-Proxy, for example some tools (based on MySQL Proxy) is SpockProxy. We can shard manually as well. Required would be a master table, e.g.:

-------------------
| userA | server1 |
| userB | server2 |
| userC | server1 |
-------------------

But these above techniques handle at application level.. I want to solve it at DB server level..

can we do this with multiple servers transparently? This will allow Mysql tables to scale.

Create X tables on X servers, and end user gets data by simple query to single DB server?

In short i want to insert a data of 16 Terabyte in single table but i don't have such large space on single machine, so i want to install two servers each capacity of 8 terabyte. But User query to single db and get results while at backend may be sharding is used.

I also open this discussion for some other good solutions e.g. MYSQL Clustering.

Does anyone care to explain, or have a good beginner's tutorial (step-by-step) that teaches you how to partition across multiple servers?

like image 884
Imran Avatar asked Nov 11 '22 20:11

Imran


1 Answers

You need to adjust your thinking before you go forward. I don't think there is an easy way to do this on MySQL -- I am sure you can do it if you put in the effort using the FEDERATED table and views However, rdbs sharding at best is never easy.

Sharding however is very had. Sharding tables is almost always the wrong way to look at it. Instead you really need to shard data sets. This is because joins across nodes are expensive.

So I highly recommend going back to the drawing board on this. If you really have no need for joins, look at other dbs like Cassandra which support this sort of thing out of the box. If you do need joins, however, you really need to look at every table in your database and find good partition criteria, then partition on that so you have the same db schema and then different shards.

Once you have that in place then you put a proxy in front of your databases to handle routing queries appropriately. See https://github.com/flike/kingshard as one possibility (though as a disclaimer I have not worked with these on MySQL). With the proxy, you get the appearance to your app of a single db and as I read your question that's really what you are aiming for.

like image 72
Chris Travers Avatar answered Nov 15 '22 08:11

Chris Travers