Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sharding and Transactions with MySQL

With sharding, how can you maintain a reliable transaction across multiple database servers?

For example, if I had a table named AccountLedger on one database server (MySQL instance) and a table named User on another database server, is it possible to run a transaction across both database instances that will both reliably commit, or rollback on failure?

Example transaction:

AccountLedger database server:

START TRANSACTION;
INSERT INTO AccountLedger SET
    UserID = @UserID,
    Date = @Date,
    Debit = @Debit,
    Balance = @Balance;

User database server:

START TRANSACTION;
UPDATE User SET
    Balance = @Balance
WHERE UserID = @UserID;

AccountLedger database server:

COMMIT;

User database server:

COMMIT; -- What happens if the COMMIT fails here (power goes out or whatever)

I've read a quite a lot about sharding, but I can't seem to find any information on using transactions with sharding. Can someone point me in the right direction?

like image 497
Mark Avatar asked Apr 07 '12 03:04

Mark


People also ask

Does MySQL allow sharding?

MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.

Can sharding be done on SQL?

Sharding enables effective scaling and management of large datasets. There are many ways to split a dataset into shards. Sharding is possible with both SQL and NoSQL databases. Some databases have out-of-the-box support for sharding.

How many transactions can MySQL handle per second?

A modern disk can do ~1000 fsyncs per second, but MySQL will group multiple writes with each fsync. An okay rule-of-thumb would be 5000-15,000 writes per second, depending on things like writes per transaction, number of indexes, hardware, size of writes, etc.

What is the difference between partitioning and sharding?

Sharding and partitioning are both about breaking up a large data set into smaller subsets. The difference is that sharding implies the data is spread across multiple computers while partitioning does not. Partitioning is about grouping subsets of data within a single database instance.


2 Answers

It is possible to do this with distributed transactions. They are supported by the InnoDB storage engine. You will find more information about them and about the syntax of the commands in the MySQL documentation: XA Transactions

I advise against using them directly. If consistency is the most requirements for ypur application, then use a transaction monitor that can take care of it. Java EE does that for you.

However, if availability is more important than consistency, you should avoid distributed transactions. The CAP theorem explains why.

like image 69
nosid Avatar answered Sep 20 '22 20:09

nosid


You can implement cross shard serializable transaction on the client side if each shard supports per key linearizability and compare and set (which is true for MySQL). This approach is used in Google's Percolator and in the CockroachDB but nothing prevents you from using it with MySQL.

I've created a step-by-step visualization of such transactions. I hope it will help you to understand them.

If you're fine with read committed isolation level then it makes sense to take a look on RAMP transactions by Peter Bailis. They also can be implemented in the sharded MySQL environment.

like image 45
rystsov Avatar answered Sep 21 '22 20:09

rystsov