With sharding, how can you maintain a reliable transaction across multiple database servers?
For example, if I had a table named AccountLedger
on one database server (MySQL instance) and a table named User
on another database server, is it possible to run a transaction across both database instances that will both reliably commit, or rollback on failure?
Example transaction:
AccountLedger database server:
START TRANSACTION;
INSERT INTO AccountLedger SET
UserID = @UserID,
Date = @Date,
Debit = @Debit,
Balance = @Balance;
User database server:
START TRANSACTION;
UPDATE User SET
Balance = @Balance
WHERE UserID = @UserID;
AccountLedger database server:
COMMIT;
User database server:
COMMIT; -- What happens if the COMMIT fails here (power goes out or whatever)
I've read a quite a lot about sharding, but I can't seem to find any information on using transactions with sharding. Can someone point me in the right direction?
MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.
Sharding enables effective scaling and management of large datasets. There are many ways to split a dataset into shards. Sharding is possible with both SQL and NoSQL databases. Some databases have out-of-the-box support for sharding.
A modern disk can do ~1000 fsyncs per second, but MySQL will group multiple writes with each fsync. An okay rule-of-thumb would be 5000-15,000 writes per second, depending on things like writes per transaction, number of indexes, hardware, size of writes, etc.
Sharding and partitioning are both about breaking up a large data set into smaller subsets. The difference is that sharding implies the data is spread across multiple computers while partitioning does not. Partitioning is about grouping subsets of data within a single database instance.
It is possible to do this with distributed transactions. They are supported by the InnoDB storage engine. You will find more information about them and about the syntax of the commands in the MySQL documentation: XA Transactions
I advise against using them directly. If consistency is the most requirements for ypur application, then use a transaction monitor that can take care of it. Java EE does that for you.
However, if availability is more important than consistency, you should avoid distributed transactions. The CAP theorem explains why.
You can implement cross shard serializable transaction on the client side if each shard supports per key linearizability and compare and set (which is true for MySQL). This approach is used in Google's Percolator and in the CockroachDB but nothing prevents you from using it with MySQL.
I've created a step-by-step visualization of such transactions. I hope it will help you to understand them.
If you're fine with read committed isolation level then it makes sense to take a look on RAMP transactions by Peter Bailis. They also can be implemented in the sharded MySQL environment.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With