Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Atomic transactions in key-value stores

Please excuse any mistakes in terminology. In particular, I am using relational database terms.

There are a number of persistent key-value stores, including CouchDB and Cassandra, along with plenty of other projects.

A typical argument against them is that they do not generally permit atomic transactions across multiple rows or tables. I wonder if there's a general approach would would solve this issue.

Take for example the situation of a set of bank accounts. How do we move money from one bank account to another? If each bank account is a row, we want to update two rows as part of the same transaction, reducing the value in one and increasing the value in another.

One obvious approach is to have a separate table which describes transactions. Then, moving money from one bank account to another consists of simply inserting a new row into this table. We do not store the current balances of either of the two bank accounts and instead rely on summing up all the appropriate rows in the transactions table. It is easy to imagine that this would be far too much work, however; a bank may have millions of transactions a day and an individual bank account may quickly have several thousand 'transactions' associated with it.

A number (all?) of key-value stores will 'roll back' an action if the underlying data has changed since you last grabbed it. Possibly this could be used to simulate atomic transactions, then, as you could then indicate that a particular field is locked. There are some obvious issues with this approach.

Any other ideas? It is entirely possible that my approach is simply incorrect and I have not yet wrapped my brain around the new way of thinking.

like image 881
ChrisInEdmonton Avatar asked Jul 07 '09 15:07

ChrisInEdmonton


People also ask

Do key-value stores support ACID transactions?

Lack of ACID supportMany key-value databases don't provide support for ACID to improve their scalability.

What is meant by atomic transaction?

An atomic transaction is an indivisible and irreducible series of database operations such that either all occurs, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright.

Why is atomic transaction important?

Essentially, an atomic transaction ensures that any commit you make finishes the entire operation successfully. Or, in cases of a lost connection in the middle of an operation, the database is rolled back to its state prior to the commit being initiated.

Which is the best example for key-value store?

Examples of Popular Key-Value DatabasesAmazon DynamoDB: Probably the most widely used key-value store database, in fact, it was the research into DynamoDB that really started making NoSQL really popular. Aerospike: Open-source database that is optimized for in-memory storage.


2 Answers

If, taking your example, you want to atomically update the value in a single document (row in relational terminology), you can do so in CouchDB. You will get a conflict error when you try to commit the change if an other contending client has updated the same document since you read it. You will then have to read the new value, update and re-try the commit. There is an indeterminate (possibly infinite if there is a lot of contention) number of times you may have to repeat this process, but you are guaranteed to have a document in the database with an atomically updated balance if your commit ever succeeds.

If you need to update two balances (i.e. a transfer from one account to an other), then you need to use a separate transaction document (effectively another table where rows are transactions) that stores the amount and the two accounts (in and out). This is a common bookkeeping practice, by the way. Since CouchDB computes views only as needed, it is actually still very efficient to compute the current amount in an account from the transactions that list that account. In CouchDB, you would use a map function that emitted the account number as key and the amount of the transaction (positive for incoming, negative for outgoing). Your reduce function would simply sum the values for each key, emitting the same key and total sum. You could then use a view with group=True to get the account balances, keyed by account number.

like image 188
Barry Wark Avatar answered Sep 29 '22 18:09

Barry Wark


CouchDB isn't suitable for transactional systems because it doesn't support locking and atomic operations.

In order to complete a bank transfer you must do a few things:

  1. Validate the transaction, ensuring there are sufficient funds in the source account, that both accounts are open, not locked, and in good standing, and so on
  2. Decrease the balance of the source account
  3. Increase the balance of the destination account

If changes are made in between any of these steps the balance or status of the accounts, the transaction could become invalid after it is submitted which is a big problem in a system of this kind.

Even if you use the approach suggested above where you insert a "transfer" record and use a map/reduce view to calculate the final account balance, you have no way of ensuring that you don't overdraw the source account because there is still a race condition between checking the source account balance and inserting the transaction where two transactions could simultaneous be added after checking the balance.

So ... it's the wrong tool for the job. CouchDB is probably good at a lot of things, but this is something that it really can not do.

EDIT: It's probably worth noting that actual banks in the real world use eventual consistency. If you overdraw your bank account for long enough you get an overdraft fee. If you were very good you might even be able to withdraw money from two different ATMs at almost the same time and overdraw your account because there's a race condition to check the balance, issue the money, and record the transaction. When you deposit a check into your account they bump the balance but actually hold those funds for a period of time "just in case" the source account doesn't really have enough money.

like image 5
Dobes Vandermeer Avatar answered Sep 29 '22 16:09

Dobes Vandermeer