Multi-collection, multi-document 'transactions' in MongoDB

I realise that MongoDB, by its very nature, doesn't and probably never will support these kinds of transactions. However, I have found that I do need to use them in a somewhat limited fashion, so I've come up with the following solution, and I'm wondering: is this the best way of doing it, and can it be improved upon? (before I go and implement it in my app!)

Obviously the transaction is controlled via the application (in my case, a Python web app). For each document in this transaction (in any collection), the following fields are added:

'lock_status': bool (true = locked, false = unlocked),
'data_old': dict (of any old values - current values really - that are being changed),
'data_new': dict (of values replacing the old (current) values; should have exactly the same keys as data_old),
'change_complete': bool (true = the update to this specific document has occurred and was successful),
'transaction_id': ObjectId of the parent transaction

In addition, there is a transaction collection which stores documents detailing each transaction in progress. They look like:

{
    '_id': ObjectId,
    'date_added': datetime,
    'status': bool (true = all changes successful, false = in progress),
    'collections': array of collection names involved in the transaction
}
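To make the schema above concrete, it can be mirrored as plain types. This is a minimal sketch under stated assumptions: the class names TxModel, TxFields and TransactionDoc are hypothetical, and a random String id stands in for ObjectId so the code runs without the MongoDB driver. The keysMatch() helper encodes the rule that data_new should cover the same fields as data_old.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical Java stand-ins for the schema in the question; a random
// String id replaces ObjectId so the sketch runs without the MongoDB driver.
final class TxModel {
    /** Fields added to every document that takes part in a transaction. */
    static final class TxFields {
        boolean lockStatus;                                  // 'lock_status': true = locked
        final Map<String, Object> dataOld = new HashMap<>(); // 'data_old': values being replaced
        final Map<String, Object> dataNew = new HashMap<>(); // 'data_new': replacement values
        boolean changeComplete;                              // 'change_complete'
        String transactionId;                                // 'transaction_id' of the parent

        /** data_new must name exactly the same fields as data_old. */
        boolean keysMatch() {
            return dataOld.keySet().equals(dataNew.keySet());
        }
    }

    /** A document in the 'transaction' collection. */
    static final class TransactionDoc {
        final String id = UUID.randomUUID().toString();     // '_id'
        final Instant dateAdded = Instant.now();            // 'date_added'
        boolean status;                                     // 'status': false = in progress
        final List<String> collections = new ArrayList<>(); // 'collections' involved
    }
}
```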

And here's the logic of the process. Hopefully it works in such a way that if it's interrupted, or fails in some other way, it can be rolled back properly.

1: Set up a transaction document

2: For each document that is affected by this transaction:

- Set lock_status to true (to 'lock' the document from being modified)
- Set data_old and data_new to their old and new values
- Set change_complete to false
- Set transaction_id to the ObjectId of the transaction document we just made

3: Perform the update. For each document affected:

- Replace any affected fields in that document with the data_new values
- Set change_complete to true

4: Set the transaction document's status to true (as all data has been modified successfully)

5: For each document affected by the transaction, do some clean up:

- Remove data_old and data_new, as they're no longer needed
- Set lock_status to false (to unlock the document)

6: Remove the transaction document set up in step 1 (or as suggested, mark it as complete)
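The six steps above can be sketched against an in-memory store to show the order of writes. This is an illustration under assumptions, not a drop-in implementation: SixStepTransaction is a hypothetical class, `store` maps an illustrative "collection/id" key to a mutable document, and each commented step would be one updateOne call per document in MongoDB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal in-memory sketch of the six steps. Assumption: `store` maps a
// hypothetical "collection/id" key to a mutable document.
final class SixStepTransaction {
    final Map<String, Map<String, Object>> store = new HashMap<>();
    final Map<String, Map<String, Object>> transactions = new HashMap<>();

    /** changes: document key -> fields and their new values. */
    @SuppressWarnings("unchecked")
    String run(Map<String, Map<String, Object>> changes) {
        // 1: set up a transaction document
        String txId = UUID.randomUUID().toString();
        Map<String, Object> tx = new HashMap<>();
        tx.put("status", false);
        transactions.put(txId, tx);

        // 2: lock each document and record old/new values. In MongoDB the check
        // and the lock should be one conditional update (filter on lock_status
        // not true) so two writers cannot both pass the check.
        for (Map.Entry<String, Map<String, Object>> e : changes.entrySet()) {
            Map<String, Object> doc = store.get(e.getKey());
            if (Boolean.TRUE.equals(doc.get("lock_status"))) {
                throw new IllegalStateException("already locked: " + e.getKey());
            }
            Map<String, Object> old = new HashMap<>();
            for (String field : e.getValue().keySet()) {
                old.put(field, doc.get(field));
            }
            doc.put("lock_status", true);
            doc.put("data_old", old);
            doc.put("data_new", new HashMap<>(e.getValue()));
            doc.put("change_complete", false);
            doc.put("transaction_id", txId);
        }

        // 3: copy data_new into each document and flag it as changed
        for (String key : changes.keySet()) {
            Map<String, Object> doc = store.get(key);
            doc.putAll((Map<String, Object>) doc.get("data_new"));
            doc.put("change_complete", true);
        }

        // 4: every document is updated, so mark the transaction successful
        tx.put("status", true);

        // 5: clean up and unlock
        for (String key : changes.keySet()) {
            Map<String, Object> doc = store.get(key);
            doc.remove("data_old");
            doc.remove("data_new");
            doc.put("lock_status", false);
        }

        // 6: remove the transaction document (or keep it and mark it complete)
        transactions.remove(txId);
        return txId;
    }
}
```

If the lock check in step 2 fails for a later document, the earlier documents stay locked under a transaction whose status is false; that is exactly the state a recovery routine has to handle.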

I think that logically works in such a way that if it fails at any point, all data can be either rolled back or the transaction can be continued (depending on what you want to do). Obviously all rollback/recovery/etc. is performed by the application and not the database, by using the transaction documents and the documents in the other collections with that transaction_id.
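The recovery described here can be sketched as one routine that finds every document carrying the transaction_id and either restores data_old or re-applies data_new. This is a hedged sketch: TxRecovery and the "collection/id" keys are hypothetical, and the loop over `store` stands in for a find on transaction_id in each collection listed in the transaction document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical recovery routine. Assumes `store` stands in for all collections
// (keyed by an illustrative "collection/id") and that each affected document
// carries lock_status, data_old, data_new, change_complete and transaction_id.
final class TxRecovery {
    final Map<String, Map<String, Object>> store = new HashMap<>();
    final Map<String, Map<String, Object>> transactions = new HashMap<>();

    /**
     * rollBack = true restores data_old; false re-applies data_new. If the
     * transaction's status is already true, every document holds its new
     * values, so only the step-5 clean-up is performed.
     */
    @SuppressWarnings("unchecked")
    void recover(String txId, boolean rollBack) {
        Map<String, Object> tx = transactions.get(txId);
        if (tx == null) return; // already finished and removed in step 6
        boolean allApplied = Boolean.TRUE.equals(tx.get("status"));
        for (Map<String, Object> doc : store.values()) {
            // Stand-in for find({'transaction_id': txId}) on each collection
            if (!txId.equals(doc.get("transaction_id"))) continue;
            if (!allApplied) {
                // Idempotent either way: restoring data_old on a document whose
                // change_complete is still false leaves its values unchanged.
                Object values = doc.get(rollBack ? "data_old" : "data_new");
                if (values != null) doc.putAll((Map<String, Object>) values);
            }
            doc.remove("data_old");
            doc.remove("data_new");
            doc.put("lock_status", false);
        }
        transactions.remove(txId);
    }
}
```

One caveat the sketch exposes: rolling forward is only safe if step 2 reached every document. The transaction document as described records collection names but not document ids, so after a crash during step 2 the application cannot tell which documents are missing, and rolling back is the safer default.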

Is there any glaring error in this logic that I've missed or overlooked? Is there a more efficient way of going about it (e.g. less writing/reading from the database)?

Does MongoDB support multi-document transactions?

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions. With distributed transactions, transactions can be used across multiple operations, collections, databases, documents, and shards.

What is a multi-document transaction?

Multi-document transactions enable applications to execute atomic operations across multiple documents. They offer "all-or-nothing" semantics: on commit, the changes made inside the transaction are persisted; if the transaction fails, all changes inside it are discarded.

Can I use multiple collections inside of a database in MongoDB?

You can update documents in two collections, and the operation can be made atomic by using a MongoDB transaction. However, the update on each collection is still a separate operation; it cannot be done as a single query.

What are multi-document transactions in MongoDB?

MongoDB 4.0 introduced multi-document transactions for replica sets; they can be used across multiple operations, collections, and documents. Multi-document transactions provide a globally consistent view of data and enforce all-or-nothing execution to maintain data integrity.

As a generic response, multi-document commits in MongoDB can be performed as two-phase commits, which are documented fairly extensively in the manual (see: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/).

Briefly, the pattern suggested by the manual is the following:

- Set up a separate transactions collection that includes the source document, target document, value and state (of the transaction)
- Create a new transaction object with initial as the state
- Start the transaction and update its state to pending
- Apply the transaction to both documents (source, target)
- Update the transaction state to committed
- Use find to determine whether the documents reflect the transaction state; if so, update the transaction state to done
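The pattern can be sketched in memory to show why it survives a crash. This is a minimal sketch, not the manual's code: TwoPhaseCommit, Account and Tx are hypothetical stand-ins, and each balance or state change would be one single-document update in MongoDB (single-document updates are atomic, which is what the pattern relies on). The pendingTransactions set, modeled on the manual's example, makes re-applying a step after a crash a no-op.

```java
import java.util.HashSet;
import java.util.Set;

// In-memory sketch of the two-phase commit pattern between two documents.
// Account and Tx are hypothetical stand-ins for MongoDB documents.
final class TwoPhaseCommit {
    enum State { INITIAL, PENDING, COMMITTED, DONE }

    static final class Account {
        int balance;
        final Set<String> pendingTransactions = new HashSet<>();
        Account(int balance) { this.balance = balance; }
    }

    static final class Tx {
        final String id;
        final Account source, target;
        final int value;
        State state = State.INITIAL; // a new transaction starts as "initial"

        Tx(String id, Account source, Account target, int value) {
            this.id = id;
            this.source = source;
            this.target = target;
            this.value = value;
        }

        // Like an update filtered on {state: expected}: refuses a stale transition
        void transition(State expected, State next) {
            if (state != expected) throw new IllegalStateException(state + ", expected " + expected);
            state = next;
        }
    }

    // One update per document; the pendingTransactions guard makes a retry a no-op
    static void applyTo(Tx tx, Account account, int delta) {
        if (account.pendingTransactions.add(tx.id)) account.balance += delta;
    }

    static void run(Tx tx) {
        tx.transition(State.INITIAL, State.PENDING);
        applyTo(tx, tx.source, -tx.value);
        applyTo(tx, tx.target, tx.value);
        tx.transition(State.PENDING, State.COMMITTED);
        // Second phase: drop the pending marker from both documents
        tx.source.pendingTransactions.remove(tx.id);
        tx.target.pendingTransactions.remove(tx.id);
        // The find() check: neither document still lists the transaction
        boolean clean = !tx.source.pendingTransactions.contains(tx.id)
                && !tx.target.pendingTransactions.contains(tx.id);
        if (clean) tx.transition(State.COMMITTED, State.DONE);
    }
}
```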

In addition:

- You need to handle failure scenarios manually (cases where something didn't happen as described above)
- You need to implement a rollback manually, basically by introducing a new state value, canceling
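A rollback through a canceling state can be sketched the same way. Assumptions: TwoPhaseRollback, Account and Tx are hypothetical stand-ins, CANCELED is this sketch's name for the final state once the undo is finished, and only a transaction still in PENDING is cancelled. The undo is applied only to documents that still list the transaction as pending, so a document the crash never reached is left alone.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a manual rollback via a "canceling" state. Account/Tx are
// hypothetical stand-ins; CANCELED is an assumed name for the final state.
final class TwoPhaseRollback {
    enum State { INITIAL, PENDING, COMMITTED, DONE, CANCELING, CANCELED }

    static final class Account {
        int balance;
        final Set<String> pendingTransactions = new HashSet<>();
        Account(int balance) { this.balance = balance; }
    }

    static final class Tx {
        final String id;
        final Account source, target;
        final int value;
        State state = State.INITIAL;

        Tx(String id, Account source, Account target, int value) {
            this.id = id;
            this.source = source;
            this.target = target;
            this.value = value;
        }

        void transition(State expected, State next) {
            if (state != expected) throw new IllegalStateException(state + ", expected " + expected);
            state = next;
        }
    }

    // Forward step: apply once, remembering the transaction on the document
    static void applyTo(Tx tx, Account account, int delta) {
        if (account.pendingTransactions.add(tx.id)) account.balance += delta;
    }

    // Reverse step: undo only where the change was actually applied
    static void undo(Tx tx, Account account, int delta) {
        if (account.pendingTransactions.remove(tx.id)) account.balance -= delta;
    }

    static void cancel(Tx tx) {
        tx.transition(State.PENDING, State.CANCELING);
        undo(tx, tx.source, -tx.value); // reverse the debit
        undo(tx, tx.target, tx.value);  // reverse the credit
        tx.transition(State.CANCELING, State.CANCELED);
    }
}
```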

Some specific notes for your implementation:

- I would discourage you from adding fields like lock_status, data_old and data_new to the source/target documents. These should be properties of the transactions, not of the documents themselves.
- To generalize the concept of source/target documents, I think you could use DBRefs: http://www.mongodb.org/display/DOCS/Database+References
- I don't like the idea of deleting transaction documents when they are done. Setting the state to done seems like a better idea, since this allows you to later debug and find out what kind of transactions have been performed. I'm pretty sure you won't run out of disk space either (and if you do, there are solutions for that as well).
- In your model, how do you guarantee that everything has been changed as expected? Do you inspect the changes somehow?
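The notes above (old/new values on the transaction, DBRef-like references, verification, keeping finished transactions) can be combined in one small sketch. Everything here is hypothetical: TxCentric, Change and the in-memory `db` (collection name to id to document) are illustrative stand-ins, and a real version would still need a per-document marker such as the manual's pendingTransactions to keep concurrent writers out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: old/new values live on the transaction, and each change
// points at its document through a DBRef-like (collection, id) pair.
// `db` stands in for the database: collection name -> id -> document.
final class TxCentric {
    static final class Change {
        final String collection, id, field;
        final Object oldValue, newValue; // oldValue is kept for a rollback (not shown)

        Change(String collection, String id, String field, Object oldValue, Object newValue) {
            this.collection = collection;
            this.id = id;
            this.field = field;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    static final class Tx {
        String state = "initial";
        final List<Change> changes = new ArrayList<>();
    }

    final Map<String, Map<String, Map<String, Object>>> db = new HashMap<>();
    final List<Tx> history = new ArrayList<>(); // finished transactions are kept, not deleted

    // Resolve the (collection, id) reference to its document
    Map<String, Object> deref(Change c) {
        return db.get(c.collection).get(c.id);
    }

    void run(Tx tx) {
        tx.state = "pending";
        for (Change c : tx.changes) deref(c).put(c.field, c.newValue);
        tx.state = "committed";
        // Inspect the documents before declaring success
        boolean verified = tx.changes.stream()
                .allMatch(c -> Objects.equals(deref(c).get(c.field), c.newValue));
        if (verified) tx.state = "done";
        history.add(tx);
    }
}
```

Note that the target documents gain no lock_status, data_old or data_new fields; the transaction alone describes the change.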

MongoDB 4.0 added support for multi-document ACID transactions.

Java example:

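import com.mongodb.client.ClientSession;

try (ClientSession clientSession = client.startSession()) {
   clientSession.startTransaction();
   // Pass the session to every operation that should join the transaction
   collection.insertOne(clientSession, docOne);
   collection.insertOne(clientSession, docTwo);
   // Both inserts become visible together on commit; if an exception skips
   // this line, closing the session aborts the open transaction
   clientSession.commitTransaction();
}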

Note that this requires a replica set. For local testing, you can still run a replica set with a single node on your own machine.

https://stackoverflow.com/a/51396785/4587961
https://docs.mongodb.com/manual/tutorial/deploy-replica-set-for-testing/

Tags: mongodb, transactions

johneth

2 Answers

jsalonen

SANN3
