Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Synchronizing Database Access in a Distributed App

A common bit of programming logic I find myself implementing often is something like the following pseudo-code:

Let X = some value
Let Database = some external Database handle

if !Database.contains(X):
   SomeCalculation()
   Database.insert(X)

However, in a multi-threaded program we have a race condition here. Thread A might check if X is in Database, find that it's not, and then proceed to call SomeCalculation(). Meanwhile, Thread B will also check if X is in Database, find that it's not, and insert a duplicate entry.

So of course, this needs to be synchronized like:

Let X = some value
Let Database = some external Database handle

LockMutex()
if !Database.contains(X):
   SomeCalculation()
   Database.insert(X)
UnlockMutex()

This is fine, except what if the application is a distributed app, running across multiple computers, all of which communicate with the same back-end database machine? In this case, a Mutex is useless, because it only synchronizes a single instance of the app with other local threads. To make this work, we'd need some kind of "global" distributed synchronization technique. (Assume that simply disallowing duplicates in Database is not a feasible strategy.)

In general, what are some practical solutions to this problem?

I realize this question is very generic, but I don't want to make this a language-specific question because this is an issue that comes up across multiple languages and multiple Database technologies.

I intentionally avoided specifying whether I'm talking about an RDBMS or SQL Database, versus something like a NoSQL Database, because again - I'm looking for generalized answers based on industry practices. For example, is this situation something that Atomic Stored Procedures might solve? Or Atomic Transactions? Or is this something that requires something like a "Distributed Mutex"? Or more generally, is this problem generally addressed by the Database system, or is it something the Application itself should handle?

If it turns out this question is impossible to answer at all without further information, please tell me so I can modify it.

like image 330
Channel72 Avatar asked Nov 05 '22 03:11

Channel72


1 Answers

One sure way to ensure against data stomping is to lock the data row. Many databases allow you to do that, via transactions. Some don't support transactions.

However, this is overkill for most cases, where contention is low in general. You might want to read up on Isolation levels to get more background on the topic.

A better general approach is often Optimistic Concurrency. The idea behind it is that each data row includes a signature, a timestamp works fine but the signature need not be time oriented. It could be a hash value, for example. This is a general concurrency management approach and is not limited to relational stores.

The app that changes data first reads the row, and then performs whatever calculations it requires, and then at some point, writes the updated data back to the data store. Via Optimistic concurrency, the app writes the update with the stipulation (expressed in SQL if it is a SQL database) that the data row must be updated only if the signature has not changed in the interim. And, each time a data row is updated, the signature must be updated as well.

The result is that updates don't get stomped on. But for a more rigorous explanation of the concurrency issues, refer to that article on DB Isolation levels.

All distributed updaters must follow the OCC convention (or something stronger, like transactional locking) in order for this to work.

like image 73
Cheeso Avatar answered Nov 09 '22 16:11

Cheeso