
Distributed database which allows custom CRDT merging

I'm rather new to distributed databases, though I have already studied the related literature (e.g. the CAP theorem, CRDTs) and implemented some proofs of concept to scale my application horizontally.

Now, however, I face a challenging problem. To scale the app horizontally, services communicate via a distributed queue. As background: I need a custom CRDT merge method to keep the data eventually consistent, and my application has to work like a cache (loosely comparable to Redis).

The challenge is that I also need to persist the data, which means the application cache and the database must be kept eventually consistent with each other. I've looked at Cassandra and found a ticket [1] where somebody tried to add support for custom CRDT merge functions (which, as mentioned, I need). That never made it into Cassandra, and it seems to have a few unresolved issues.

What are my options? I'm looking for either a concrete distributed database engine that allows custom merging, or an algorithm that could help solve the problem (e.g. in the form of a DB trigger or something similar).

[1] https://issues.apache.org/jira/browse/CASSANDRA-6412

asked Nov 15 '25 12:11 by benjist


2 Answers

As far as I know, very few databases let you specify your own conflict resolution algorithm. To be honest, the only one I have found (disclaimer: I'm not a Microsoft advocate) is Azure Cosmos DB. It has a MongoDB-compatible API and can be configured to use a master-master replication strategy, where you specify your own conflict resolution algorithm (written in JavaScript). You can use that to define your own merge operation.

If you look beyond database-native solutions to application-level ones, there are several tools, e.g. Akka (available for both the JVM and .NET), which lets you write custom CRDTs in its Distributed Data module. The JVM version additionally supports multi-datacenter persistence, which is conceptually closer to how commutative (operation-based) CRDTs work and can be integrated with a Cassandra backend.
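
To illustrate what "writing a custom CRDT" amounts to, independent of any framework: a state-based CRDT is essentially a data type with a merge function that is commutative, associative and idempotent. The following is a minimal sketch in Python, not Akka's API; the `GCounter` class and its method names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one monotonically increasing slot per node (illustrative)."""
    counts: dict = field(default_factory=dict)

    def increment(self, node_id: str, amount: int = 1) -> "GCounter":
        # Each node only ever bumps its own slot.
        updated = dict(self.counts)
        updated[node_id] = updated.get(node_id, 0) + amount
        return GCounter(updated)

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of delivery order or duplicates.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a = GCounter().increment("node-a", 2)
b = GCounter().increment("node-b", 3)
merged = a.merge(b)
```

A custom merge for your own data would replace the element-wise max, but it should keep the same three properties, otherwise replicas may not converge.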

answered Nov 18 '25 14:11 by Bartosz Sypytkowski


I've implemented a MerkleClock CRDT in my merkle-crdt repository.

One approach: when you update a database record's column, first fetch the column's current value and merge it with the CRDT holding your current state. When you save, serialise the merged CRDT as JSON and store it back in the database.
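
A rough sketch of that fetch, merge and save cycle, assuming SQLite as a stand-in database and a per-node-max merge as a stand-in CRDT (this is not the merkle-crdt repository's code; the table name `crdt_store` and the functions `merge_counters` and `save_merged` are hypothetical):

```python
import json
import sqlite3


def merge_counters(a: dict, b: dict) -> dict:
    """Stand-in merge function: per-node max, as in a grow-only counter."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def save_merged(conn, key, local_state, merge=merge_counters):
    """Fetch the stored CRDT, merge it with the local state, write the result back."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT state FROM crdt_store WHERE id = ?", (key,)
        ).fetchone()
        stored = json.loads(row[0]) if row else {}
        merged = merge(stored, local_state)
        conn.execute(
            "INSERT OR REPLACE INTO crdt_store (id, state) VALUES (?, ?)",
            (key, json.dumps(merged, sort_keys=True)),
        )
    return merged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crdt_store (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
save_merged(conn, "doc-1", {"node-a": 2})
result = save_merged(conn, "doc-1", {"node-a": 1, "node-b": 4})
```

With several concurrent writers, the read and the write would need to happen atomically (for example via a row lock or a compare-and-set on a version column); otherwise one writer's merge can overwrite another's until the next merge repairs it.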

answered Nov 18 '25 14:11 by Samuel Squire


