Does anyone have a good suggestion as to what database I should use, to achieve replication across a variable number of targets? I have a mesh network of Raspberry Pi servers, each of which can contain a database. I want the contents of each database to be replicated across the network, but I can't guarantee what nodes are available at any point in time.
Most nosql databases (CouchDB, Cassandra for example) appear to only support defined targets in the configuration.
So (assuming nosql is the best database option); is there a nosql database that can replicate to variable number of targets?
For this scenario I would recommend the Hadoop Distributed File System (HDFS).
Features that make HDFS attractive to your scenario:
As for the actual database... HBase, Mongo, or Cassandra are all good options here, pick whatever you are most comfortable with -- HDFS will take care of all of the replication for you.
According to this SO response:
https://stackoverflow.com/a/8787999/2020565
And upon cheking their website, maybe you should check Elliptics: http://www.ioremap.net/projects/elliptics/
The network does not use dedicated servers to maintain the metadata information, it supports redundant objects storage. Small to medium sized write benchmarks can be found on eblob page.
In my experience Elasticsearch has great and easy-to-use cluster management, it supports out of the box nice features such as node autodiscovery, data replication, auto-rebalancing etc., have a look at docs. Usually it is used to replicate data from an other database to make it searchable but I don't see why it couldn't be used in this context as well.
Basically when you create a "table" (called "index" in ES) you get to decide that in how many "partitions" (called "shards") the data should be partitioned, and ad-hoc set that how many replicas of that table you want to have (this doesn't 100% match the correct terminology since an "index" can consist of multiple "types" but I think this is the best analogy).
An example project with three Pis is here.
I have read a bit about Cassandra as well and I imagine it would have similar features, for example partitions and replicas are mentioned here.
I'd recommend taking a look at Hazelcast. They do pretty good in memory replication across a cluster that might change. You'd have to write a custom client to store the data into a local database of your choice if you want disk backed persistence, but Hazelcast can take care of replication across a cluster in memory and has a lot of flexibility.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With