Database replication on Raspberry Pi mesh network

Question

Can anyone suggest a database that supports replication across a variable number of targets? I have a mesh network of Raspberry Pi servers, each of which can host a database. I want the contents of each database to be replicated across the network, but I can't guarantee which nodes will be available at any given time.

Most NoSQL databases (CouchDB and Cassandra, for example) appear to support only replication targets that are defined in the configuration.

So, assuming NoSQL is the best option, is there a NoSQL database that can replicate to a variable number of targets?

Andy Guibert · Accepted Answer

For this scenario I would recommend the Hadoop Distributed File System (HDFS).

Features that make HDFS attractive for your scenario:

It is a distributed file system with a configurable replication factor (the default is 3, which makes data loss very unlikely).
It can scale to thousands of machines.
It does not depend on high availability of individual nodes: it handles node failure automatically and re-replicates the data that was stored on downed nodes.
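To put a rough number on "very unlikely", here is a minimal sketch of the arithmetic behind a replication factor. It assumes each replica sits on a different node and that nodes fail independently with the same probability. Both are simplifications that a mesh with nodes coming and going may not satisfy, so treat the result as an optimistic estimate. The function name and the sample probability are illustrative, not part of HDFS.

```python
def loss_probability(node_failure_prob: float, replication_factor: int) -> float:
    """Chance that every replica of one block is unavailable at the same time.

    Assumes replicas live on distinct nodes that fail independently,
    each with probability `node_failure_prob` (a simplifying assumption).
    """
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # All replicas must fail together, so the probabilities multiply.
    return node_failure_prob ** replication_factor


if __name__ == "__main__":
    # Hypothetical figure: each Pi is unreachable 5% of the time.
    for factor in (1, 2, 3):
        print(f"replication factor {factor}: {loss_probability(0.05, factor):.6f}")
```

Each extra replica multiplies the loss probability by the per-node failure probability, which is why going from one copy to three makes such a large difference.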

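As for the actual database: HBase is built to run on top of HDFS, so HDFS takes care of replicating its data for you. MongoDB and Cassandra are also good options if you are more comfortable with them, but note that they store data on the local file system and handle replication themselves rather than through HDFS.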

noderman · Answer

According to this SO response:

https://stackoverflow.com/a/8787999/2020565

After checking their website, I think Elliptics might be worth a look: http://www.ioremap.net/projects/elliptics/

The network does not use dedicated servers to maintain metadata, and it supports redundant object storage. Small to medium-sized write benchmarks can be found on the eblob page.

NikoNyrh · Answer

In my experience, Elasticsearch has great, easy-to-use cluster management. Out of the box it supports nice features such as node autodiscovery, data replication, and automatic rebalancing; have a look at the docs. It is usually used to replicate data from another database to make it searchable, but I don't see why it couldn't be used in this context as well.

Basically, when you create a "table" (called an "index" in ES), you decide how many "partitions" (called "shards") the data should be split into, and you can set ad hoc how many replicas of that table you want. (This doesn't match the correct terminology 100%, since an "index" can consist of multiple "types", but I think it is the best analogy.)
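As a rough illustration of the shard and replica settings described above, the sketch below builds the request that creates an index with a chosen number of shards and replicas. It uses only the Python standard library and does not send anything. The node address `http://localhost:9200` and the index name `sensor-readings` are assumptions made for the example, and the helper names are hypothetical.

```python
import json
import urllib.request


def index_settings(shards: int, replicas: int) -> dict:
    """Build the settings body for creating an index."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # the "partitions" of the index
            "number_of_replicas": replicas,  # extra copies of each shard
        }
    }


def create_index_request(base_url: str, index_name: str,
                         shards: int, replicas: int) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request that creates the index."""
    body = json.dumps(index_settings(shards, replicas)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed address and index name; adjust for your own cluster.
    request = create_index_request("http://localhost:9200", "sensor-readings", 3, 2)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # urllib.request.urlopen(request) would send it to a running node.
```

Replicas are only assigned to nodes that are actually present, so on a mesh where nodes come and go, a replica count higher than the number of live nodes minus one will leave some copies unassigned until more nodes join.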

An example project with three Pis is here.

I have read a bit about Cassandra as well, and I imagine it has similar features; for example, partitions and replicas are mentioned here.

Josh Wilson · Answer

I'd recommend taking a look at Hazelcast. It does pretty good in-memory replication across a cluster whose membership might change. You'd have to write a custom client to store the data in a local database of your choice if you want disk-backed persistence, but Hazelcast can take care of in-memory replication across the cluster and offers a lot of flexibility.
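The disk-backed part of that custom client could look something like the sketch below. It is only a sketch under stated assumptions: it shows the local persistence side using SQLite from the Python standard library and does not call Hazelcast at all. The `LocalStore` class and its methods are hypothetical names. In a real client, whatever callback fires when a replicated entry changes would call `put` or `delete`.

```python
import sqlite3
from typing import Optional


class LocalStore:
    """Hypothetical disk-backed copy of entries received from the cluster."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across restarts.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: str, value: str) -> None:
        # INSERT OR REPLACE makes repeated updates for the same key idempotent.
        self._conn.execute(
            "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, key: str) -> None:
        self._conn.execute("DELETE FROM entries WHERE key = ?", (key,))
        self._conn.commit()


if __name__ == "__main__":
    store = LocalStore()
    store.put("node-7/temperature", "21.5")
    print(store.get("node-7/temperature"))
```

Keeping the store keyed by the same keys as the replicated map means a node that rejoins the mesh can reload its last known state from disk before catching up with the cluster.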

Tags: database, nosql, database-replication, raspberry-pi2

Geoff Munn
