Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database replication on Raspberry Pi mesh network

Does anyone have a good suggestion as to what database I should use, to achieve replication across a variable number of targets? I have a mesh network of Raspberry Pi servers, each of which can contain a database. I want the contents of each database to be replicated across the network, but I can't guarantee what nodes are available at any point in time.

Most nosql databases (CouchDB, Cassandra for example) appear to only support defined targets in the configuration.

So (assuming nosql is the best database option); is there a nosql database that can replicate to variable number of targets?

like image 400
Geoff Munn Avatar asked Sep 11 '15 10:09

Geoff Munn


4 Answers

For this scenario I would recommend the Hadoop Distributed File System (HDFS).

Features that make HDFS attractive to your scenario:

  • It is a distributed file system with variable replication factor (the default is 3 which is nearly impossible to lose data with).
  • Can scale up to thousands of different machines
  • Does not depend on high availability of individual nodes -- automatically handles node failure and replicates any data from downed nodes

As for the actual database... HBase, Mongo, or Cassandra are all good options here, pick whatever you are most comfortable with -- HDFS will take care of all of the replication for you.

like image 142
Andy Guibert Avatar answered Oct 22 '22 09:10

Andy Guibert


According to this SO response:

https://stackoverflow.com/a/8787999/2020565

And upon cheking their website, maybe you should check Elliptics: http://www.ioremap.net/projects/elliptics/

The network does not use dedicated servers to maintain the metadata information, it supports redundant objects storage. Small to medium sized write benchmarks can be found on eblob page.

like image 22
noderman Avatar answered Oct 22 '22 09:10

noderman


In my experience Elasticsearch has great and easy-to-use cluster management, it supports out of the box nice features such as node autodiscovery, data replication, auto-rebalancing etc., have a look at docs. Usually it is used to replicate data from an other database to make it searchable but I don't see why it couldn't be used in this context as well.

Basically when you create a "table" (called "index" in ES) you get to decide that in how many "partitions" (called "shards") the data should be partitioned, and ad-hoc set that how many replicas of that table you want to have (this doesn't 100% match the correct terminology since an "index" can consist of multiple "types" but I think this is the best analogy).

An example project with three Pis is here.

I have read a bit about Cassandra as well and I imagine it would have similar features, for example partitions and replicas are mentioned here.

like image 42
NikoNyrh Avatar answered Oct 22 '22 10:10

NikoNyrh


I'd recommend taking a look at Hazelcast. They do pretty good in memory replication across a cluster that might change. You'd have to write a custom client to store the data into a local database of your choice if you want disk backed persistence, but Hazelcast can take care of replication across a cluster in memory and has a lot of flexibility.

like image 26
Josh Wilson Avatar answered Oct 22 '22 10:10

Josh Wilson