
Decentralized backup using torrent protocol [closed]

I'm playing with the idea of creating a client based on the torrent protocol used today by download clients such as uTorrent or Vuze. Specifically:

Client software that would:

  1. Select files you would like to backup
  2. Create torrent like descriptor files for each file
  3. Offer optional encryption of your files based on a key phrase
  4. Let you select the level of redundancy you would like to trade with other clients. Redundancy would be based on a give-and-take principle: if you want to back up 100 MB five times, you would have to offer an extra 500 MB of storage space on your own system. The backup would not be distributed among just 5 clients; it would use as many clients as possible that offer storage in exchange, selected by the physical distance specified in settings.
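The give-and-take rule in point 4 is straightforward arithmetic; a minimal sketch (class and method names are hypothetical, not part of any existing API):

```java
// Hypothetical sketch of the give-and-take storage rule: to back up
// backupBytes with a given redundancy factor, a peer must offer
// redundancy * backupBytes of its own storage to the network.
public class StorageQuota {
    public static long requiredOffer(long backupBytes, int redundancy) {
        return backupBytes * (long) redundancy;
    }

    public static void main(String[] args) {
        // The example above: backing up 100 MB five times
        long mb = 1024L * 1024L;
        System.out.println(requiredOffer(100 * mb, 5) / mb + " MB"); // 500 MB
    }
}
```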

Optionally:

  1. I'm thinking of including edge file sharing: if you had non-encrypted files in your backup storage, you could prefer clients that have port 80 open for public HTTP sharing. This gets tricky, though, since I'm having a hard time coming up with a simple scheme for a visitor to pick the closest backup client.

  2. Include a file manager that would allow file transfers between two systems over the torrent protocol (something like FTP with a GUI).
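Point 2 of the first list (torrent-like descriptor files) is essentially what BitTorrent's info dictionary does: split the file into fixed-size pieces and record a SHA-1 hash per piece. A minimal sketch, with an illustrative piece size and class name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Illustrative: computes per-piece SHA-1 hashes, the same data a
// BitTorrent-style descriptor stores in its "pieces" field.
public class PieceHasher {
    public static List<byte[]> hashPieces(Path file, int pieceSize)
            throws IOException, NoSuchAlgorithmException {
        List<byte[]> hashes = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[pieceSize];
            int read;
            while ((read = readFully(in, buf)) > 0) {
                MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
                sha1.update(buf, 0, read);
                hashes.add(sha1.digest()); // 20 bytes per piece
            }
        }
        return hashes;
    }

    // Fills buf as far as possible; returns the number of bytes read (0 at EOF).
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }
}
```

Real torrent files also bencode the piece size, file length, and tracker URL alongside these hashes, but the piece hashing above is the core of the descriptor.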

I'm thinking about building this as a service/API project (along the lines of http://www.elasticsearch.org ) that could be integrated into a container such as Tomcat with Spring, or embedded in a plain Swing application.

This would be an open source P2P project. Since I'm not completely confident in my understanding of the torrent protocol, the question is:

Is the above feasible with the current state of torrent technology? (And where should I look to recruit Java developers for this project?)

If this is the wrong spot to post this please move it to more appropriate site.

asked Jan 04 '12 by MatBanik


2 Answers

You are considering the wrong technology for the job. What you want is an erasure code using Vandermonde matrices. This lets you get the same level of protection against lost data without needing to store nearly as many copies. There's an open source implementation by Luigi Rizzo that works perfectly.

What this code allows you to do is take an 8 MB chunk of data and cut it into any number of 1 MB chunks such that any eight of them can reconstruct the original data. This gives you the same level of protection as tripling the size of the data stored, without even doubling it.
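The simplest possible erasure code illustrates the idea. With two data chunks A and B, store A, B, and the parity chunk P = A xor B: any two of the three recover the third. Rizzo's code generalizes this with Vandermonde matrices over GF(2^8) to "any k of n" instead of "any 2 of 3". A sketch for illustration only:

```java
import java.util.Arrays;

// Toy erasure code: from data chunks A and B, store A, B, and P = A xor B.
// Losing any single chunk is survivable, since any two recover the third.
public class XorParity {
    public static byte[] xor(byte[] x, byte[] y) {
        byte[] out = new byte[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (byte) (x[i] ^ y[i]);
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {9, 8, 7};
        byte[] p = xor(a, b);          // parity chunk
        byte[] recoveredA = xor(p, b); // lose A: rebuild it from P and B
        System.out.println(Arrays.equals(a, recoveredA)); // true
    }
}
```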

You can tune the parameters any way you want. With Luigi Rizzo's implementation, there's a limit of 256 chunks. But you can control the chunk size and the number of chunks required to reconstruct the data.

You do not need to generate or store all the possible chunks. If you cut an 80 MB chunk of data into 8 MB chunks such that any ten can recover the original data, you can construct up to 256 such chunks. You will likely only want 20 or so.
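The arithmetic behind those numbers can be checked directly: with k = 10 chunks required out of n = 20 stored, you survive the loss of any n - k = 10 chunks at only 2x storage overhead.

```java
// Checks the overhead/tolerance arithmetic for the 80 MB example above:
// any k of the n stored chunks reconstruct the original.
public class ErasureParams {
    public static double overhead(int n, int k) { return (double) n / k; }
    public static int lossTolerance(int n, int k) { return n - k; }

    public static void main(String[] args) {
        System.out.println(overhead(20, 10));      // 2.0x storage
        System.out.println(lossTolerance(20, 10)); // survives losing 10 chunks
    }
}
```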

answered Oct 14 '22 by David Schwartz


You might have great difficulty enforcing the reciprocal storage feature, which I believe is critical to large-scale adoption (finally, a good use for those three-terabyte drives that you get in cereal boxes!). You might wish to study the mechanisms of Bitcoin to see if there are any tools you can adapt to your own needs for distributed, non-repudiable proof of storage.

answered Oct 14 '22 by sarnold