
Building a distributed bittorrent-SQL database

I have an idea for a distributed SQL database that uses the bittorrent protocol to pull and write its data.

For the sake of argument, let's say this is a messaging application, where thousands of users run a program that contains a messaging window and an input box for writing messages.

Each message written does an INSERT into the user's own sqlite DB.

How it could be done

  • Download a .torrent file that essentially contains the schema/DDL for creating the DB, and create the DB on the local machine.
  • Any time a 'write' action is done (e.g. a user wants to send a message), that INSERT line (which is kind of like a delta) does two things:
    • Writes to the user's own internal DB
    • Creates a .torrent file out of that line, named something like messaging-[my-ip]-[UTC_timestamp].torrent, and posts it to a tracker
  • Everyone running the app continually scans the tracker for files matching this naming pattern (possibly only those after a certain date), downloads the .torrent, hosts it, and runs the INSERT commands on their local DB.
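
To make the write path concrete, here is a minimal sketch of the local half of that step in Python: insert into the local sqlite DB, write the same INSERT out as a delta file, and replay a delta received from someone else. The `messages` table, the JSON delta format, and the function names are all assumptions made for illustration; creating the .torrent and posting it to a tracker are left out.

```python
import json
import sqlite3
import time
from pathlib import Path

# Hypothetical schema and statement, invented for this sketch.
SCHEMA = "CREATE TABLE IF NOT EXISTS messages (sender TEXT, body TEXT, sent_at INTEGER)"
INSERT_SQL = "INSERT INTO messages (sender, body, sent_at) VALUES (?, ?, ?)"


def delta_name(ip, timestamp):
    """Build a file name following the messaging-[my-ip]-[UTC_timestamp] pattern."""
    return f"messaging-{ip}-{timestamp}.sql.json"


def write_message(conn, delta_dir, ip, body, timestamp=None):
    """Insert a message locally and write the same INSERT out as a delta file."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    params = [ip, body, timestamp]
    with conn:  # commits on success, rolls back on error
        conn.execute(INSERT_SQL, params)
    path = Path(delta_dir) / delta_name(ip, timestamp)
    path.write_text(json.dumps({"sql": INSERT_SQL, "params": params}))
    return path


def apply_delta(conn, path):
    """Replay a delta file received from a peer against the local DB.

    Running SQL that came from an untrusted peer is unsafe; a real
    application would have to validate the statement before executing it.
    """
    delta = json.loads(Path(path).read_text())
    with conn:
        conn.execute(delta["sql"], delta["params"])
```

Calling `write_message(conn, "deltas", "10.0.0.1", "hello")` would leave one row in the local DB and one file in `deltas/` that another node could feed to `apply_delta`.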

What you'd then have is a ton of delta-files, all P2P hosted for redundancy, updating local .sqlite DBs on a lot of machines.

Some issues I'm having

  • How do I scrape for torrents with a certain file name? I've read through the HTTP bittorrent tracker spec, but it seems you can only query for torrents by their specific info_hash. Is there no way to query for a group of files, or by file name?

  • How do I download a .torrent file from a tracker? Will I need to host the files on a centralized server, or can I use the tracker to download the files in some way? And if I have to host the .torrent files myself...

    • Wouldn't this defeat the purpose of a decentralized DB, since if my website goes down, the application would stop getting updates?

Thanks for the help in advance.

thouliha asked Jan 30 '15 17:01



1 Answer

Bittorrent is designed for distribution of immutable and somewhat large data sets and doesn't really know any operations that span multiple torrents. Databases are mostly about mutating relatively small chunks of data and performing operations over diverse subsets of those.
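
To illustrate why nothing spans multiple torrents: a torrent is identified by its info hash, the SHA-1 of the bencoded `info` dictionary, and that 20-byte hash is what tracker announce and scrape requests are keyed on. The following is a rough sketch with a hand-rolled bencoder; the helper names and the fixed piece length are assumptions for illustration, not part of any library.

```python
import hashlib


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_info(name, content, piece_length=16384):
    """Build the info dictionary of a single-file torrent for `content`."""
    pieces = b"".join(
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    )
    return {"name": name, "length": len(content),
            "piece length": piece_length, "pieces": pieces}


def info_hash(info):
    """Hex SHA-1 of the bencoded info dictionary; this identifies the torrent."""
    return hashlib.sha1(bencode(info)).hexdigest()
```

Two deltas with the same file name but different contents get unrelated info hashes, and the file name is not visible in the hash at all, so a tracker has nothing to match a name pattern against.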

You will have little joy trying to shoehorn database semantics into bittorrent.

At best you can use it for distributing snapshots of a database.
With a little tinkering bittorrent can be fairly good at recycling data from previous torrents if the new content only adds/removes files (again, of significant size) without modifying old ones.
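
As a sketch of the snapshot approach, assuming the sqlite setup from the question: Python's `sqlite3` module exposes SQLite's online backup API, which copies a live database into a standalone file that could then be packaged as a torrent. The function name and paths are hypothetical, and the torrent creation itself is not shown.

```python
import sqlite3


def snapshot(live, snapshot_path):
    """Copy the live database into a standalone file via the backup API."""
    target = sqlite3.connect(snapshot_path)
    try:
        live.backup(target)  # copies every page of the "main" database
    finally:
        target.close()
    return snapshot_path
```

Each snapshot is an immutable file, which is the kind of payload bittorrent handles well; peers would replace their local copy rather than merge changes into it.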

Anything beyond that would require significant modifications to the protocol; it wouldn't really be vanilla bittorrent anymore.

the8472 answered Sep 29 '22 11:09