I'm considering using bittorrent for a large data dissemination problem where the data source is petascale and users will want up to several terabytes. Some details
I expect the number of active torrents to be small compared to the total available but quality of service is important so there must be several seeders for each torrent or some mechanism for launching new seeders.
My question is can bittorrent clients handle seeding huge numbers of torrents, most of which are idle? Would I need to stripe torrents across the seeders in a cluster or could each node be seeding all torrents it has access to? Which client would do the best job? Are there any tools for managing clusters of seeders?
I am assuming that trackers can be made to scale to this level.
The easiest way to increase your seeding ratio is by downloading smaller files and storing them on your seedbox. When you start seeding smaller-sized files, you increase the chances of seeders downloading your files.
The more seeds, the better the download rate. However, it is good to have more peers in addition to the seeders, as the downloaders can utilize both. Once a file is fully seeded, the BitTorrent application automatically stops the seeding process and the file can then be removed from the seeding list.
Seeding refers to leaving a peer's BitTorrent client open and available for additional individuals to download from. Normally, a peer should seed more data than download. However, whether to seed or not, or how much to seed, depends on the availability of downloaders and the choice of the peer at the seeding end.
From this BiTorrent FAQ: When there are zero seeds for a given torrent (and not enough peers to have a distributed copy), then eventually all the peers will get stuck with an incomplete file, if no one in the swarm has the missing pieces.
There are 2 main problems:
As for the tracker traffic, let's assume you have 1 million torrents, the typical re-announce interval is 30 minutes, but some tracker has it set to 1 hour. Let's be conservative and assume your tracker uses 1 hour announce intervals. You will have to make 1 million GET requests per hour, let's say each request is 400 bytes up and 100 bytes down (assuming most responses will not contain any peers), that's about 111 kB/s up and 28 kB/s down constantly. That's not so bad, but keep in mind that TCP requires an extra round-trip for establishing connections, so that's another 40 bytes down and 40 bytes up.
This can be mitigated by only using UDP trackers. Then you would only need a single connect-message, and you can reuse the connection ID for each announce. Each announce message would then be 100 bytes, and the returned message would be a bit more compact as well, let's assume 60 bytes. That would get you 28 kB/s up and 16kB/s down, just to keep the torrents announced. For this you would need a client with decent udp tracker support (one that caches the connection ID for instance).
Not too bad, assuming that's insignificant compared to the actual data your seeds would send.
However, you don't necessarily need to stripe your torrents across separate data centers, you could also use an HTTP server to seed the torrents. All major bittorrent clients support http seeding, and you wouldn't have to worry about announcing to the tracker (the URL is burned into the .torrent itself).
As for a client that scales well with torrents, I don't know for sure, I haven't done any measurements. It should be fairly straightforward to just generate a million random torrents and try to load it up.
I have done some optimization work in libtorrent rasterbar to make it scale well with many torrents, I haven't tried millions though.
I've written a blog post on this topic, here.
You may be looking for Hekate It's in, at best, pre-alpha right now, but it's quite nearly what you're describing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With