I am curious about the technology behind a search engine like torrentz.com. From what I could observe, it doesn't host any torrent files, but rather connects you to other servers that do.
What I'm interested in particularly is the strategy behind gathering and indexing all that content:
How do they collect and then aggregate the data?
Is it a submission-based service, where each of these servers submits its content for indexing?
Is it a crawling algorithm? If so, how do you even start crawling a site like piratebay.org?
Do they have access to these other servers' databases?
My knowledge of the BitTorrent protocol is limited, and the documentation I found online pointed me more toward the processes involved in building a tracker service, which isn't exactly what I'm interested in. Any insight and recommended reading material is appreciated.
To begin, index their RSS feeds and gather data from them. The next step would be indexing the portals' pages (Mininova, tpb, etc.), but watch out: you can be banned (by IP) for doing so, since crawling generates a huge number of requests to their servers (I don't think they would be too happy about that).
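As a rough illustration of the RSS step, here is a minimal sketch using only the Python standard library. It assumes the portal publishes a plain RSS 2.0 feed with `title`, `link` and `pubDate` elements per item; the sample feed, the `portal.example` URLs and the function names `parse_feed` and `add_to_index` are made up for this example and are not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real indexer would download this text
# from each portal's feed URL on a schedule.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example portal (hypothetical)</title>
    <item>
      <title>Ubuntu 10.04 Desktop ISO</title>
      <link>http://portal.example/torrent/1</link>
      <pubDate>Mon, 03 May 2010 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Debian 5.0 netinst</title>
      <link>http://portal.example/torrent/2</link>
      <pubDate>Mon, 03 May 2010 11:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return entries


def add_to_index(index, entries):
    """Store entries keyed by link; return how many were new."""
    added = 0
    for entry in entries:
        if entry["link"] and entry["link"] not in index:
            index[entry["link"]] = entry
            added += 1
    return added


if __name__ == "__main__":
    index = {}
    new_count = add_to_index(index, parse_feed(SAMPLE_FEED))
    print(new_count, "new entries indexed")
```

Keying the index by link means that polling the same feed repeatedly only adds the items that appeared since the last poll.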
That said, I doubt that they have access to other servers' databases; more likely it's crawling plus RSS.
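If you do crawl portal pages, spacing out requests per host is one way to reduce the risk of an IP ban. The sketch below is only an illustration of that idea: the class name `HostThrottle` and the 5-second default are assumptions, not values any portal publishes, and the actual page download is left out.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = {}        # host -> time of the most recent request

    def wait(self, url):
        """Sleep if needed before requesting url; return the delay used."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last[host] = now + delay
        return delay
```

A crawler would call `throttle.wait(url)` immediately before each download, so different portals can be fetched back to back while requests to the same portal stay spread out.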
Another thing you can do: when somebody queries for an item that you don't have in your database, run the query against the main BitTorrent portals, cache the result in your db, and then display the results. If another user then makes the same query (a pretty common scenario), you can show the cached data plus new data from RSS.
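The cache-on-miss idea could look roughly like this. It is a sketch under assumptions: `upstream_search` stands in for whatever function queries the portals, the in-memory dict stands in for your database, and the one-hour expiry is an arbitrary choice.

```python
import time


class CachedSearch:
    """Answer queries from a local cache, asking upstream only on a miss."""

    def __init__(self, upstream_search, ttl_seconds=3600, clock=time.time):
        self._upstream = upstream_search   # hypothetical portal query function
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                   # normalized query -> (stored_at, results)
        self._feed_items = []              # items gathered from RSS feeds

    def add_feed_items(self, items):
        """Register newly seen feed items (dicts with 'title' and 'link')."""
        self._feed_items.extend(items)

    def search(self, query):
        # Normalize so "Ubuntu" and "  ubuntu " share one cache entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        cached = self._cache.get(key)
        if cached is None or now - cached[0] > self._ttl:
            results = list(self._upstream(key))
            self._cache[key] = (now, results)
        else:
            results = cached[1]
        # Append feed items that match the query and are not already listed.
        seen = {r["link"] for r in results}
        fresh = [item for item in self._feed_items
                 if key in item["title"].lower() and item["link"] not in seen]
        return results + fresh
```

With this shape, only the first user to ask for a given item triggers a request to the portals; repeat queries are served locally until the entry expires.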