How does a "DHT search engine" work?

I'm interested in Btdigg.org, which is called a "DHT search engine". According to this article, it doesn't store any content and doesn't even have a database. Then how does it work? Doesn't it need to gather metadata and store it in a database like other search engines? After a user submits a query, does it scan the DHT network and return the results in "real time"? Is this possible?

409

asked Jan 30 '13 11:01

user2025043

2 Answers

I don't have specific insight into BTDigg, but I believe the claim that there is no database (or something that acts like a database) is false. The author of that article might have been referring to something more specific that you would encounter on a traditional torrent site, where the actual .torrent files are stored, for instance.

This is how a BTDigg-like site works:

1. Run a bunch of DHT nodes specifically to eavesdrop on DHT traffic, so that you are introduced to the info-hashes people talk about.
2. Join those swarms and download the metadata (the .torrent file) using the ut_metadata extension.
3. Index the information you find in there and map it to the info-hash.
4. Provide a front-end for that index.
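As a rough illustration of the first step, here is a minimal sketch of how an eavesdropping node could pull an info-hash out of an incoming DHT query. It assumes the standard KRPC message layout from BEP 5 (a bencoded dictionary with "y", "q" and "a" keys). The names `bdecode` and `observed_info_hash` are made up for this example, and a real crawler would also need the UDP socket handling, routing table and replies that are left out here.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", i)
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def observed_info_hash(packet: bytes):
    """Return the 20-byte info-hash carried by a get_peers or announce_peer
    query, or None for any other (or malformed) packet."""
    try:
        msg, _ = bdecode(packet)
    except (ValueError, IndexError, TypeError):
        return None
    if not isinstance(msg, dict) or msg.get(b"y") != b"q":
        return None
    if msg.get(b"q") not in (b"get_peers", b"announce_peer"):
        return None
    args = msg.get(b"a")
    info_hash = args.get(b"info_hash") if isinstance(args, dict) else None
    if isinstance(info_hash, bytes) and len(info_hash) == 20:
        return info_hash
    return None
```

Every info-hash collected this way becomes a candidate for step 2, the metadata download.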

If you want to luxury it up a bit you can also periodically scrape the info-hashes you know about to gather stats over time and maybe also figure out when swarms die out and should be removed from the index.

So, the claim that you don't store .torrent files nor any content is true.

It is not realistic to search the DHT in real time, because the DHT is not organized around keyword searches: it can only look up an info-hash you already know. You need to build and maintain the index continuously, "in the background".
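To make the "index" part concrete, here is a toy sketch of the kind of keyword-to-info-hash mapping such a site would need. It is an in-memory inverted index. The class name `TorrentIndex` and its methods are invented for this example, and a real service would presumably use a proper search backend instead.

```python
import re
from collections import defaultdict


class TorrentIndex:
    """Toy inverted index: keyword -> set of info-hashes (hex strings)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> info-hashes containing it
        self._names = {}                   # info-hash -> torrent name

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, info_hash, name, file_paths=()):
        """Index a torrent's name and file paths under its info-hash."""
        self._names[info_hash] = name
        for text in (name, *file_paths):
            for token in self._tokens(text):
                self._postings[token].add(info_hash)

    def search(self, query):
        """Return (info_hash, name) pairs matching every word in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted((h, self._names[h]) for h in hits)


index = TorrentIndex()
index.add("aa" * 20, "Ubuntu 22.04 Desktop", ["ubuntu-22.04-desktop-amd64.iso"])
index.add("bb" * 20, "Debian 12 netinst", ["debian-12-amd64-netinst.iso"])
print(index.search("amd64 ubuntu"))
```

The front-end only ever queries this index. The DHT itself is touched by the background crawler, never by a user's search.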

EDIT:

Since this answer, an optimization (BEP 51) has been implemented in some DHT clients that lets you query which info-hashes they are hosting, significantly reducing the cost of indexing.
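For reference, here is a minimal sketch of what a BEP 51 request looks like on the wire, assuming the `sample_infohashes` query described in that BEP (arguments "id" and "target", with the reply carrying a "samples" string of concatenated 20-byte info-hashes). The helper names are made up for this example, and sending the packet over UDP is left out.

```python
def bencode(value) -> bytes:
    """Encode ints, byte strings, lists and dicts (with bytes keys) as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencode requires dictionary keys in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id: bytes, target: bytes, tid: bytes = b"aa") -> bytes:
    """Build a sample_infohashes query packet (node_id and target are 20 bytes each)."""
    return bencode({
        b"t": tid,                     # transaction id, echoed back in the reply
        b"y": b"q",                    # message type: query
        b"q": b"sample_infohashes",
        b"a": {b"id": node_id, b"target": target},
    })


def split_samples(samples: bytes):
    """Split the reply's 'samples' field into individual 20-byte info-hashes."""
    if len(samples) % 20:
        raise ValueError("samples length must be a multiple of 20")
    return [samples[i:i + 20] for i in range(0, len(samples), 20)]
```

Compared with passively waiting for get_peers traffic, this lets the indexer ask supporting nodes directly for a sample of the info-hashes they store.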

172

answered Oct 12 '22 22:10

Arvid

For a deep understanding of DHT and its applications, see Scott Wolchok's paper and presentation "Crawling BitTorrent DHTs for Fun and Profit". He presents the autonomous search engine idea as a side note to his study of DHT security:

PDF of his paper:

https://www.usenix.org/legacy/event/woot10/tech/full_papers/Wolchok.pdf

His presentation at DEFCON 18 (parts 1 & 2):

http://www.youtube.com/watch?v=v4Q_F4XmNEc
http://www.youtube.com/watch?v=mO3DfLtKPGs

answered Oct 12 '22 23:10

martinwguy
