
How to make inverted index search faster?

I am designing the architecture of a full-text search engine. One of the requirements is processing queries over large datasets with low response time. One approach I found is to split the inverted index into partitions. There are two strategies for this: term-based partitioning and document-based partitioning. But I really want to know: is there any other way to make inverted index search faster over large datasets?
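To make the two strategies concrete, here is a minimal sketch in Python. It assumes a toy whitespace tokenizer and uses `zlib.crc32` as a stable hash for assigning terms to shards. With term-based partitioning each shard holds the complete posting lists for a subset of terms. With document-based partitioning each shard is a complete, independent index over a subset of documents. The helper names (`build_index`, `term_partition`, `doc_partition`) are hypothetical and used only for illustration.

```python
from collections import defaultdict
import zlib


def build_index(docs):
    """Map each term to a sorted list of doc ids (its posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # toy tokenizer (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def term_partition(docs, n):
    """Term-based: each shard owns the full posting lists of some terms."""
    shards = [dict() for _ in range(n)]
    for term, postings in build_index(docs).items():
        # crc32 gives a stable hash; Python's built-in hash() of str is salted per process.
        shards[zlib.crc32(term.encode()) % n][term] = postings
    return shards


def doc_partition(docs, n):
    """Document-based: each shard is a complete index over a subset of docs."""
    subsets = [dict() for _ in range(n)]
    for doc_id, text in docs.items():
        subsets[doc_id % n][doc_id] = text
    return [build_index(subset) for subset in subsets]


docs = {
    0: "fast inverted index",
    1: "inverted index partition",
    2: "fast search engine",
    3: "search index",
}
term_shards = term_partition(docs, 2)
doc_shards = doc_partition(docs, 2)
```

In the term-partitioned layout a term such as `index` lives on exactly one shard with its whole posting list `[0, 1, 3]`. In the document-partitioned layout the same term appears on both shards, each holding only the postings for its own documents.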

asked Dec 27 '22 07:12 by Mickey Shine


1 Answer

This video is a talk by Shay Banon, the developer of ElasticSearch, a distributed full-text search engine. In the video he discusses the pros and cons of term-based and document-based partitioning.

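Basically, term-based partitioning generates too much network traffic between processes/nodes, because the posting lists for the terms of a multi-term query can live on different nodes and have to be sent over the network before they can be combined. It is also harder to implement well. Document-based partitioning is far simpler to implement and to get results out of.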
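The network-traffic argument can be illustrated with a simplified model of an AND query. This is a sketch under stated assumptions: no ranking or top-k, no posting-list compression, and "network cost" is counted as the number of posting entries (doc ids) that leave a shard. With document-based shards each shard intersects its posting lists locally and only sends back matching doc ids. With term-based shards whole posting lists must be shipped to a coordinator to be intersected. All function names are hypothetical.

```python
from collections import defaultdict
import zlib


def build_index(docs):
    """Map each term to a sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}


def term_shard_of(term, n):
    """Stable term -> shard assignment (crc32 is an assumption)."""
    return zlib.crc32(term.encode()) % n


def term_partition(docs, n):
    shards = [dict() for _ in range(n)]
    for term, postings in build_index(docs).items():
        shards[term_shard_of(term, n)][term] = postings
    return shards


def doc_partition(docs, n):
    subsets = [dict() for _ in range(n)]
    for doc_id, text in docs.items():
        subsets[doc_id % n][doc_id] = text
    return [build_index(s) for s in subsets]


def and_query_doc_partitioned(doc_shards, terms):
    """Every shard intersects locally; only matching doc ids are sent back."""
    results, sent = [], 0
    for shard in doc_shards:
        lists = [set(shard.get(t, [])) for t in terms]
        local = set.intersection(*lists) if lists else set()
        results.extend(local)
        sent += len(local)
    return sorted(results), sent


def and_query_term_partitioned(term_shards, terms):
    """Each term's full posting list is shipped to a coordinator for intersection."""
    n = len(term_shards)
    lists, sent = [], 0
    for t in terms:
        postings = term_shards[term_shard_of(t, n)].get(t, [])
        lists.append(set(postings))
        sent += len(postings)
    results = set.intersection(*lists) if lists else set()
    return sorted(results), sent


docs = {
    0: "inverted index search",
    1: "inverted index",
    2: "index search",
    3: "inverted index",
    4: "search engine",
    5: "inverted search",
}
doc_shards = doc_partition(docs, 2)
term_shards = term_partition(docs, 2)
query = ["inverted", "index"]
print(and_query_doc_partitioned(doc_shards, query))   # ([0, 1, 3], 3)
print(and_query_term_partitioned(term_shards, query))  # ([0, 1, 3], 8)
```

Both layouts return the same answer, but in this toy model the term-partitioned query moves 8 posting entries while the document-partitioned one moves only the 3 matching ids. The trade-off is that a document-partitioned query must fan out to every shard, whereas a term-partitioned query only touches the shards that own its terms.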

Moreover, in this lecture Jeffrey Dean also explains the differences and says that Google uses document-based partitioning.

These are the two main ways to distribute a search engine. I'm not aware of other ways of doing it. Anyway, you may want to search the Information Retrieval literature for novel work on the subject.

answered Jan 06 '23 15:01 by Felipe Hummel