
Forward index vs. inverted index: why?

I was reading about inverted indexes (used by text search engines like Solr, Elasticsearch, etc.), and as I understand it (taking "Person" as an example):

The attribute to Person relationship is inverted:

John -> PersonId(1), PersonId(2), PersonId(3)
London -> PersonId(1), PersonId(2), PersonId(5)

I can now search the person records for 'John who lives in London'
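As a minimal sketch of that lookup in Python (the `search` helper is a hypothetical name for illustration, not how Solr or Elasticsearch actually implement it), the query amounts to intersecting the two lists above:

```python
# Posting lists copied from the example above: attribute value -> person IDs.
inverted_index = {
    "John": {1, 2, 3},
    "London": {1, 2, 5},
}


def search(index, *terms):
    """Return the sorted IDs that appear in the posting list of every term."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


matches = search(inverted_index, "John", "London")
print(matches)
```

Only the IDs present in both lists survive the intersection; a term with no entry yields an empty result.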

Doesn't this solve all the problems? Why do we have the forward (or regular database) index at all? In other words, in what cases is regular indexing useful? Please explain. Thanks.

user1189332 asked Aug 01 '15 11:08



2 Answers

The point that you're missing is that there is no real technical distinction between a forward index and an inverted index. "Forward" and "inverted" in this case are just descriptive terms to distinguish between:

  • A list of words contained in a document.
  • A list of documents containing a word.

The concept of an inverted index only makes sense if the concept of a regular (forward) index already exists. In the context of a search engine, a forward index would be the term vector: a list of terms contained within a particular document. The inverted index would be a list of documents containing a given term.
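To make the relationship concrete, here is a rough Python sketch. The documents and the `invert` helper are made up for illustration; real search engines also tokenize, normalize and compress their data, none of which is shown here.

```python
from collections import defaultdict

# Hypothetical documents, invented for this example.
documents = {
    1: "john lives in london",
    2: "mary lives in paris",
}

# Forward index: document ID -> the terms it contains (the "term vector").
forward_index = {doc_id: text.split() for doc_id, text in documents.items()}


def invert(forward):
    """Flip document -> terms into term -> set of document IDs."""
    inverted = defaultdict(set)
    for doc_id, terms in forward.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


inverted_index = invert(forward_index)
print(forward_index[1])
print(sorted(inverted_index["lives"]))
```

Both structures hold the same information; they differ only in which side of the document/term relationship is used as the lookup key.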

When you understand that the terms "forward" and "inverted" are really just relative terms used to describe the nature of the index you're talking about - and that really an index is just an index - your question doesn't really make sense any more.

Ant P answered Sep 28 '22 08:09


Here's an explanation of inverted index, from Elasticsearch:

"Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears."

Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

Inverted indexing is for fast full-text search. A forward index is less efficient to query, because the engine has to look through every entry to find a term, but it is very fast to build.

You can say this:

  • Forward index: fast indexing, less efficient queries
  • Inverted index: fast query, slower indexing
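
A small Python sketch of that trade-off, using made-up documents and hypothetical function names (this is an illustration of the idea, not a benchmark and not how any particular engine works):

```python
forward_index = {}   # doc ID -> list of terms
inverted_index = {}  # term -> set of doc IDs


def add_document(doc_id, text):
    terms = text.split()
    # Forward index: a single write per document.
    forward_index[doc_id] = terms
    # Inverted index: one posting-list update per distinct term.
    for term in set(terms):
        inverted_index.setdefault(term, set()).add(doc_id)


def query_forward(term):
    # Has to scan the term list of every document.
    return sorted(d for d, terms in forward_index.items() if term in terms)


def query_inverted(term):
    # A single dictionary lookup.
    return sorted(inverted_index.get(term, set()))


add_document(1, "john lives in london")
add_document(2, "mary lives in paris")
add_document(3, "john visits paris")

print(query_forward("john"))
print(query_inverted("john"))
```

Both queries return the same document IDs; the difference is that the forward lookup grows with the number of documents, while the inverted lookup pays its cost up front at indexing time.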

But it always depends on context. Compare it with MySQL: MyISAM is traditionally seen as fast for reads, while InnoDB is seen as fast for inserts/updates and slower for reads.

Read more here: https://www.found.no/foundation/indexing-for-beginners-part3/

schellingerht answered Sep 28 '22 07:09