Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the difference between an inverted index and a plain old index?

In software engineering we create indexes all the time (e.g., in databases) but I also hear a lot of people talk about inverted indices. Is there something fundamentally different between the two? They sound like the same thing.

like image 304
guidoism Avatar asked Oct 11 '11 14:10

guidoism


People also ask

What does an inverted index do?

The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index.

Why is index called inverted?

This type of index is called an inverted index, namely because it is an inversion of the forward index. With the inverted index, we only have to look for a term once to retrieve a list of all documents containing the term.

What are the two key components of an inverted index?

The two main components of a inverted index are Dictionary and Postings Lists. For each term in a text collection, there is a posting list which contains information about the term's occurrence in the provided collection.

What is index and inverted index in Elasticsearch?

An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data.


1 Answers

One common use is "...to allow fast full-text searching."

The two types denote directionality. One takes you forward through the index, and the other takes you backward (the inverse) through the index. That's it. There's no mystery to uncover here. Otherwise the two types are identical, it's just a question of what information you have, and as a result what information you're trying to find.

To address your inquiry, I don't think there's actually a way to know why the use is what it is today. The only reason it's important to define which is forward and which one is inverted is so that we can all have a conversation about them, and everyone knows which direction we're talking about. Think about the terms "left" and "right": they are relative. Which is which doesn't matter, except that everyone needs to agree which one is "left" and which one is "right" in order for the words to have meaning. If, as a culture, we decided to flip left and right, then you'd have the same issue figuring out what a "right turn" vs a "left turn" is since the agreed upon meaning had changed. However, the naming is arbitrary, so which one is which (in and of itself) doesn't matter - what matters is that we all agree on the meaning.

In your comment where you ask, "please don't just define the terms", you're missing the point, and I think you're just getting hung up on the wording when there is absolutely no difference between them.


For the benefit of future readers, I will now provide several "forward" and "inverted" index examples:

Example 1: Web search

If you're thinking that the inverse of an index is something like the inverse of a function in mathematics, where the inverse is a special thing that has a different form, then you're mistaken: that's not the case here.

In a search engine you have a list of documents (pages on web sites), where you enter some keywords and get results back.

A forward index (or just index) is the list of documents, and which words appear in them. In the web search example, Google crawls the web, building the list of documents, figuring out which words appear in each page.

The inverted index is the list of words, and the documents in which they appear. In the web search example, you provide the list of words (your search query), and Google produces the documents (search result links).

They are both indexes - it's just a question of which direction you're going. Forward is from documents->to->words, inverted is from words->to->documents.

Example 2: DNS

Another example is a DNS lookup (which takes a host name, and returns an IP address) and a reverse lookup (which takes an IP address, and gives you the host name).

Example 3: A book

The index in the back of a book is actually an inverted index, as defined by the examples above - a list of words, and where to find them in the book. In a book, the table of contents is like a forward index: it's a list of documents (chapters) which the book contains, except instead of listing the words in those sections, the table of contents just gives a name/general description of what's contained in those documents (chapters).

Example 4: Your cell phone

The forward index in your cell phone is your list of contacts, and which phone numbers (cell, home, work) are associated with those contacts. The inverted index is what allows you to manually enter a phone number, and when you hit "dial" you see the person's name, rather than the number, because your phone has taken the phone number and found you the contact associated with it.

like image 158
jefflunt Avatar answered Oct 20 '22 07:10

jefflunt