Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

B Tree Index vs Inverted Index?

Here is mine understanding about both

B Tree index :- It is generally used database column. It keeps the column content as key and row_id as value . It keeps the key in sorted fashion to quickly find the key and row location

Inverted Index :- Generally used in full text search. Here also word in document works as key, stored in sorted fashion along with doucument location/id as value.

So what's the difference b/w B tree index and Inverted index . To me they looks same

like image 736
emilly Avatar asked Nov 28 '17 17:11

emilly


People also ask

What is the difference between index and inverted index?

A forward index (or just index) is the list of documents, and which words appear in them. In the web search example, Google crawls the web, building the list of documents, figuring out which words appear in each page. The inverted index is the list of words, and the documents in which they appear.

What is B-tree index?

A B-tree index creates a multi-level tree structure that breaks a database down into fixed-size blocks or pages. Each level of this tree can be used to link those pages via an address location, allowing one page (known as a node, or internal page) to refer to another with leaf pages at the lowest level.

What is meant by inverted index?

An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a document or a set of documents. In simple words, it is a hashmap like data structure that directs you from a word to a document or a web page.

What is the difference between B-tree and bitmap index?

1: Syntax differences: The bitmap index includes the "bitmap" keyword. The btree index does not say "bitmap". 2: Cardinality differences: The bitmap index is generally for columns with lots of duplicate values (low cardinality), while b-tree indexes are best for high cardinality columns.


1 Answers

Short answer:

  • yes, they have the same purpose - finding things fast
  • difference: what are they useful for / particularly good at
  • and btw the naming is just awfully confusing too

Long answer:

The naming

My knowledge comes from practice with SQL world, so for me the data storage used to be equal to "database" and the structure that allows to find things quick - an "index".

The trick is - search engines already call their storage "index", so how do you call that index-of-"index"? "Inverted Index", of course! Why inverted? Because, as I can already see in your question, it just inverts the the primary storage. Storage is like primary key --> values, that helper-structure inverts it to values --> primary key and helps quickly finding documents by values.

Purpose

Your question has a mix of Ideas. "Inverted index" means actually more like "a data structure that helps finding documents that are already in storage" whereas B-Tree is just an implementation of such structure.

An index could be theoretically implemented with any data structure you want. Hashes, Graphs, Trees, Arrays, Bitmaps.. it just depends on your usecase.

The differences

B-Tree is good for data that changes, so it's used e.g. in databases and filesystems. Downside: multiple indices cannot be used together in one query (I guess because this structure is dynamic and thus references back to documents are not sorted) and it's data tends to become scattered, so the IO can become an issue.

"Inverted index" uses Bitmaps/Arrays and everything's sorted (list of values and the list of references to documents). These are good for static data sets. And because of sorted nature, multiple indices can be used together. Downside: updating is not performant (new document means inserting values somewhere in a sorted list), tricks are used like keeping batches of data together as it comes in and merging into bigger batches in a background process.

like image 125
davisca Avatar answered Sep 20 '22 10:09

davisca