Here is mine understanding about both
B Tree index :- It is generally used database column. It keeps the column content as key and row_id as value . It keeps the key in sorted fashion to quickly find the key and row location
Inverted Index :- Generally used in full text search. Here also word in document works as key, stored in sorted fashion along with doucument location/id as value.
So what's the difference b/w B tree index and Inverted index . To me they looks same
A forward index (or just index) is the list of documents, and which words appear in them. In the web search example, Google crawls the web, building the list of documents, figuring out which words appear in each page. The inverted index is the list of words, and the documents in which they appear.
A B-tree index creates a multi-level tree structure that breaks a database down into fixed-size blocks or pages. Each level of this tree can be used to link those pages via an address location, allowing one page (known as a node, or internal page) to refer to another with leaf pages at the lowest level.
An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a document or a set of documents. In simple words, it is a hashmap like data structure that directs you from a word to a document or a web page.
1: Syntax differences: The bitmap index includes the "bitmap" keyword. The btree index does not say "bitmap". 2: Cardinality differences: The bitmap index is generally for columns with lots of duplicate values (low cardinality), while b-tree indexes are best for high cardinality columns.
Short answer:
Long answer:
The naming
My knowledge comes from practice with SQL world, so for me the data storage used to be equal to "database" and the structure that allows to find things quick - an "index".
The trick is - search engines already call their storage "index", so how do you call that index-of-"index"? "Inverted Index", of course! Why inverted? Because, as I can already see in your question, it just inverts the the primary storage. Storage is like primary key --> values
, that helper-structure inverts it to values --> primary key
and helps quickly finding documents by values.
Purpose
Your question has a mix of Ideas. "Inverted index"
means actually more like "a data structure that helps finding documents that are already in storage" whereas B-Tree
is just an implementation of such structure.
An index could be theoretically implemented with any data structure you want. Hashes, Graphs, Trees, Arrays, Bitmaps.. it just depends on your usecase.
The differences
B-Tree
is good for data that changes, so it's used e.g. in databases and filesystems. Downside: multiple indices cannot be used together in one query (I guess because this structure is dynamic and thus references back to documents are not sorted) and it's data tends to become scattered, so the IO can become an issue.
"Inverted index"
uses Bitmaps/Arrays and everything's sorted (list of values and the list of references to documents). These are good for static data sets. And because of sorted nature, multiple indices can be used together. Downside: updating is not performant (new document means inserting values somewhere in a sorted list), tricks are used like keeping batches of data together as it comes in and merging into bigger batches in a background process.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With