Here is my understanding of both. B-tree index: generally used on database columns. It keeps the column content as the key and the row_id as the value, and it keeps the keys sorted so the key and row location can be found quickly. Inverted index: generally used in full-text search. Here too, a word in a document works as the key, stored in sorted fashion along with the document location/id as the value. So what's the difference between a B-tree index and an inverted index? To me they look the same.


B Tree Index vs Inverted Index?

1 Answer

Short answer:

Yes, they have the same purpose: finding things fast.
The difference is what each is useful for / particularly good at.
And by the way, the naming is awfully confusing too.

Long answer:

The naming

My knowledge comes from practice in the SQL world, so for me the data storage used to be the "database" and the structure that allows finding things quickly was the "index".

The trick is that search engines already call their storage an "index", so what do you call that index-of-"index"? An "inverted index", of course! Why inverted? Because, as I can already see in your question, it just inverts the primary storage. Storage is like primary key --> values; the helper structure inverts it to values --> primary key and helps find documents quickly by their values.
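The inversion described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine does it: the `documents` dict and the `build_inverted_index` helper are made-up names, and splitting on whitespace stands in for real tokenization.

```python
from collections import defaultdict

# Hypothetical primary storage: document id -> text.
documents = {
    1: "b tree index",
    2: "inverted index",
    3: "b tree",
}

def build_inverted_index(docs):
    """Invert `doc_id -> words` into `word -> sorted list of doc_ids`."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():  # naive whitespace tokenization (assumption)
            inverted[word].add(doc_id)
    # Keep each list of references sorted so lists can be combined cheaply later.
    return {word: sorted(ids) for word, ids in inverted.items()}

index = build_inverted_index(documents)
print(index["index"])  # document ids that contain the word "index"
```

Looking up a word is now a single dictionary access instead of a scan over every document.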

Purpose

Your question mixes two ideas. "Inverted index" actually means something more like "a data structure that helps find documents that are already in storage", whereas a B-tree is just one possible implementation of such a structure.

An index could theoretically be implemented with any data structure you want: hashes, graphs, trees, arrays, bitmaps... It just depends on your use case.
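To make "it depends on your use case" a bit more concrete, here is a rough sketch of the same column indexed two ways. The `rows` data and the `range_lookup` helper are invented for the example, and the sorted list only stands in for the ordered keys a tree-based index would keep; it is not a B-tree.

```python
import bisect

# Hypothetical table: row_id -> value of the indexed column.
rows = {10: "cherry", 11: "apple", 12: "banana", 13: "apple"}

# Hash-based index: value -> row_ids. Good for equality lookups, keeps no order.
hash_index = {}
for row_id, value in rows.items():
    hash_index.setdefault(value, []).append(row_id)

# Ordered index: (value, row_id) pairs kept sorted by value.
sorted_pairs = sorted((value, row_id) for row_id, value in rows.items())
sorted_keys = [value for value, _ in sorted_pairs]

def range_lookup(low, high):
    """Return row_ids whose value v satisfies low <= v <= high."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [row_id for _, row_id in sorted_pairs[start:end]]

print(hash_index["apple"])              # equality lookup
print(range_lookup("apple", "banana"))  # range lookup, needs ordered keys
```

The hash version answers "equals" queries quickly but has nothing to say about ranges, while the ordered version handles both at the cost of keeping the keys sorted.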

The differences

A B-tree is good for data that changes, so it's used e.g. in databases and filesystems. Downside: multiple indices cannot easily be used together in one query (I guess because the structure is dynamic and thus the references back to documents are not sorted, so an engine that wants to combine them has to do extra work at query time), and its data tends to become scattered, so IO can become an issue.

"Inverted index" implementations typically use bitmaps/arrays, and everything is sorted (the list of values and the list of references to documents). These are good for static data sets. And because of the sorted nature, multiple indices can be used together. Downside: updating is not performant (a new document means inserting values somewhere in a sorted list), so tricks are used, like keeping incoming data together in small batches and merging them into bigger batches in a background process.
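Both points can be sketched roughly in Python: combining two sorted reference lists in one pass, and merging a small batch into a bigger one. All names and data here are hypothetical, and the merge assumes the two batches never share a document id. Real engines do considerably more (compression, deletes, skip lists).

```python
import heapq

def intersect_sorted(a, b):
    """Intersect two sorted lists of document ids in one linear pass."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

def merge_segments(old, new):
    """Merge two batches (term -> sorted doc ids) into one bigger batch.

    Assumes a document id appears in only one of the two batches.
    """
    merged = {}
    for term in sorted(old.keys() | new.keys()):
        merged[term] = list(heapq.merge(old.get(term, []), new.get(term, [])))
    return merged

# Hypothetical reference lists from two separate indices (e.g. two fields).
color_red = [1, 4, 7, 9]
size_large = [2, 4, 9, 12]
both = intersect_sorted(color_red, size_large)

# Hypothetical batches: an older, bigger one and a freshly written one.
segment_a = {"tree": [1, 3], "index": [1, 2]}
segment_b = {"index": [5], "bitmap": [4, 6]}
merged = merge_segments(segment_a, segment_b)
```

Because every list is already sorted, neither operation needs random lookups: both walk their inputs front to back, which is what makes combining indices and background merging cheap.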

125

answered Sep 20 '22 10:09

davisca


Tags:

indexing

binary-tree

inverted-index

emilly
