Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between a secondary index and an inverted index in Cassandra?

When I read about these two, I thought both of them are explaining the same approach, I googled but found nothing. Is the difference in implementation? Cassandra does the secondary index itself but inverted index has to be implemented by myself?

Which is faster in searching, by the way?

like image 302
fereshteh Avatar asked Oct 08 '13 13:10

fereshteh


People also ask

What is the difference between index and inverted index?

In the forward index, indexing is fast as keywords are appended when found. In the Inverted index, the search is quite fast. In the forward index, the search is slow. In the Inverted index, no duplicate keyword is stored in an index.

What is a secondary index in Cassandra?

4. Secondary Indexes. Secondary Indexes in Cassandra solve the need for querying columns that are not part of the primary key. When we insert data, Cassandra uses an append-only file called commitlog for storing the changes, so writes are quick.

What is a secondary index?

A secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support Query operations. You can retrieve data from the index using a Query , in much the same way as you use Query with a table.

What is the difference between a primary index and a secondary index?

Difference Between Primary Index and Secondary IndexA primary index is an index on a set of fields that includes the unique primary key and is guaranteed not to contain duplicates. In contrast, a secondary index is an index that is not a primary index and may have duplicates.


1 Answers

The main difference is that secondary indexes in Cassandra are not distributed in the same way a manual inverted index would be. With the inbuilt secondary indexes, each node indexes the data it stores locally (using the LocalPartitioner). With manual indexing, the indexes are distributed independently of the nodes that store the values.

This means that, for the inbuilt indexes, each query must go to each node, whereas if you did inverted indexing manually you would just go to one node (plus replicas) to query the value you were looking up. One advantage of having the index stored locally is that indexes can be updated atomically with the data. (Although, since Cassandra 1.2, the atomic batches could be used for this instead although they are a bit slower.)

This is why Cassandra indexes are not recommended for really high cardinality data. If you are doing a lookup on each node but there are only one or two results, it is inefficient and a manual inverted index will be better. If your lookup returns many results, then you will need to lookup on each node anyway so the inbuilt indexes work well.

A further advantage of using Cassandra's inbuilt indexing is that the indexes are updated lazily, so you don't need to do a read on every update. (See CASSANDRA-2897.) This can be a significant speed improvement for indexed tables with high write throughput.

like image 122
Richard Avatar answered Sep 19 '22 23:09

Richard