
What does `create index` do in Cassandra tables?

Consider this example:

create table bite (
      id varchar PRIMARY KEY,
      feedid varchar,
      score bigint,
      data varchar
  );

create index bite_feedid on bite (feedid);
create index bite_score on bite (score);

I am not sure what the last two `create index` lines do. Why are they important? Do they create a new table? If so, how can I use it for lookups?

Thanks

eagertoLearn asked Jul 25 '14 22:07



3 Answers

A secondary index creates a new table that uses the indexed column as its primary key. The advantage of this approach is that your write/delete operations on the table are automatically translated into multiple operations, so you don't have to take care of that yourself. Now that Cassandra supports logged batches this may not seem like a big advantage, but in Cassandra 0.7 ... 1.1 it was a big deal.

Secondary indexes should not be used when a query on the index will always return exactly one result (e.g. a secondary index on a uuid column).

A nice feature of secondary indexes is that you can both query on a single column without knowing anything about the primary key, and combine part of the primary key with a secondary index (using the AND operator).

You can't write a WHERE clause that combines multiple secondary indexes with AND (unless the query adds ALLOW FILTERING).

HTH, Carlo

Carlo Bertuccini answered Nov 15 '22 06:11



`create index` creates a secondary index on the table. In Cassandra, data is stored in partitions spread across nodes. Each partition corresponds to one partition key, which is the first component of the primary key. The remaining components of the primary key are the clustering keys. For example, if you had the following:

CREATE TABLE foo.people ( id int, name text, age int, job text, PRIMARY KEY (id, name, job) )

id would be the partition key, and name and job would be the clustering keys.

Data in a partition is stored in the order of the clustering keys. When querying with filters, you specify a partition key, and then you can narrow the result down using the clustering keys. With multiple clustering keys, you must restrict every preceding clustering key in order to restrict a particular one. For example, in the scenario above, you can do

where id = 2 and name = 'john' and job = 'dev'

or

where id = 2 and name = 'john'

but not where id = 2 and job = 'dev' as name appears before job in the clustering key.
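The prefix rule can be sketched as a small Python check. This is only a toy model of the rule as described in this answer, not Cassandra code: the function name `is_valid_equality_filter` is made up for illustration, and it only considers equality filters on the `people` table above.

```python
# Toy check of the clustering-key rule described above (not Cassandra code).
PARTITION_KEY = "id"
CLUSTERING_KEYS = ["name", "job"]  # order taken from PRIMARY KEY (id, name, job)


def is_valid_equality_filter(restricted_columns):
    """Return True if equality filters on these columns follow the key order."""
    restricted = set(restricted_columns)
    if PARTITION_KEY not in restricted:
        return False  # the partition key must be specified
    remaining = restricted - {PARTITION_KEY}
    # The restricted clustering keys must form a prefix of CLUSTERING_KEYS.
    for key in CLUSTERING_KEYS:
        if key in remaining:
            remaining.discard(key)
        else:
            break
    # Anything left over skipped an earlier key or is not a key column at all.
    return not remaining


print(is_valid_equality_filter(["id", "name", "job"]))  # True
print(is_valid_equality_filter(["id", "name"]))         # True
print(is_valid_equality_filter(["id", "job"]))          # False, name was skipped
```

A filter on `age` is rejected by the same check, because `age` is not a key column.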

You can't filter on age, as it's not part of the key. This is where the secondary index comes in. If you then do: create index blah on people(age)

you will be allowed to do this: select * from people where age = 45;

This can potentially be expensive, as the query has to go across your cluster. The following, though, can be efficient: select * from people where id=2 and age = 45;
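To see why the second query is cheaper, here is a rough Python model in which each node keeps an index only for the rows it stores. Everything in it is a simplification for illustration: the `Node` and `Cluster` classes are hypothetical, and the modulo placement rule is a stand-in (real Cassandra hashes the partition key to a token).

```python
from collections import defaultdict


class Node:
    """One node holding some rows plus a local index on `age`."""

    def __init__(self):
        self.age_index = defaultdict(list)  # age -> rows stored on this node

    def insert(self, row):
        self.age_index[row["age"]].append(row)

    def find_by_age(self, age):
        return list(self.age_index.get(age, []))


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def node_for(self, partition_key):
        # Hypothetical placement rule, used here only to keep the model small.
        return self.nodes[partition_key % len(self.nodes)]

    def insert(self, row):
        self.node_for(row["id"]).insert(row)

    def select_by_age(self, age, partition_key=None):
        """Return (matching rows, number of nodes contacted)."""
        if partition_key is None:
            targets = self.nodes  # no partition key: ask every node
        else:
            targets = [self.node_for(partition_key)]  # one node is enough
        rows = [r for node in targets for r in node.find_by_age(age)]
        if partition_key is not None:
            rows = [r for r in rows if r["id"] == partition_key]
        return rows, len(targets)


cluster = Cluster(3)
cluster.insert({"id": 1, "name": "ann", "age": 45, "job": "ops"})
cluster.insert({"id": 2, "name": "john", "age": 45, "job": "dev"})
cluster.insert({"id": 2, "name": "mary", "age": 30, "job": "dev"})
cluster.insert({"id": 3, "name": "bob", "age": 45, "job": "qa"})

all_rows, contacted_all = cluster.select_by_age(45)
one_row, contacted_one = cluster.select_by_age(45, partition_key=2)
print(contacted_all, contacted_one)  # 3 1
```

In this model the query on `age` alone touches all three nodes, while adding `id = 2` narrows it to the single node that owns that partition.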

This is useful for time series or other wide row formats.

Queries on secondary indices are restrictive: you can't do range queries, for example, so you're limited to = checks.

Secondary indices in Cassandra can save you the hassle of maintaining index tables yourself, and are more efficient than if you'd done so manually. They are eventually consistent (your writes won't wait for indices to be updated before returning success) and currently, index info for a node's data is stored locally.

Lastly, you can find the indexes currently in place from the "IndexInfo" table in the system keyspace.

Hope that helps.

ashic answered Nov 15 '22 06:11



In traditional databases, creating an index usually builds a data structure, for example a HashMap, whose keys are the values of the indexed column and whose values point to the actual rows in the table. This lets a query fetch results by index key in approximately O(1).

How is the index created? Each key in the indexed column is hashed using a hash function, and the resulting value is used as the index.
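As a minimal sketch of that idea, here is a hash index in Python, where a dict plays the role of the HashMap. The sample rows and the names `build_hash_index` and `lookup` are invented for illustration; the column names are borrowed from the `bite` table in the question.

```python
from collections import defaultdict

# Rows of a hypothetical table, addressed by their position in the list.
rows = [
    {"id": "a1", "feedid": "f1", "score": 10},
    {"id": "a2", "feedid": "f2", "score": 25},
    {"id": "a3", "feedid": "f1", "score": 7},
]


def build_hash_index(table, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index


def lookup(table, index, value):
    """Fetch matching rows through the index instead of scanning the table."""
    return [table[position] for position in index.get(value, [])]


feedid_index = build_hash_index(rows, "feedid")
print([r["id"] for r in lookup(rows, feedid_index, "f1")])  # ['a1', 'a3']
```

The lookup hashes the value once and jumps straight to the matching positions, which is where the approximately O(1) behaviour comes from.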

In Cassandra, since the data (i.e. a particular column itself) is distributed, it uses a special mechanism to achieve the indexing described above.

Indexing means fast retrieval, i.e. fast reads. The caveat is that too much indexing also has downsides, such as collisions among the indexed keys.

rozar answered Nov 15 '22 07:11
