I am trying to set up a Cassandra column family with secondary indexes on a few columns that I will need to filter by when reading data back out. In my initial testing, queries slow down when I use multiple indexes together. Here is how I have it configured currently (via cassandra-cli):
update column family bulkdata with comparator=UTF8Type and column_metadata=[{column_name: test_field, validation_class: UTF8Type}, {column_name: create_date, validation_class: LongType, index_type: KEYS}, {column_name: domain, validation_class: UTF8Type, index_type: KEYS}];
I want to get all data where create_date > somevalue1 and domain = somevalue2. Using pycassa for my client, I do the following:
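from pycassa.index import create_index_clause, create_index_expression, GT

# Equality match on the indexed domain column (EQ is the default operator).
domain_expr = create_index_expression('domain', 'whatever.com')
# Range match: create_date strictly greater than the given value.
cd_expr = create_index_expression('create_date', 1293650000, GT)
clause = create_index_clause([domain_expr, cd_expr], count=10000)
for key, item in col_fam.get_indexed_slices(clause):
    ...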
Relying on several single-column indexes for one multi-column filter is a common mistake in SQL, of course, where one would normally have to create a compound index based on the query's needs. I'm quite new to Cassandra though, so I don't know if such a thing is required or even exists.
My interactions with Cassandra will include large numbers of writes, and large numbers of reads and updates. I have set up the indexes figuring they were the right thing to do here, but perhaps I am completely wrong. I'd be interested in any ideas for setting up a performant system, with my index setup or without.
Oh, and this is on Cassandra 0.7.0-rc3.
An index provides a means to access data in Cassandra using attributes other than the partition key. The benefit is fast, efficient lookup of data matching a given condition. The index indexes column values in a separate, hidden table from the one that contains the values being indexed.
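As a rough mental model only (not Cassandra's actual implementation), the hidden index table can be pictured as a mapping from each indexed value to the row keys that hold it. The `ToyColumnFamily` class and its method names below are hypothetical, made up for this illustration:

```python
from collections import defaultdict


class ToyColumnFamily(object):
    """In-memory stand-in for a column family with one indexed column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(set)   # indexed value -> set of row keys

    def insert(self, key, columns):
        # Drop the stale index entry in case the indexed value is overwritten.
        old = self.rows.get(key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(key)
        self.rows.setdefault(key, {}).update(columns)
        # Re-add the row under its current indexed value, if it has one.
        new = self.rows[key].get(self.indexed_column)
        if new is not None:
            self.index[new].add(key)

    def get_by_index(self, value):
        # One lookup in the "hidden table", then one fetch per matching row.
        keys = sorted(self.index.get(value, ()))
        return dict((key, self.rows[key]) for key in keys)


cf = ToyColumnFamily('domain')
cf.insert('row1', {'domain': 'whatever.com', 'create_date': 1293650001})
cf.insert('row2', {'domain': 'other.com', 'create_date': 1293650002})
matches = cf.get_by_index('whatever.com')   # only row1 is fetched
```

The point of the sketch is that every write to an indexed column also has to maintain the index entry, which is part of what a write-heavy workload pays for each index.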
Unlike a single-field index, which indexes one field, a compound index covers multiple fields of a document, each in ascending or descending order: it sorts the data by the first field and then, within each value of that field, by the next field.
How does a composite index work? The columns used in a composite index are concatenated together, and those concatenated keys are stored in sorted order using a B+ tree. When you perform a search, the concatenation of your search keys is matched against the keys of the composite index.
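The concatenated-key idea can be sketched in a few lines, applied to the `domain` and `create_date` columns from the question. This is a simplification under stated assumptions: a sorted Python list with `bisect` stands in for the B+ tree, `create_date` values are assumed to be integers, and the function names are hypothetical.

```python
import bisect


def build_composite_index(rows):
    """Return a sorted list of (domain, create_date, row_key) tuples."""
    return sorted(
        (cols['domain'], cols['create_date'], key)
        for key, cols in rows.items()
    )


def query(index, domain, min_create_date):
    """Row keys with the given domain and create_date > min_create_date."""
    # Seek straight to the first entry for this domain whose date exceeds
    # the bound (valid because dates are integers in this sketch).
    start = bisect.bisect_left(index, (domain, min_create_date + 1))
    keys = []
    for i in range(start, len(index)):
        entry_domain, _, key = index[i]
        if entry_domain != domain:
            break  # walked past the last entry for this domain
        keys.append(key)
    return keys


rows = {
    'a': {'domain': 'whatever.com', 'create_date': 1293650005},
    'b': {'domain': 'whatever.com', 'create_date': 1293640000},
    'c': {'domain': 'other.com', 'create_date': 1293660000},
    'd': {'domain': 'whatever.com', 'create_date': 1293650000},
}
index = build_composite_index(rows)
recent = query(index, 'whatever.com', 1293650000)
```

Because the entries are sorted by domain first and date second, the query touches only the matching entries instead of every row for the domain.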
Using CQL, you can create a secondary index on a column after defining a table. You can also index a collection column. Secondary indexes are used to query a table using a column that is not normally queryable.
Native Cassandra secondary indexes have some limitations. According to the DataStax documentation, they are not supposed to be used for columns with high cardinality (too many unique values). It seems like the create_date column you are indexing on will have high cardinality. Also, there is no such thing as a compound index in native Cassandra index support.
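To illustrate why the two-expression query can be slow without a compound index, here is a simplified model. It rests on an assumption that is not stated in this answer: that only the equality expression is served from an index, while the remaining expressions are checked against each candidate row after it is read. The function and variable names are hypothetical.

```python
def indexed_slices(rows, domain_index, domain, min_create_date):
    """Return (matching row keys, number of candidate rows read)."""
    # Equality lookup: the domain index yields every row for the domain.
    candidates = domain_index.get(domain, [])
    # The range expression is applied row by row, not through an index.
    matches = [
        key for key in candidates
        if rows[key]['create_date'] > min_create_date
    ]
    return matches, len(candidates)


# 1010 rows for one domain, only the last 9 newer than the cutoff.
rows = dict(
    ('row%d' % i, {'domain': 'whatever.com', 'create_date': 1293649000 + i})
    for i in range(1010)
)
domain_index = {'whatever.com': sorted(rows)}
matches, rows_read = indexed_slices(
    rows, domain_index, 'whatever.com', 1293650000
)
```

In this model all 1010 rows for the domain are read in order to return 9 of them, so the cost grows with the size of the domain rather than with the size of the result.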
For more in-depth coverage, see my blog post: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
Pranab