Compound Indexes in Apache Cassandra

Tags: indexing, cassandra

I am trying to set up a Cassandra column family with secondary indexes on a few columns that I will need to filter by when reading data back out. In my initial testing, queries slow down when I use multiple indexes together. Here is how I have it configured currently (via cassandra-cli):

update column family bulkdata with comparator=UTF8Type and column_metadata=[{column_name: test_field, validation_class: UTF8Type}, {column_name: create_date, validation_class: LongType, index_type: KEYS}, {column_name: domain, validation_class: UTF8Type, index_type: KEYS}];

I want to get all data where create_date > somevalue1 and domain = somevalue2. Using pycassa for my client, I do the following:

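  from pycassa.index import create_index_expression, create_index_clause, GT

  # Equality on the indexed domain column, plus a range expression on create_date
  domain_expr = create_index_expression('domain', 'whatever.com')
  cd_expr = create_index_expression('create_date', 1293650000, GT)
  clause = create_index_clause([domain_expr, cd_expr], count=10000)
  for key, item in col_fam.get_indexed_slices(clause):
    ...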

This is a common mistake in SQL, of course, where one would normally create a compound index based on the query's needs. I'm quite new to Cassandra, though, so I don't know whether such a thing is required or even exists.

My interactions with Cassandra will include large numbers of writes, and large numbers of reads and updates. I set up the indexes figuring they were the right thing to do here, but perhaps I am completely wrong. I'd be interested in any ideas for setting up a performant system, with or without my index setup.

Oh, and this is on Cassandra 0.7.0-rc3.

asked Dec 29 '10 22:12

Jake

1 Answer

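Native Cassandra secondary indexes have some limitations. According to the DataStax documentation, they are not meant for columns with high cardinality (many unique values), and the create_date column you are indexing will most likely have high cardinality. An indexed query also needs at least one equality expression; as far as I know, the remaining expressions, such as your create_date range, are applied as a filter over the rows that the equality index returns, which may explain the slowdown you see. Finally, there is no such thing as a compound index in native Cassandra index support.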
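A common workaround is to maintain your own index column family, keyed by the equality field and with columns sorted by the range field. Below is a minimal sketch of that idea in plain Python, with no Cassandra involved: the class DomainDateIndex and its methods are hypothetical names used only for illustration, and the in-memory sorted list stands in for the sorted columns of an index row. It assumes create_date is an integer timestamp.

```python
import bisect
from collections import defaultdict


class DomainDateIndex:
    """In-memory stand-in for a hand-maintained index column family.

    Row key     -> domain
    Column name -> (create_date, data_key), kept sorted like columns in a row
    """

    def __init__(self):
        self._rows = defaultdict(list)

    def add(self, domain, create_date, data_key):
        # Would be written alongside every insert into the main column family.
        bisect.insort(self._rows[domain], (create_date, data_key))

    def keys_after(self, domain, min_date, count=10000):
        # Rough equivalent of a column slice starting just past min_date.
        # (min_date + 1,) sorts before every (min_date + 1, key) tuple,
        # so the slice holds only entries with create_date > min_date.
        columns = self._rows.get(domain, [])
        start = bisect.bisect_left(columns, (min_date + 1,))
        return [key for _, key in columns[start:start + count]]


index = DomainDateIndex()
index.add("whatever.com", 1293640000, "row1")
index.add("whatever.com", 1293650000, "row2")
index.add("whatever.com", 1293660000, "row3")
index.add("other.com", 1293670000, "row4")

# Keys for domain == "whatever.com" and create_date > 1293650000
matches = index.keys_after("whatever.com", 1293650000)
```

The trade-off is an extra write per insert (and cleanup on update or delete) in exchange for a read that touches a single index row instead of filtering every row for the domain.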

For more in-depth coverage, you can read my blog post: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/

Pranab

answered Sep 22 '22 05:09

Pranab
