I am using cassandra 1.2 with CQL3. I have three column families in my keyspace. When I query one of the column family(phones), it takes a long time to retrive. Here is my query
**select * from phones where phone_no in ('9038487582');**
Here is the tracing output for the query.
activity | timestamp | source | source_elapsed
-------------------------------------------------+--------------+-------------+----------------
execute_cql3_query | 16:35:47,675 | 10.1.26.155 | 0
Parsing statement | 16:35:47,675 | 10.1.26.155 | 58
Peparing statement | 16:35:47,675 | 10.1.26.155 | 335
Executing single-partition query on phones | 16:35:47,676 | 10.1.26.155 | 1069
Acquiring sstable references | 16:35:47,676 | 10.1.26.155 | 1097
Merging memtable contents | 16:35:47,676 | 10.1.26.155 | 1143
Partition index lookup complete for sstable 822 | 16:35:47,676 | 10.1.26.155 | 1376
Partition index lookup complete for sstable 533 | 16:35:47,686 | 10.1.26.155 | 10659
Merging data from memtables and 2 sstables | 16:35:47,704 | 10.1.26.155 | 29192
Read 1 live cells and 0 tombstoned | 16:35:47,704 | 10.1.26.155 | 29332
Request complete | 16:35:47,704 | 10.1.26.155 | 29601
I have only 1 replication factor on the keyspace. and have 3 node cluster. Phones have around 40 million rows and just two columns in each row. it comes back in 29ms, 15ms, 8 ms, 5 ms, 3 ms but it is not consistent. Can you guys give me any suggestions regarding what mistake I might be doing ? Also my usecase will have extremely low cache hit so caching keys is not a solution for me. Also, this is my column family definition.
CREATE TABLE phones (
phone_no text PRIMARY KEY,
ypids set<int>
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
The index lookups are reasonably fast (probably the index file is being cached by the OS since it is accessed frequently); where you are losing all the time is in between that and the "merging data" step. What happens in between those is actually seeking to the data location in the sstable. (I've added a new trace entry for 1.2.6 to make that clear.)
This explains why sometimes it's fast, and sometimes not -- if your seek is uncontended, or better yet cached, then the query will be fast. Otherwise it will be slower.
I see several options that could help:
You'll note that only the first option doesn't include more or different hardware, so that's what I'd evaluate first. But the upside is limited: at best you'll reduce the number of sstables to 1.
From the table that you provided above the majority of the query time is in index look up and merging SSTables. This is fairly common, I don't believe that you have done anything wrong.
The index lookup can be avoided by de-normalising the data. It is common practice with Cassandra to design ColumnFamilies around queries, rather than tables as is typical in relational systems. This then puts the burden at the point of writing the data, which is where Cassandra is strongest and of course risks data consistency due to data duplication and cassandra's natural tendency to give clients different views of data in order to optimise for clustered availability of the data.
The merging of sstables is Cassandra's Achilles Heel so to speak. Cassandra optimises write speed and reliability at the expense of both read time latency and latency consistency. It is perfectly normal with Cassandra to have 'slower' reads that keep varying in duration. To reduce this problem there are two approaches, the first is to avoid any kind of update or delete of data in the column family as that is what causes compactions later on. But even then that only delays sstable work as inserting will cause memtables to get flushed. So the other solution, which can be considered if the variation/duration is still too long, is to front Cassandra with a cache such as Memcache. This is the approach that Netflix documented here Netflix benchmarking of Cassandra.
For completeness I should add that the column family settings in Cassandra can be tweaked, benchmarked and then tweaked again to reduce this problem. But that will only get one so far as this problem is inherent in Cassandra's design. The settings that you should look at are the cache sizes, such as memtables and its overflow rate which is the point where a new SSTable will be created. Compression can also help, as that helps to squeeze more data into memory. Typically I expect un-indexed reads to take 2-10ms (avg 5ms) depending on the hardware and cluster activity within Amazon EC2 (which is the environment that I work in these days).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With