Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra Read taking longer than expected

Tags:

cassandra

I am using cassandra 1.2 with CQL3. I have three column families in my keyspace. When I query one of the column family(phones), it takes a long time to retrive. Here is my query

**select * from phones where phone_no in ('9038487582');**

Here is the tracing output for the query.

activity                                        | timestamp    | source      | source_elapsed
-------------------------------------------------+--------------+-------------+----------------
                              execute_cql3_query | 16:35:47,675 | 10.1.26.155 |              0
                               Parsing statement | 16:35:47,675 | 10.1.26.155 |             58
                              Peparing statement | 16:35:47,675 | 10.1.26.155 |            335
      Executing single-partition query on phones | 16:35:47,676 | 10.1.26.155 |           1069
                    Acquiring sstable references | 16:35:47,676 | 10.1.26.155 |           1097
                       Merging memtable contents | 16:35:47,676 | 10.1.26.155 |           1143
 Partition index lookup complete for sstable 822 | 16:35:47,676 | 10.1.26.155 |           1376
 Partition index lookup complete for sstable 533 | 16:35:47,686 | 10.1.26.155 |          10659
      Merging data from memtables and 2 sstables | 16:35:47,704 | 10.1.26.155 |          29192
              Read 1 live cells and 0 tombstoned | 16:35:47,704 | 10.1.26.155 |          29332
                                Request complete | 16:35:47,704 | 10.1.26.155 |          29601

I have only 1 replication factor on the keyspace. and have 3 node cluster. Phones have around 40 million rows and just two columns in each row. it comes back in 29ms, 15ms, 8 ms, 5 ms, 3 ms but it is not consistent. Can you guys give me any suggestions regarding what mistake I might be doing ? Also my usecase will have extremely low cache hit so caching keys is not a solution for me. Also, this is my column family definition.

CREATE TABLE phones (
  phone_no text PRIMARY KEY,
  ypids set<int>
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
like image 868
user2390862 Avatar asked Feb 17 '23 12:02

user2390862


2 Answers

The index lookups are reasonably fast (probably the index file is being cached by the OS since it is accessed frequently); where you are losing all the time is in between that and the "merging data" step. What happens in between those is actually seeking to the data location in the sstable. (I've added a new trace entry for 1.2.6 to make that clear.)

This explains why sometimes it's fast, and sometimes not -- if your seek is uncontended, or better yet cached, then the query will be fast. Otherwise it will be slower.

I see several options that could help:

  1. Switch to Leveled compaction (http://www.datastax.com/dev/blog/when-to-use-leveled-compaction)
  2. Add more machines to get more iops by brute force
  3. Switch to SSD to get more iops by better hardware
  4. Add more RAM to make caching more effective at covering up the lack of iops

You'll note that only the first option doesn't include more or different hardware, so that's what I'd evaluate first. But the upside is limited: at best you'll reduce the number of sstables to 1.

like image 89
jbellis Avatar answered Feb 22 '23 23:02

jbellis


From the table that you provided above the majority of the query time is in index look up and merging SSTables. This is fairly common, I don't believe that you have done anything wrong.

The index lookup can be avoided by de-normalising the data. It is common practice with Cassandra to design ColumnFamilies around queries, rather than tables as is typical in relational systems. This then puts the burden at the point of writing the data, which is where Cassandra is strongest and of course risks data consistency due to data duplication and cassandra's natural tendency to give clients different views of data in order to optimise for clustered availability of the data.

The merging of sstables is Cassandra's Achilles Heel so to speak. Cassandra optimises write speed and reliability at the expense of both read time latency and latency consistency. It is perfectly normal with Cassandra to have 'slower' reads that keep varying in duration. To reduce this problem there are two approaches, the first is to avoid any kind of update or delete of data in the column family as that is what causes compactions later on. But even then that only delays sstable work as inserting will cause memtables to get flushed. So the other solution, which can be considered if the variation/duration is still too long, is to front Cassandra with a cache such as Memcache. This is the approach that Netflix documented here Netflix benchmarking of Cassandra.

For completeness I should add that the column family settings in Cassandra can be tweaked, benchmarked and then tweaked again to reduce this problem. But that will only get one so far as this problem is inherent in Cassandra's design. The settings that you should look at are the cache sizes, such as memtables and its overflow rate which is the point where a new SSTable will be created. Compression can also help, as that helps to squeeze more data into memory. Typically I expect un-indexed reads to take 2-10ms (avg 5ms) depending on the hardware and cluster activity within Amazon EC2 (which is the environment that I work in these days).

like image 32
Chris K Avatar answered Feb 22 '23 22:02

Chris K