Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra CQL range query rejected despite equality operator and secondary index

From the table schema below, I am trying to select all pH readings that are below 5.

I have followed these three pieces of advice:

  1. Use ALLOW FILTERING
  2. Include an equality comparison
  3. Create a secondary index on the reading_value column.

Here is my query:

select * from todmorden_numeric where sensor_name = 'pHradio' and reading_value < 5  allow filtering;

Which is rejected with this message:

Bad Request: No indexed columns present in by-columns clause with Equal operator

I tried adding a secondary index to the sensor_name column and was told that it was already part of the key and therefore already indexed.

I created the index after the table had been in use for a while - could that be the problem? I ran "nodetool refresh" in the hope it would make the index available but this did not work. Here is the output of describe table todmorden_numeric :

CREATE TABLE todmorden_numeric (
  sensor_name text,
  reading_time timestamp,
  reading_value float,
  PRIMARY KEY ((sensor_name), reading_time)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='Data that suits being stored as floats' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX todmorden_numeric_reading_value_idx ON todmorden_numeric (reading_value);
like image 564
naomi Avatar asked Jan 11 '23 04:01

naomi


1 Answers

Cassandra allows range search only on:

a) Partition Key only if ByteOrderPartitioner is used (default now is murmur3).

b) any single clustering key ONLY IF any clustering keys defined BEFORE the target column in the primary key definition are already specified by an = operator in the predicate.

They don't work on secondary indices.

Consider the following table definition:

CREATE TABLE tod1 (name text, time timestamp, 
    val float, PRIMARY KEY (name, time));

You CAN'T do a range on the val in this case.

Consider this one:

CREATE TABLE tod2 (name text, time timestamp, 
    val float, PRIMARY KEY (name, time, val));

Then the following is valid:

SELECT * FROM tod2 WHERE name='X' AND time='timehere' AND val < 5; 

Kinda pointless, but this is not valid:

SELECT * from tod2 WHERE name='X' AND val < 5; 

It's not valid as you haven't filtered by a previous clustering key in the primary key def (in this case, time).

For your query, you may want to do this:

CREATE TABLE tod3 (name text, time timestamp, 
    val float, PRIMARY KEY (name, val, time));

Note the order of columns in the primary key: val's before time.

This will allow you to do:

SELECT * from tod3 WHERE name='asd' AND val < 5;

On a different note, how long do you intend to hold data? How frequently do you get readings? This can cause your partition to grow quite large quite quickly. You may want to bucket it readings into multiple partitions (manual sharding). Perhaps one partition per day? Of course, such things would greatly depend on your access patterns.

Hope that helps.

like image 122
ashic Avatar answered Jan 31 '23 14:01

ashic