Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

order by clause not working in Cassandra query

Tags:

cassandra

I have created a table layer using following code:

CREATE TABLE layer (
    layer_name text,
    layer_position text,
    PRIMARY KEY (layer_name, layer_position)
) WITH CLUSTERING ORDER BY (layer_position DESC)

I use the below query to fetch data from the layer table in descending order(layer):

$select = new Cassandra\SimpleStatement(<<<EOD
                        select * from layer ORDER BY layer_position DESC
EOD
                      ); 

$result = $session->execute($select);

But this query is not working. Please can anyone help me?

like image 691
Nirali Kavar Avatar asked Dec 28 '15 09:12

Nirali Kavar


1 Answers

Simply put, Cassandra only enforces sort order within a partition key.

PRIMARY KEY (layer_name, layer_position)
) WITH CLUSTERING ORDER BY (layer_position DESC)

In this case, layer_name is your partition key. If you specify layer_name in your WHERE clause, your results for that value of layer_name will be ordered by layer_position.

SELECT * FROM layer WHERE layer_name = 'layer1';

You don't need to specify ORDER BY. All ORDER BY really can do at the query level is apply a different sort direction (ascending vs. descending).

Cassandra works this way, because it is designed to read data in whatever order it is sorted on disk. Your partition keys are sorted by hashed token value, which is why results from an unbound WHERE clause appear to be ordered randomly.

EDIT

I have to fetch data using state_id column and it should be order by layer_position.

Cassandra tables are optimized for a specific query. While this results in high performance, the drawback is that query flexibility is limited. The way to solve for this, is to duplicate your data into an additional table designed to serve that particular query.

CREATE TABLE layer_by_state_id (
    layer_name text,
    layer_position text,
    state_id text,
    PRIMARY KEY (state_id, layer_position, layer_name)
) WITH CLUSTERING ORDER BY (layer_position DESC, layer_name ASC);

This table will allow queries like this to work:

SELECT * FROM layer WHERE state_id='thx1138';

And the results will be sorted by layer_position, within the requested state_id.

Now I am making a couple of assumptions that you will want to investigate:

  • I am assuming that state_id is a good partitioning key. Meaning that it has high-enough cardinality to offer good distribution in the cluster, but low-enough cardinality that it returns enough CQL rows to make sorting worthwhile.
  • I am assuming that the combination of state_id and layer_position is not enough to uniquely identify each row. Therefore I am ensuring uniqueness by adding layer_name as an additional clustering key. You may or may not need this, but I'm guessing that you will.
  • I am assuming that using state_id as a partitioning key will not exhibit unbound growth so as to approach Cassandra's limit of 2 billion cells per partition. If that is the case, you may need to add an additional partition "bucket."
like image 118
Aaron Avatar answered Sep 22 '22 15:09

Aaron