I want to model a small table using Cassandra (purely for convenience reasons) that puts all the data on a single node (and possibly replicated data on some other nodes). The reason for this is that I want to do a lot of SELECT * FROM my_table queries from it afterwards.
I know that there is a way to achieve this by creating a table with a constant Partition Key and putting my current Primary Key into Clustering Key (Columns), but this feels very hacky. This would superficially store the whole table on r nodes, where r = replication factor.
Example:
Current table with Primary Key (param1). Move this to Primary Key ('some constant', param1).
Is there a better way of achieving this, e.g. using some Cassandra table or keyspace configuration that I missed?
I know that there is a way to achieve this by creating a table with a constant Partition Key and putting my current Primary Key into Clustering Key (Columns), but this feels very hacky. This would superficially store the whole table on r nodes, where r = replication factor.
I disagree that this is hacky or a superficial solution. A key part of Cassandra's design is choosing an appropriate partition key so that all the data for a single query is in a single partition. In your case, if you want all of your table data to be read, then it's perfectly reasonable to have a constant partition key.
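As a rough sketch of what that could look like in CQL: my_table and param1 come from the question, while the bucket and value columns, the text types, and the constant 'all' are placeholder assumptions for illustration only.

```sql
-- Hypothetical schema: `bucket`, `value`, and the text types are assumptions.
CREATE TABLE my_table (
    bucket text,   -- constant partition key; every row stores the same value
    param1 text,   -- the former primary key, now a clustering column
    value  text,   -- stand-in for the remaining columns
    PRIMARY KEY ((bucket), param1)
);

-- Every insert uses the same partition key value, so all rows share one partition.
INSERT INTO my_table (bucket, param1, value) VALUES ('all', 'a', 'first row');
INSERT INTO my_table (bucket, param1, value) VALUES ('all', 'b', 'second row');

-- Reads that single partition from the replicas that own it.
SELECT * FROM my_table WHERE bucket = 'all';
```

Rows inside the partition are stored sorted by param1, and uniqueness is still enforced on param1 because the partition key never changes. This only makes sense while the table stays small, since the whole table lives in one partition.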
If there's no field in your dataset that will be constant, then using an arbitrary value (a partition ID) is fine. This is basically the same as adding an arbitrary value as an additional clustering column in your schema to bucket your data, which is a very common use case.
To answer your question directly: no, I don't think there are settings to achieve what you want.