CREATE TABLE users ( userId uuid, firstname varchar, mobileNo varchar, PRIMARY KEY (userId) );
CREATE TABLE users_by_firstname ( userId uuid, firstname varchar, mobileNo varchar, PRIMARY KEY (firstname,userId) );
I have 100 rows in these tables. I want to get randomly selected 10 rows each time.
In MySQL
select * from users order by RAND() limit 10;
In Cassandra
select * from users limit 10;
select * from users_by_firstname limit 10;
But from 1st table I would get the static 10 rows sorted by the generated hash of the partition key (userId).
From the second one I would get the static 10 rows sorted by userId. But it will not be random if the data does not change.
Is there any way to get random rows each time in Cassandra.
Thanks
Chaity
It's not possible to archive this directly. There are possibilities to emulate this (this solution is not really random, but you should receive different values), but it's not really a perfect idea.
What you could do is, create a random value in the cassandra token range -2^63 - 2^64. With this random value you can perform such a query:
select * from users_by_firstname where token(userId) > #generated_value# limit 10;
Using this method you can define a random 'starting point' from where you can receive 10 users. As I said, this method is not perfect and it certainly needs some thoughts on how to generate the random token. An edge case could be, that your random value is so far on one side of the ring, that you would receive less than 10 values.
Here is a short example:
Lets say you have a users table with the following users:
token(uuid) | name
----------------------+---------
-2540966642987085542 | Kate
-1621523823236117896 | Pauline
-1297921881139976049 | Stefan
-663977588974966463 | Anna
-155496620801056360 | Hans
958005880272148645 | Max
3561637668096805189 | Doro
5293579765126103566 | Paul
8061178154297884044 | Frank
8213365047359667313 | Peter
Lets now say you generate the value 42 as a start-token, the select would be
select token(uuid), name from test where token(uuid) > 42 limit 10;
In this example the result would be
token(id) | name
---------------------+-------
958005880272148645 | Max
3561637668096805189 | Doro
5293579765126103566 | Paul
8061178154297884044 | Frank
8213365047359667313 | Peter
This method might be a reasonable approach if you have a lot of data, and a balanced cluster. To make sure you don't run into these edge case you could limit the range to not come near the edges of the cassandra token range.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With