I'm trying to find a way to poll a Cassandra database, but I'm new to this and don't know how.
Let's say I have a table "users" like this:
-> users
-> user_name
-> gender
-> state
I want to poll continuously so that I know when a new user has been added to the table. How can I do that?
In a relational database, the standard approach would be a SELECT ordered by some time-related ID, descending, so that the newest row always comes back first; you then compare it with the last 'newest row' you saw to detect a change. In Cassandra, that won't work: without a WHERE clause on the partition key, results are ordered by the partition's token, which with the default partitioner is a hash of the partition key and therefore (almost certainly) random with respect to insertion time.
The solution, then, is to create a second table in which users are grouped into partitions (buckets) and sorted by registration time within each partition. For example:
CREATE TABLE user_buckets (
    bucket text,
    user_timestamp timeuuid,
    user_username text,
    PRIMARY KEY (bucket, user_timestamp)
) WITH CLUSTERING ORDER BY (user_timestamp DESC);
In this case, you would write each new user into both the users table and the user_buckets table. Choose 'bucket' to be something reasonable, such as date(YYYY), where each partition contains all of the users registering in that year, or date(YYYYMMDD), where each partition contains all of the users registering on that day. To poll, run SELECT ... FROM user_buckets WHERE bucket=(current-bucket) AND user_timestamp > (last timestamp you've seen). Because rows within a partition are clustered by user_timestamp in descending order, the newest users are returned first.
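The polling logic described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: bucket_for, InMemoryUserBuckets and NewUserPoller are hypothetical names, the in-memory class merely stands in for the user_buckets table so the sketch runs without a cluster, and plain integers stand in for timeuuid values. It assumes daily (YYYYMMDD) buckets in UTC. In a real application the fetch callable would presumably run the SELECT shown above through a Cassandra driver session.

```python
from datetime import datetime, timezone


def bucket_for(ts: datetime) -> str:
    """Daily bucket key in YYYYMMDD form (assumes one partition per UTC day)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


class InMemoryUserBuckets:
    """Hypothetical stand-in for the user_buckets table (not a driver API)."""

    def __init__(self):
        self.partitions = {}  # bucket -> list of (user_timestamp, user_username)

    def insert(self, bucket, user_timestamp, user_username):
        self.partitions.setdefault(bucket, []).append((user_timestamp, user_username))

    def select_newer(self, bucket, last_seen):
        # Mimics: SELECT ... WHERE bucket = ? AND user_timestamp > ?
        # with rows returned newest first (CLUSTERING ORDER BY ... DESC).
        rows = self.partitions.get(bucket, [])
        newer = [r for r in rows if last_seen is None or r[0] > last_seen]
        return sorted(newer, reverse=True)


class NewUserPoller:
    """Remembers the newest timestamp seen and asks only for newer rows."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable(bucket, last_seen) -> list of rows
        self.last_seen = None   # newest user_timestamp seen so far
        self.bucket = None      # bucket queried on the previous poll

    def poll(self, now: datetime):
        current = bucket_for(now)
        # After a bucket rollover, check the previous bucket one more time
        # so rows written just before the rollover are not missed.
        if self.bucket in (None, current):
            buckets = [current]
        else:
            buckets = [self.bucket, current]
        new_rows = []
        for b in buckets:
            new_rows.extend(self.fetch(b, self.last_seen))
        self.bucket = current
        if new_rows:
            self.last_seen = max(ts for ts, _ in new_rows)
        return sorted(new_rows)  # oldest first, convenient for processing
```

One design point worth noting: when the bucket changes between polls, the sketch queries the previous bucket once more before moving on, since a user may have registered in the old bucket after the last poll. The polling interval and how far back to look on the very first poll are left to the application.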