I have a simple table, distributed by userId:
create table test (
    userId uuid,
    placeId uuid,
    visitTime timestamp,
    primary key (userId, placeId, visitTime)
) with clustering order by (placeId asc, visitTime desc);
Each pair (userId, placeId) can have either one visit or none. visitTime is just some data associated with it, used for sorting in queries like select * from test where userId = ? order by visitTime desc.
How can I require (userId, placeId) to be unique? I need to make sure that insert into test (userId, placeId, visitTime) values (?, ?, ?) won't insert a second visit for the same (userId, placeId) with a different time. Checking for existence before inserting isn't atomic; is there a better way?
Let me understand: if the pair (userId, placeId) should be unique (meaning that you never store two rows with the same pair), what is visitTime useful for in the primary key? Why would you perform a query using order by visitTime desc if there will only be one row per pair?
If what you need is to prevent duplication, you have two ways.
1 - Lightweight transactions -- using IF NOT EXISTS will do what you want. But, as I explained here, lightweight transactions are really slow, because Cassandra has to run a Paxos consensus round for each conditional write.
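For example, a minimal sketch of the idea (using an illustrative users table where (uid, placeid) alone is the primary key -- note that IF NOT EXISTS checks the whole primary key, so with visitTime inside the key, as in your schema, a second insert with a different time would still be applied):
INSERT INTO users (uid, placeid, visittime, otherstuffs) VALUES (5, 7, 1000, 'FIRST VISIT') IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users (uid, placeid, visittime, otherstuffs) VALUES (5, 7, 2000, 'SECOND VISIT') IF NOT EXISTS;

 [applied] | uid | placeid | otherstuffs | visittime
-----------+-----+---------+-------------+-----------
     False |   5 |       7 | FIRST VISIT |      1000
The second insert is rejected and the existing row is returned instead of being overwritten.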
2 - USING TIMESTAMP -- write-time enforcement (be careful with it!***). The 'trick' is to force a decreasing write TIMESTAMP on each insert, so a later insert for the same primary key loses the last-write-wins comparison and is silently ignored.
Let me give an example:
INSERT INTO users (uid, placeid , visittime , otherstuffs ) VALUES ( 1, 2, 1000, 'PLEASE DO NOT OVERWRITE ME') using TIMESTAMP 100;
This produces the following output:
select * from users;
uid | placeid | otherstuffs | visittime
-----+---------+----------------------------+-----------
1 | 2 | PLEASE DO NOT OVERWRITE ME | 1000
Let's now decrease the timestamp:
INSERT INTO users (uid, placeid, visittime, otherstuffs) VALUES (1, 2, 2000, 'I WANT TO OVERWRITE YOU') using TIMESTAMP 90;
Now the data in the table have not been updated, since there is already a higher-timestamp operation (100) for the pair (uid, placeid) -- in fact, the output has not changed:
select * from users;
uid | placeid | otherstuffs | visittime
-----+---------+----------------------------+-----------
1 | 2 | PLEASE DO NOT OVERWRITE ME | 1000
If performance matters, use solution 2; if performance doesn't matter, use solution 1. For solution 2 you can calculate a decreasing timestamp for each write as a fixed number minus the current system time in milliseconds, e.g.:
Long decreasingTimestamp = 2_000_000_000_000L - System.currentTimeMillis();
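You can then pass that value as the write timestamp of each insert, for instance through a bind marker in the USING clause (a sketch against the question's test table, binding decreasingTimestamp as the last parameter):
INSERT INTO test (userId, placeId, visitTime) VALUES (?, ?, ?) USING TIMESTAMP ?;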
*** This solution might lead to unexpected behaviour if, for instance, you want to delete and then reinsert data. It is important to know that once you delete data, you will only be able to write it again if the write operation has a higher timestamp than the deletion's (if not specified, the timestamp used is the machine's current time).
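To make the caveat concrete, a small sketch with illustrative timestamps (again assuming the example users table, keyed by (uid, placeid)):
DELETE FROM users USING TIMESTAMP 150 WHERE uid = 1 AND placeid = 2;
INSERT INTO users (uid, placeid, visittime, otherstuffs) VALUES (1, 2, 3000, 'SHADOWED') USING TIMESTAMP 120;       -- not visible: 120 < 150
INSERT INTO users (uid, placeid, visittime, otherstuffs) VALUES (1, 2, 3000, 'VISIBLE AGAIN') USING TIMESTAMP 160;  -- wins over the tombstone
With the decreasing-timestamp scheme above this is especially easy to hit, because every new write carries a lower timestamp than all previous ones (and a much lower one than a deletion issued at the machine's current time).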
HTH,
Carlo
With Cassandra, each primary key (partition key + clustering key) combination is unique. So if you have an entry with the primary key (A, B, C) and you insert a new one with the same (A, B, C) values, the old one will be overwritten.
In your case the visitTime attribute is part of the primary key, which makes this mechanism unusable for your purpose. You might want to rethink your schema and leave visitTime out of the primary key.
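For example, a possible reworked table (just a sketch of the idea, not the only option):
create table test (
    userId uuid,
    placeId uuid,
    visitTime timestamp,
    primary key (userId, placeId)
);
Here (userId, placeId) is the whole primary key, so an INSERT ... IF NOT EXISTS can reject a duplicate pair regardless of its visitTime. The trade-off is that visitTime is now a regular column, so the server can no longer sort by it (ORDER BY only works on clustering columns); any ordering by visit time has to happen client-side or in a separate table.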