Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When are rows overwritten in cassandra

My understanding that rows are overwritten when another row with identical primary keys is inserted.

For example:

I have columns (user_id int, item_id int, site_id int), and my PRIMARY KEY(user_id, item_id)

If I had the following table:

user_id, item_id, site_id
   2       3        4

and I insert user_id : 2, item_id : 3, site_id : 10, my new table would be:

user_id, item_id, site_id
   2       3        10

not

user_id, item_id, site_id
   2       3        4
   2       3        10

Is this simple case hold in all cases? Are any subtleties that I likely not aware off? Also, I could not find this in the docs and came to this conclusion by playing around with cassandra, can anyone provide a doc source?

like image 317
Akavall Avatar asked Feb 05 '15 17:02

Akavall


1 Answers

Yes, this is how Cassandra is designed to operate. In all cases where an UPDATE or INSERT is executed, data will be updated (based on the keys) if it exists, and inserted it it does not. An important point to remember, is that under the hood, UPDATE and INSERT are synonymous. If you think about those two as being the same, then you can start to understand why it works the way that it does.

That being said, you are correct, in that you do have to look closely to find an explicit reference to this behavior in the documentation. I found the closest references in the docs and listed them below:

From the UPDATE documentation:

The row is created if none existed before, and updated otherwise. Specify the row to update in the WHERE clause by including all columns composing the partition key. ... The UPDATE SET operation is not valid on a primary key field.

From the INSERT documentation:

You do not have to define all columns, except those that make up the key. ... If the column exists, it is updated. The row is created if none exists.

Now while these excerpts may not come right out and say "be careful not to overwrite", I did manage to find an article on Planet Cassandra that was more explicit: How to Do an Upsert in Cassandra

Cassandra is a distributed database that avoids reading before a write, so an INSERT or UPDATE sets the column values you specify regardless of whether the row already exists. This means inserts can update existing rows, and updates can create new rows. It also means it’s easy to accidentally overwrite existing data, so keep that in mind.

like image 182
Aaron Avatar answered Oct 16 '22 17:10

Aaron