
How Cassandra pagination behaves on concurrent inserts

I paginate large result sets with Cassandra 2.2 using the Java driver and PagingState, as described here: https://datastax.github.io/java-driver/2.2.0-rc2/features/paging/

That works well, but I cannot find any information on how Cassandra behaves when new records are inserted (or existing ones are updated) while I am paging through the results. Are such new or changed records included in the result, or is the result set immutable?

The use case is a stateless web service where a client can query large result sets.

EDIT: Same question for ResultSet paging in general (Cassandra does automatic lazy fetch here)

EDIT2: To my knowledge, Cassandra does not offer full ACID transactions but does provide atomicity, isolation, and durability (AID), so I would expect some kind of isolation while going through the result set.

salyh asked Oct 05 '15 17:10



1 Answer

There is no such isolation, as it would be too expensive to implement. The whole result set is not kept in memory, and the rows to be returned in the next page are not known when the current one is shipped to the client.
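To make the lack of isolation concrete, here is a minimal sketch in plain Java. It is a toy model, not Cassandra or the DataStax driver: a sorted in-memory map stands in for a single partition, and the hypothetical helper `fetchPage` resumes after the last key returned, which is an assumed simplification of what a paging state does. Every page is read from the live data, so writes that land between two page requests can show up in later pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class PagingSketch {
    // Toy stand-in for one partition: rows kept sorted by clustering key.
    static final ConcurrentSkipListMap<Integer, String> ROWS = new ConcurrentSkipListMap<>();

    // Hypothetical helper: returns up to pageSize rows whose key is strictly
    // greater than lastKey (null means "start from the beginning").
    // Each call reads the live data; nothing is snapshotted between pages.
    static List<String> fetchPage(Integer lastKey, int pageSize) {
        ConcurrentNavigableMap<Integer, String> view =
                (lastKey == null) ? ROWS : ROWS.tailMap(lastKey, false);
        List<String> page = new ArrayList<>();
        for (Map.Entry<Integer, String> e : view.entrySet()) {
            if (page.size() == pageSize) break;
            page.add(e.getKey() + "=" + e.getValue());
        }
        return page;
    }

    public static void main(String[] args) {
        for (int k = 10; k <= 60; k += 10) ROWS.put(k, "v" + k);

        // First page: the cursor ends at key 30.
        System.out.println(fetchPage(null, 3));

        // Writes arriving between two page requests:
        ROWS.put(15, "new");      // insert behind the cursor
        ROWS.put(45, "new");      // insert ahead of the cursor
        ROWS.put(50, "updated");  // update ahead of the cursor

        // Second page: resumes after key 30 and reads the current data.
        System.out.println(fetchPage(30, 3));
    }
}
```

In this model the second page contains the row inserted at key 45 and the updated value at key 50, while the row inserted at key 15 is never returned because the cursor has already moved past it. A client that needs a stable view across pages has to arrange that itself, for example by filtering on a timestamp column it controls.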

One interesting consequence of this is that it breaks the BATCH update guarantee, stated in the documentation as:

"All updates in a BATCH belonging to a given partition key are performed in isolation."

There's one open issue about this.

There are also some performance implications, because a lot of the work done to fetch page n has to be done again to fetch page n + 1 (such as opening and reading from index files and data files). Scylla, a drop-in replacement for Cassandra to which I contribute, is working on fixing this.

Duarte Nunes answered Oct 12 '22 23:10