
What are the correct Cassandra collection limits?

Tags:

cassandra

I'm testing an application where the size of the collections is bound to grow in the future and 64k is a limit that may be reached in some cases.

This question is about the collection size limit as there seems to be a contradiction in the official documentation.

As per this document:

If you insert more than 64K items into a collection, only 64K of them will be queryable, resulting in data loss.

But if you click through to the CQL Limits link on that very page you see this:

  • Collection (List): collection size: 2B (2^31); values size: 65535 (2^16-1) (Cassandra 2.1 and later, using native protocol v3)

  • Collection (Set): collection size: 2B (2^31); values size: 65535 (2^16-1) (Cassandra 2.1 and later, using native protocol v3)

  • Collection (Map): collection size: 2B (2^31); number of keys: 65535 (2^16-1); values size: 65535 (2^16-1) (Cassandra 2.1 and later, using native protocol v3)

So which one is it? 64k items per collection, or 2 billion items per collection? Or are 2 billion writeable but not readable beyond 64k?

Thanks in advance.

Jose Fonseca asked Sep 18 '25 23:09


1 Answer

Which version of Cassandra are you using?

That documentation is for Cassandra 2.0 and 2.1. In those versions there is a limit on how many elements you can put in a collection, which is 64K. However, each element can have a size of up to 2B if you are using native protocol v3. See https://issues.apache.org/jira/browse/CASSANDRA-5428

But if you are using Cassandra 2.2 or later, you can insert 2 billion items into a collection. Here is the link: http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCollections.html
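As a rough illustration of where the two numbers come from: as far as I know, native protocol v2 writes a collection's element count as an unsigned 16-bit short, while v3 widened it to a 32-bit int. The sketch below is not driver code. The function names are made up for this example, and it only mimics the width of the length prefix with Python's struct module.

```python
import struct

MAX_V2 = 2**16 - 1   # 65535, largest value of an unsigned 16-bit short
MAX_V3 = 2**31 - 1   # 2147483647, largest value of a signed 32-bit int


def encode_count_v2(n: int) -> bytes:
    """Hypothetical helper: element count as a 16-bit big-endian short."""
    return struct.pack(">H", n)


def encode_count_v3(n: int) -> bytes:
    """Hypothetical helper: element count as a 32-bit big-endian int."""
    return struct.pack(">i", n)


def fits(encoder, n: int) -> bool:
    """Return True if n can be represented by the given length prefix."""
    try:
        encoder(n)
        return True
    except struct.error:
        # struct refuses values that do not fit the field width
        return False


if __name__ == "__main__":
    for n in (65535, 65536, 70000):
        print(n, "v2:", fits(encode_count_v2, n), "v3:", fits(encode_count_v3, n))
```

A count of 70000 simply cannot be expressed in the 16-bit prefix, which matches the documented behaviour that only 64K items are queryable on the older protocol.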

Having said that, you should not insert that many items into a collection. You will hit performance issues well before you hit the maximum number of elements.

Collections cannot be "sliced"; Cassandra reads a collection in its entirety, impacting performance. Thus, collections should be much smaller than the maximum limits listed. The collection is not paged internally.

If you really need that many items, collections are no longer appropriate and a dedicated table (with clustering columns) should be used instead.
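A minimal sketch of what such a table could look like, assuming a made-up user_items table keyed by user_id with item_id as the clustering column. None of these names come from the question. The Python below only builds the CQL strings; you would still run them through whatever driver you use, binding the ? placeholders in a prepared statement.

```python
# Hypothetical schema: one row per item instead of one collection cell per item.
# user_id is the partition key, item_id is the clustering column.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS user_items (
    user_id uuid,
    item_id timeuuid,
    payload text,
    PRIMARY KEY ((user_id), item_id)
)
"""


def page_query(page_size: int) -> str:
    """Build a CQL query that reads one slice of a partition.

    The caller binds user_id and the last item_id seen on the previous
    page, so only page_size rows are read rather than the whole set.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (
        "SELECT item_id, payload FROM user_items "
        "WHERE user_id = ? AND item_id > ? "
        f"LIMIT {int(page_size)}"
    )


if __name__ == "__main__":
    print(CREATE_TABLE.strip())
    print(page_query(100))
```

Unlike a collection, rows in a partition can be sliced by the clustering column, so each read touches only the range you ask for.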

I hope this helps.

root answered Sep 21 '25 12:09