Theoretically, Cassandra allows up to 2 billion columns in a wide row.
I have heard that in reality up to 50.000 cols/50 MB are fine; 50.000-100.000 cols/100 MB are OK but require some tuning; and that one should never go above 100.000/100 MB columns per row. The reason being that this will put pressure on the heap.
Is there some truth to this?
In Cassandra, the maximum number of cells (rows x columns) in a single partition is 2 billion.
Additionally, a single column value may not be larger than 2GB, but in practice, "single digits of MB" is a more reasonable limit, since there is no streaming or random access of blob values.
Partitions greater than 100Mb can cause significant pressure on the heap.
One of our tables with cassandra 1.2 went pass 100 MB columns per row limit due to new write patterns we experienced. We have experienced significant pressure on both compactions and our caches. Btw, we had rows with several hundred MBs.
One approach is to just redesign and migrate the table to a better designed table(s) that will keep your wide rows under that limit. If that is not an option, then I suggest tune your cassandra so both compactions and caches configs can deal with your wide rows effectively.
Some interesting links to things to tune:
Cassandra Performance Tuning
in_memory_compaction_limit_in_mb
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With