Cassandra cluster - data density (data size per node) - looking for feedback and advice

Tags:

cassandra

I am considering the design of a Cassandra cluster.

The use case would be storing large rows of tiny samples for time series data (using KairosDB). The data will be almost immutable (very rare deletes, no updates). That part is working very well.

However, after several years the data will be quite large (it will reach a maximum size of several hundred terabytes - over one petabyte once the replication factor is taken into account).

I am aware of the advice not to put more than 5TB of data on a Cassandra node because of the high I/O load during compactions and repairs (which is apparently already quite high on spinning disks). Since we don't want to build an entire datacenter with hundreds of nodes for this use case, I am investigating whether high-density servers on spinning disks would be workable (e.g. at least 10TB or 20TB per node in RAID10 or JBOD; the servers would have good CPU and RAM, so the system would be I/O bound).
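To put rough numbers on the node-count side of this trade-off, here is a minimal sketch. It assumes a total footprint of 1000 TB after replication (a round figure standing in for "over one petabyte") and ignores the free-space headroom that compaction needs; `nodes_needed` is a hypothetical helper name, not part of any Cassandra tooling.

```python
import math


def nodes_needed(total_tb: float, per_node_tb: float) -> int:
    """Smallest number of nodes that holds total_tb at per_node_tb each.

    Assumption: data is spread evenly and no headroom is reserved.
    """
    return math.ceil(total_tb / per_node_tb)


# Assumed total after replication; adjust to the real figure.
TOTAL_TB = 1000

for density_tb in (5, 10, 20):
    count = nodes_needed(TOTAL_TB, density_tb)
    print(f"{density_tb} TB per node -> {count} nodes")
```

In practice the usable density per node is lower than the raw disk size, so these counts are a lower bound.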

The read/write rate in Cassandra will be manageable by a small cluster without any stress. I should also mention that this is not a high-performance transactional system but a datastore for storage, retrieval and some analysis, and the data will be almost immutable - so even if a compaction or a repair/reconstruction takes several days on several servers at the same time, it's probably not going to be an issue at all.

I am wondering whether anyone has experience running high-density servers on spinning disks, and what configuration you are using (Cassandra version, data size per node, disk size per node, disk config: JBOD/RAID, type of hardware).

Thanks in advance for your feedback.

Best regards.

asked Jul 22 '15 12:07

Loic

1 Answer

The risk of super dense nodes isn't necessarily maxing IO during repair and compaction - it's the inability to reliably resolve a total node failure. In your reply to Jim Meyer, you note that RAID5 is discouraged because the probability of failure during rebuild is too high - that same potential failure is the primary argument against super dense nodes.

In the pre-vnode days, if a 20T node died and you had to restore it, you'd have to stream 20T from the neighboring (2-4) nodes, which would max out all of those nodes and increase their likelihood of failure, and it would take hours or days to restore the down node. In that time, you're running with reduced redundancy, which is a real risk if you value your data.
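As a back-of-the-envelope illustration of that restore window, here is a minimal sketch. The sustained throughput figures are assumptions chosen for illustration, not measurements, and `streaming_hours` is a hypothetical helper; it also assumes a single uninterrupted stream with no retries.

```python
def streaming_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream data_tb at a sustained throughput_mb_s.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the stream
    never stalls or restarts.
    """
    seconds = data_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600


# Assumed aggregate inbound rates on the replacement node, in MB/s.
for rate in (50, 100, 200):
    for size_tb in (4, 20):
        hours = streaming_hours(size_tb, rate)
        print(f"{size_tb} TB at {rate} MB/s: {hours:.1f} hours")
```

Under these assumptions the restore time scales linearly with node density, so a 20T node spends five times as long at reduced redundancy as a 4T node at the same streaming rate.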

One of the reasons many people appreciated vnodes is that they distribute load across more neighbors - now, streaming operations to bootstrap your replacement node come from dozens of machines, spreading the load. However, you still have the fundamental problem: you have to get 20T of data onto the node without the bootstrap failing. Streaming has long been more fragile than desired, and the odds of streaming 20T without failure on cloud networks are not fantastic (though again, it's getting better and better).

Can you run 20T nodes? Sure. But what's the point? Why not run five 4T nodes - you get more redundancy, you can scale down the CPU/memory accordingly, and you don't have to worry about re-bootstrapping 20T all at once.

Our "dense" nodes are 4T GP2 EBS volumes with Cassandra 2.1.x (x >= 7 to avoid the OOMs in 2.1.5/6). We use a single volume, because while you suggest "cassandra now supports JBOD quite well", our experience is that relying on Cassandra's balancing algorithms is unlikely to give you quite what you think it will - IO will thundering-herd between devices (overwhelm one, then overwhelm the next, and so on), and the devices will fill asymmetrically. That, to me, is a great argument against lots of small volumes - I'd rather just see consistent usage on a single volume.

answered Oct 06 '22 01:10

Jeff Jirsa
