
Cassandra node limitations

I want to know whether Cassandra imposes any limits on node hardware specs, for example a maximum amount of storage per node.

I intend to use a couple of nodes, each with 48 TB of storage (24 x 2 TB hard drives, 7200 rpm) and good dual Xeon processors.

I have searched for any such limitations but didn't find any material on this issue. Also, why is there so little buzz about Cassandra recently? It is maturing and is now at version 0.8, yet most articles and blog posts still cover only 0.6.

Gary Lindahl asked Aug 25 '11 12:08


2 Answers

Cassandra distributes its data by row, so the only hard limitation is that a row must be able to fit on a single node.

So the short answer is no.

The longer answer is that you'll want to make sure that you're setting up a separate storage area for your permanent data and your commit logs.
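One rough way to check that split is to compare the device IDs of the two directories. This is only a sketch: the function name and the example paths are assumptions, so substitute whatever `data_file_directories` and `commitlog_directory` point to in your cassandra.yaml. Note that differing device IDs only show the paths are on different filesystems, not necessarily on different physical disks.

```python
import os


def on_separate_devices(data_dir, commitlog_dir):
    """Return True if the two paths live on different filesystems/devices."""
    return os.stat(data_dir).st_dev != os.stat(commitlog_dir).st_dev


if __name__ == "__main__":
    # Hypothetical paths; replace them with the ones from your own configuration.
    data_dir = "/var/lib/cassandra/data"
    commitlog_dir = "/var/lib/cassandra/commitlog"
    if os.path.isdir(data_dir) and os.path.isdir(commitlog_dir):
        print(on_separate_devices(data_dir, commitlog_dir))
```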

One other thing to keep in mind is that you'll still run into seek speed issues. One of the nice things about Cassandra is that you don't need a single node with that much data (and in fact it's probably not advisable: your storage will outpace your processing power). If you use smaller nodes (in terms of hard drive space), your storage and processing capabilities will scale together.

dmcnelis answered Oct 23 '22 12:10


There are some notes here about large data set considerations.

48 TB of data per node is probably way too much. It will be much better to have more nodes with smaller amounts of data. Periodically you need to run nodetool repair, which involves reading all the data on the machine. If you are storing many terabytes of data on a machine, this will be very painful.
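To get a feel for why, here is a back-of-the-envelope estimate of how long it takes just to read everything on a node once. The 500 MB/s sustained read rate is purely an assumption for illustration, not a measured figure, and a real repair also does hashing and network transfer, so treat the result as an optimistic lower bound.

```python
def full_scan_hours(data_tb, read_mb_per_s):
    """Hours needed to read data_tb terabytes once at a sustained read rate."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / read_mb_per_s / 3600


# Assumed aggregate sequential read rate for the whole node (hypothetical).
ASSUMED_READ_MB_PER_S = 500

for tb in (1, 48):
    hours = full_scan_hours(tb, ASSUMED_READ_MB_PER_S)
    print(f"{tb} TB: about {hours:.1f} hours")
```

Under that assumption a 1 TB node is read in well under an hour, while a 48 TB node takes more than a day per pass.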

I would limit each node to around 1TB of data.
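As a rough sizing aid, the cap translates into a node count like this. The function name, the 10 TB data set and the replication factor of 3 are all hypothetical values chosen for illustration; plug in your own numbers.

```python
import math


def nodes_needed(raw_data_tb, replication_factor, per_node_tb=1.0):
    """Minimum number of nodes so that no node holds more than per_node_tb."""
    total_stored_tb = raw_data_tb * replication_factor  # every replica takes space
    return math.ceil(total_stored_tb / per_node_tb)


# Hypothetical example: 10 TB of raw data, 3 replicas, 1 TB per node.
print(nodes_needed(10, 3))
```

This ignores free space needed for compaction and future growth, so a real cluster would want headroom on top of the result.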

sbridges answered Oct 23 '22 12:10